Towards text copyright detection using metadata in web applications

Marios Poulos, Nikolaos Korfiatis, George Bokos

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Purpose – This paper aims to present the semantic content identifier (SCI), a permanent identifier computed through a linear-time onion-peeling algorithm that extracts semantic features from a text and integrates this information into the identifier.

Design/methodology/approach – The authors employ SCI to propose a mechanism for simultaneously checking the authenticity of, and the degree of similarity between, different information objects, and present an empirical investigation of the method. A management scenario is proposed for controlling the authentication process and detecting the degree of violation of documents.

Findings – Such a mechanism could be adopted as a component of libraries' strategy for protecting the copyright of documents published on the web.

Practical implications – The proposed numeric code can be used as a constituent part of the digital object identifier (DOI) system, making its computation more efficient and meaningful.

Originality/value – The identifier proposed in the paper can yield a more efficient index for identifying and retrieving objects in digital libraries, as well as in online repositories and commercial applications, allowing them to handle information retrieval requests more effectively.

Original language: English
Pages (from-to): 439-451
Number of pages: 13
Journal: Program: Electronic Library and Information Systems
Volume: 45
Issue number: 4
Publication status: Published - 2011

Keywords

  • Text identification, Information retrieval, Semantics, Persistent identifiers, Data handling, Copyright, Research work
