Re: [visionlist] Plagiarism checks in Empirical Manuscripts

Elsevier’s and other publishers’ efforts to detect “self-plagiarism” are an instance of text mining the world’s scientific literature. Tom points out that this is also useful for detecting data duplication and fraud. Unfortunately it cannot easily be used for that: Elsevier shuts down independent researchers who use their journal subscriptions to investigate fraud. Text mining the scientific literature could yield thousands of discoveries, about both fraud and new connections between molecules, genes, and diseases, but it can’t be done because major journal publishers own the content and are trying to monetize it all for themselves. “Self-plagiarism” also puts publishers at legal risk, because they publish all our articles under restrictive copyright: it can be a copyright violation for a publisher to print text identical to an earlier paper by the same author that happens to have been published by a different publisher. In an email from the journal to Peter Tse, the issue was framed as protecting the author, but it also contained this sentence: “Another issue to be borne in mind is the matter of copyright in extensive text duplication.”

Thus the traditional system of publishers owning the copyright to our work is both preventing new discoveries (which must wait until the publishers find a way to use text mining to maintain or increase their profits) and creating ridiculous busywork for ourselves. Yesterday I attended a university press publishing conference where Kevin Stranack demo’ed Open Journal Systems version 3, which has already been released and looks significantly easier to use than ScholarOne/Manuscript Central, the system that expensive subscription journals use. OJS3 allows journals to be created at very low cost (it already underpins thousands of journals, such as Glossa, which flipped from Elsevier). Unfortunately I seem to be the only researcher at the conference, but I’m tweeting about it and will add some related information later.


On 7/12/17, 03:03, “visionlist on behalf of Ghuman, Avniel” wrote:

Dear all and Tom,

I would argue that there is a very big difference between reusing text that you have previously written and data duplication. There is little question regarding the ethics of data duplication (it is not ethical, unless you are presenting new analyses of previous data that are clearly flagged as previously published), whereas the ethics of “self-plagiarism” are, I would argue, still being debated, at least with regard to reusing brief passages, particularly in the introduction or methods sections (indeed, this kind of self-plagiarism is not considered research misconduct by US agencies). Thus, we should be careful not to conflate the two.

One question I have often had is why there is such a strong dislike of simply using quotation marks, particularly if you are just quoting yourself, since then you are not even borrowing ideas from others. Or, in a methods section, one could simply add a sentence along the lines of “Below we restate the methods from [paper X], with the specific passages modified to reflect this study.” It seems to me that the major concern with this kind of self-plagiarism is making sure the reader understands that those sections aren’t entirely new, so why not just have an explicit callout to that fact?

Best wishes,

Avniel Ghuman, Ph.D.
Laboratory of Cognitive Neurodynamics
Director of MEG research
Assistant Professor of Neurological Surgery, Neurobiology, Psychiatry, and Psychology
Faculty in the Center for the Neural Basis of Cognition and the Center for Neuroscience
University of Pittsburgh

From: Tom Wallis <>
Date: Tuesday, July 11, 2017 5:53 AM
To: Jim Ferwerda <>
Cc: <>
Subject: Re: [visionlist] Plagiarism checks in Empirical Manuscripts

Hi all,

I’m sympathetic to Peter’s desire to avoid “busywork” in re-writing parts of introductions, and of course it’s pointless to re-write standard methods (as in Malte’s original comment). However, I don’t think the guidelines against self-plagiarism are so easily dismissed (“I can’t steal from myself”).

To my mind, they exist to reduce the risk of CV-padding by re-using or “salami-slicing” research work into multiple outputs. Further to Jim’s comment, this CV-padding via (self-)plagiarism is by no means limited to students at unknown universities trying to get ahead: see, for example, the recent highly publicised implosion of Brian Wansink’s Cornell Food and Brand Lab. Wansink’s work includes numerous cases of blatant self-plagiarism, both in text (some articles sharing up to 50% of their text with others) and in data duplication. Not only does this disadvantage scientists who don’t engage in the practice (“candidate A has many more papers than candidate B!”), it can also create a false impression of the empirical support for some theory or guideline (“over 50 studies show that X does Y!”). Without guidelines against self-plagiarism, there would be no way to explicitly police these practices.

While I think it’s important that these guidelines exist, I agree with others that they (and automated plagiarism detection software) should be applied with sufficient editorial common sense. Re-using Materials and Methods, and a paragraph in the introduction (with appropriate citation), seems fine when the bulk of the paper presents new results and ideas.
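As an aside, the core mechanism behind the automated detection software mentioned above is simple enough to sketch in a few lines. The toy Python below (my own illustration, not what any commercial tool such as iThenticate actually implements) shingles each document into word n-grams and scores overlap with Jaccard similarity; this is also why boilerplate methods sections trigger high scores even when everything else in a paper is new:

```python
def shingles(text, n=3):
    """Return the set of word n-grams ("shingles") in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(doc_a, doc_b, n=3):
    """Jaccard similarity of the two documents' shingle sets (0.0 to 1.0)."""
    a, b = shingles(doc_a, n), shingles(doc_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two hypothetical methods sentences differing in one detail: most
# of their trigrams coincide, so the score is high.
methods_2016 = "Stimuli were presented on a calibrated CRT monitor at 60 Hz."
methods_2017 = "Stimuli were presented on a calibrated CRT monitor at 120 Hz."
print(round(overlap_score(methods_2016, methods_2017), 2))  # → 0.64
```

A human editor seeing a 64% overlap here would rightly shrug; the point is that the raw score alone cannot distinguish legitimately re-used methods text from salami-sliced results, which is exactly why editorial judgment has to sit on top of the software.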



