Physics Today
Issues and Events

Experimenting with plagiarism detection on the arXiv

March 2007, page 30

[Image: arXiv warning]
Starting this summer, submissions to the arXiv, the online server that many physicists check daily for new preprints, will be compared with the server's existing manuscripts, 400 000 and counting, to check for plagiarism.

When plagiarism is suspected, the submission will be flagged, and the authors will get a message saying "your article has x% overlap with article 'a.' Do you really want to do this?" says Cornell University physicist Paul Ginsparg, the creator and overseer of the arXiv. The authors whose papers were copied from will not be notified.

"This will be a fun experiment," Ginsparg says. "Will we train people to be more clever and to make more word changes? Or will there be a real change in their behavior?"

Behavior did change when University of Virginia physicist Louis Bloomfield began using software to see if his students were cheating. Checking new arXiv submissions is a good idea, Bloomfield says. "People should know it's not okay to steal. It's not even okay to publish your own stuff over and over." After he reported students who had copied, they were prosecuted. Forty-five students either left the university or were found guilty, and three degrees were revoked. "I was immersed in seemingly endless honor trials. Two years of my life were burned up. There's a lot of trouble when you open this can of worms. Plagiarism shouldn't be tolerated, but you need a professional organization to handle the heat."

The arXiv's automated scanning for overlapping text is a refinement of an algorithm used last year by Cornell computer science graduate student Daria Sorokina to look at the server's then nearly 300 000 documents. The algorithm assigns unique numbers to word sequences and then compares those numbers across documents. Common phrases such as "this work was supported in part by" are excluded. "There is nothing new about document fingerprinting," says Cornell computer scientist Johannes Gehrke, an adviser on the project. "The novelty here was the application to the arXiv."
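
The fingerprinting idea described above can be illustrated with a minimal Python sketch. It is not the arXiv's actual code: the sequence length of seven words, the truncated SHA-1 hash used as the "unique number," the single-entry phrase list, and all function names are assumptions chosen for illustration.

```python
import hashlib
import re

# Assumed parameters; the article does not give the real values.
SEQUENCE_LENGTH = 7
COMMON_PHRASES = ["this work was supported in part by"]


def words(text):
    """Lowercase the text, strip punctuation, and drop common phrases."""
    normalized = " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "
    for phrase in COMMON_PHRASES:
        # Simplification: words on either side of a removed phrase become adjacent.
        normalized = normalized.replace(" " + phrase + " ", " ")
    return normalized.split()


def fingerprints(text, n=SEQUENCE_LENGTH):
    """Return the set of numbers assigned to every n-word sequence."""
    tokens = words(text)
    result = set()
    for i in range(len(tokens) - n + 1):
        sequence = " ".join(tokens[i:i + n])
        digest = hashlib.sha1(sequence.encode("utf-8")).hexdigest()
        result.add(int(digest[:16], 16))  # 64-bit number for this sequence
    return result


def overlap_percent(new_text, existing_text):
    """Percentage of the new document's sequences also found in the existing one."""
    new_prints = fingerprints(new_text)
    if not new_prints:
        return 0.0
    shared = new_prints & fingerprints(existing_text)
    return 100.0 * len(shared) / len(new_prints)
```

A submission whose `overlap_percent` against an existing manuscript exceeds some chosen threshold could then trigger a message like the one Ginsparg describes. Because only the numbers are compared, changing a single word alters every sequence that contains it, which is why this kind of check finds only verbatim copying.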

In the study, about 10% of arXiv manuscripts had text blocks that overlapped with other documents. After removing instances of authors reusing parts of their own text, different collaborators on a single project using the same text in separate conference abstracts, and other apparent false positives, less than 1% of manuscripts were still suspect, says Sorokina.
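
The first of those filtering steps, discarding overlaps in which authors reuse their own text, might look like the following Python sketch. The record layout, the matching of author names by normalized string, and the example entries are all hypothetical; the article does not describe the study's actual criteria, and the other false positives it mentions (such as collaborators sharing text in conference abstracts) are not handled here.

```python
def normalize(name):
    """Collapse case and spacing so trivially different spellings match."""
    return " ".join(name.lower().split())


def shares_author(authors_a, authors_b):
    """True if at least one normalized author name appears in both lists."""
    return bool({normalize(n) for n in authors_a} & {normalize(n) for n in authors_b})


def suspect_pairs(overlapping_pairs):
    """Drop pairs with a common author; the rest still need human review."""
    return [pair for pair in overlapping_pairs
            if not shares_author(pair["authors_a"], pair["authors_b"])]


# Hypothetical records, not data from the study.
pairs = [
    {"id_a": "doc-1", "authors_a": ["A. Author"],
     "id_b": "doc-2", "authors_b": ["a.  author", "B. Writer"]},
    {"id_a": "doc-3", "authors_a": ["C. Scholar"],
     "id_b": "doc-4", "authors_b": ["D. Student"]},
]
remaining = suspect_pairs(pairs)  # only the doc-3 / doc-4 pair survives
```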

Close examination of 20 pairs of documents that were among those with the highest levels of overlap exposed 16 as plagiarism. "In one case, an author copied descriptions of five or six methods that he was comparing," says Sorokina. "He didn't cite the sources. But the work of comparing was his own." One of the most common types of plagiarism found was the lifting of introductory or background material, especially in PhD theses, says Ginsparg. "The surprising thing is that people submit to the same database where they found [what they copied]. It's mind boggling, given the existence of Google, given the existence of searching on full text, that people wouldn't have an intuition that they would be caught."

"Some of it is different ethical norms," Ginsparg adds. "People in different countries, with different intellectual backgrounds, will sometimes argue that what they are doing is completely correct." The reassuring thing, he adds, "is that the most creative people, who are generating the ideas, don't have to start from someone else's article as a template. We'd be very surprised if authors of prominence showed up as perpetrators as opposed to victims."

Document fingerprinting catches only word-for-word plagiarism. But work is under way in the data-mining community on author identification and detection of the flow of ideas, says Gehrke. "Detecting content-based similarities with more sophisticated methods on a macroscale will be the next step."

In addition to implementing a check on new submissions to the arXiv, Ginsparg is talking to the editors of Physical Review Letters about applying the method to it and other American Physical Society publications. "More work needs to be done to include papers outside of the arXiv, and to go across journals," says Marty Blume, the recently retired APS editor-in-chief. "We have 30 000 submissions a year. We'll have to see how much [of the editors'] time it takes to run. And if we do it, what do we do with the results?"

Toni Feder

 

Copyright © by the American Institute of Physics - All rights reserved
