Scan of 600 open datasets finds widespread copy‑paste errors, including in high‑profile Parkinson’s paper

What was found
An automated tool, reportedly developed by a researcher to hunt for copy‑pasted blocks in spreadsheets, scanned Excel files tied to 600 published papers and flagged copy‑paste problems among them. The early sweep reportedly turned up 18 cases deemed serious enough to warrant follow‑up. The findings came out of a volunteer effort to comb open datasets hosted on repositories such as Dryad; many of the problems were simple duplicated sequences of numbers that should have been unique to individual subjects.
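To make the idea concrete, here is a minimal sketch of how a duplicated‑sequence check could work. It is not the researcher's actual tool, whose internals are not described here. The function name, the column names, the run length of four, and the sample numbers are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def find_repeated_runs(columns, min_len=4):
    """Return runs of `min_len` consecutive values that occur in 2+ places.

    `columns` maps a column name to a list of measurements. This is an
    illustrative heuristic (an assumption, not the real scanner): it slides
    a fixed-size window over every column and records where each window
    of values has been seen.
    """
    seen = defaultdict(list)
    for name, values in columns.items():
        for start in range(len(values) - min_len + 1):
            window = tuple(values[start:start + min_len])
            # Skip constant runs (e.g. all zeros), which can repeat legitimately.
            if len(set(window)) < 2:
                continue
            seen[window].append((name, start))
    # Keep only the windows that appear at more than one location.
    return {run: locs for run, locs in seen.items() if len(locs) > 1}


# Hypothetical measurements for two groups that should be independent.
example = {
    "group_a": [12.1, 8.4, 15.0, 9.7, 11.2, 7.3],
    "group_b": [10.5, 8.4, 15.0, 9.7, 11.2, 6.8],
}

for run, locations in find_repeated_runs(example).items():
    print(run, "appears at", locations)
```

A hit from a check like this is only a prompt for a closer look. Coarsely rounded or bounded measurements can repeat by chance, so a real scanner would need to weigh run length and measurement precision before flagging anything.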
A landmark paper under the microscope
One of the most striking examples allegedly involves a 2016 Cell paper on Parkinson’s disease that made the provocative claim that the disorder can originate in the gut. The Dryad dataset for the study contains repeated sequences in columns labeled “Adhesive removal times” and “Pole‑descent time,” shared across groups that were supposed to be independent. The duplicated rows account for roughly half of some experimental groups, which is nontrivial given the study’s small sample sizes. The issue was reportedly flagged to the authors in January, and they have not yet replied.
Why it matters
Are these spreadsheet slipups — or something worse? Investigators say it could be a fat‑finger editing error or deliberate tampering; the public record alone can't distinguish intent. Either way, the impact is real: when high‑visibility papers with thousands of citations rest on small datasets, even a few duplicated entries can skew conclusions and mislead downstream work. The larger lesson is blunt and a little embarrassing for the field — raw data living in plain Excel files needs better automated vetting. Think of it as hygiene for science: nobody likes a surprise, especially eight years after publication.
Bigger picture
This episode feeds into broader worries about reproducibility and the limits of post‑publication peer review. Automated scanners won’t fix everything, but they can catch the low‑hanging fruit — the copy‑paste bloopers and clumsy edits that evade human eyeballs. If journals, funders, and repositories want confidence in published results, they’ll need to make those checks routine. Otherwise we’ll keep finding plot twists long after the story went viral.
Sources: sciencedetective.org, Hacker News