
How a Sharp-Eyed Scientist Became Biology’s Image Detective

Frustrated by these long timetables, Bik has transitioned to sharing more of her findings online, where journal readers can encounter them. On PubPeer, where she is the most prolific poster who uses her real name, her comments are circumspect—she writes that images are “remarkably similar” or “more similar than expected.” On Twitter, she is more performative, and often plays to a live audience. “#ImageForensics Middle of the Night edition. Level: easy to advanced,” Bik tweeted, at 2:41 A.M. one night. She posted an array of colorful photographs that resembled abstract paintings, including a striated vista of pink and white brushstrokes (a slice of heart tissue) and a fine-grained splattering of ruby-red and white flecks (a slice of kidney). Six minutes later, a biologist in the U.K. responded: two kidney photos appeared identical, she wrote. A minute later, another user flagged the same pair, along with three lung images that looked like the same tissue sample, shifted slightly. Answers continued trickling in from others; they drew Bik-style color-coded boxes around the cloned image parts. At 3:06 A.M., Bik awarded the second user an emoji trophy for the best reply.

In Silicon Valley, Bik and her husband live in an elegant mid-century-modern ranch house with a cheerful orange front door and a low-angled pitched roof. In the neighborhood, the residence is one of many duplicates, in varying color schemes. I visited Bik just before the pandemic began. Tall, with stylish blue tortoiseshell eyeglasses and shoulder-length chestnut hair, she wore a blouse with a recurring sky-blue-and-orange floral pattern and had a penetrating, blue-eyed gaze. While Bik made tea, her husband, clad in a red fleece jacket, toasted some frozen stroopwafel cookies, from Gouda.

Playing tour guide, Bik showed off the original features of their kitchen, including its white Formica countertop, flecked with gold and black spots. “It’s random!” she assured me—no duplications. The same could not be said of the textured gray porcelain floor tiles. When workers installed them, Bik explained, she’d asked them to rotate the pieces that were identical, so that repeats would be less noticeable. A few duplicate tiles had ended up side by side anyway. I couldn’t see the duplication until she traced an identical wavy ridge in each tile with both of her index fingers. “Sorry—I’m, like, weird,” she said, and laughed.

In her bedroom closet, Bik’s shirts hung in a color gradient progressing from blacks and browns to greens and blues. Not long ago, she helped arrange her sister-in-law’s enormous shoe collection by color on new storage racks; when some friends complained about the messy boxes of nuts, screws, and nails that littered their garage, Bik sorted them into little drawers. “Nothing makes me more happy,” she told me. Since childhood, she has collected tortoise figurines and toys; around two thousand of them are arranged in four glass cabinets next to a blond-wood dining table. She keeps a spreadsheet tracking her turtle menagerie: there are turtles made from cowrie seashells, brass turtles, Delft blue porcelain turtles, bobble-headed turtles, turtle-shaped wood boxes with lids, and “functional” turtles (key chains, pencil sharpeners). She showed me a small stuffed animal with an eye missing: Turtle No. 1. (She has stopped adding to her collection. “I don’t want it to overtake my house,” she said.)

That afternoon, Bik settled at her dining table, which serves as her desk. Floor-to-ceiling windows offered a tranquil view of backyard foliage. On her curved widescreen monitor, Bik checked her Twitter account—her bio featured a photo of a cactus garden; “That’s me—prickly,” she said—and then pulled up her master spreadsheet of problematic papers, which she doesn’t share publicly. Each of its thousands of entries has more than twenty columns of details. She removed her glasses, set them next to a cup of chamomile tea, sat up straight, and began rapidly scanning papers from PLOS One with her face close to the monitor. Starting with the first study—about “leucine zipper transcription factor-like 1”—she peered at an array of Western-blot images. She took screenshots and scrutinized them in Preview, zooming in and adjusting the contrast and brightness. (Occasionally, she uses Forensically and ImageTwin, tools that do some semi-automated photo-forensics analysis.) She moved on to a study with pink and purple cross-sections of mouse-gut tissue, then stopped on a figure with a dozen photos of translucent clumps of cells. She chuckled. “It looks like a flying rabbit,” she said, pointing at one blob.
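
What Bik does by hand in Preview amounts to a couple of simple image adjustments. As a rough sketch only (this is not her actual tooling, and the file name is a placeholder), the same kind of contrast-and-brightness boost can be written in a few lines of Python with the Pillow imaging library:

```python
# Illustrative sketch of the contrast-and-brightness adjustment Bik performs
# by hand in Preview, using the Pillow library. "western_blot.png" is a
# placeholder file name, not a figure from any of her cases.
from PIL import Image, ImageEnhance

img = Image.open("western_blot.png").convert("L")  # grayscale makes faint bands easier to compare

# Exaggerate contrast and brightness so that near-identical bands or spots,
# and the edges of pasted-in regions, stand out to the eye.
img = ImageEnhance.Contrast(img).enhance(2.5)
img = ImageEnhance.Brightness(img).enhance(1.3)

img.show()  # then inspect at full zoom
```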

Bik found no problems. PLOS One has “cleaned up their act a lot,” she said. The journal’s publisher employs a team of three editors who handle matters of publication ethics, including Bik’s cases. Renee Hoch, one of the editors, told me that the process of investigation, which entails obtaining original, raw images from the authors, and, in some cases, requesting input from external reviewers, usually takes four to six months per case. Hoch said that of the first hundred and ninety or so of Bik’s cases that the team had resolved, forty-six per cent required corrections, around forty-three per cent were retracted, and another nine per cent received “expressions of concern.” In only two of the resolved papers was nothing amiss. “In the vast majority of cases, when she raises an issue and we look into it, we agree with her assessment,” Hoch said.

Could Bik be replaced with a computer? In principle, automated image-scanning could be both faster and more accurate, with fewer false positives and false negatives. Hany Farid, a computer scientist and photo-forensic expert at the University of California, Berkeley, agreed that scientific misconduct is a troubling issue, but was uneasy about individual image detectives using their own judgment to publicly identify suspect images. “One wants to tread fairly lightly” when professional reputations are on the line, he told me. Farid’s reservations spring partly from a general skepticism about the accuracy of the human eye. While our visual systems excel at many tasks, such as recognizing faces, they aren’t always good at other kinds of visual discrimination. Farid sometimes provides court testimony in cases involving doctored images; his lab has designed algorithms for detecting faked photographs of everyday scenes, and they are eighty-to-ninety-five-per-cent accurate, with false positives in roughly one in a hundred cases. Judging by courtroom standards, he is unimpressed by Bik’s stats and would prefer a more rigorous assessment of her accuracy. “You can audit the algorithms,” Farid said. “You can’t audit her brain.” He would like to see similar systems designed and validated for identifying faked or altered scientific images.

A few commercial services currently offer specialized software for checking scientific images, but the programs aren’t designed for large-scale, automated use. Ideally, a program would extract images from a scientific paper, then rapidly check them against a huge database, detecting copies or manipulations. Last year, several major scientific publishers, including Elsevier, Springer Nature, and EMBO Press, convened a working group to flesh out how editors might use such systems to pre-screen manuscripts. Efforts are under way—some funded by the O.R.I.—to create powerful machine-learning algorithms to do the job. But it’s harder than one might think. Daniel Acuña, a computer scientist at Syracuse University, told me that such programs need to be trained on and tested against large data sets of published scientific images for which the “ground truth” is known: Doctored or not? A group in Berlin, funded by Elsevier, has been slowly building such a database, using images from retracted papers; some algorithm developers have also turned to Bik, who has shared her set of flawed papers with them.
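
In outline, the matching step of such a system could work by reducing each figure to a compact perceptual fingerprint and then flagging any pair of fingerprints that sit suspiciously close together. Here is a minimal sketch, assuming the figures have already been extracted from a manuscript and saved as image files; the ImageHash library, the directory names, and the distance threshold are illustrative choices, not part of any publisher’s actual pipeline:

```python
# Minimal sketch of duplicate detection via perceptual hashing, using the
# Pillow and ImageHash libraries. Directory names and the distance threshold
# are illustrative assumptions, not any publisher's real pipeline.
from pathlib import Path
from PIL import Image
import imagehash

THRESHOLD = 5  # max Hamming distance at which two figures count as suspiciously similar

def fingerprint(path):
    """Reduce an image to a 64-bit perceptual hash that survives resizing and recompression."""
    return imagehash.phash(Image.open(path))

# Hash every previously published figure in a reference set...
database = {p: fingerprint(p) for p in Path("reference_figures").glob("*.png")}

# ...then flag any figure from a new manuscript whose hash lands too close to one of them.
for new_fig in Path("manuscript_figures").glob("*.png"):
    h = fingerprint(new_fig)
    for known, known_hash in database.items():
        if h - known_hash <= THRESHOLD:
            print(f"Possible duplicate: {new_fig.name} resembles {known.name}")
```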

Bik told me that she would welcome effective automated image-scanning systems, because they could find far more cases than she ever could. Still, even if an automated platform could identify problematic images, the flagged images would still have to be reviewed by people. A computer can’t recognize when research images have been duplicated for appropriate reasons, such as for reference purposes. And, if bad images are already in the published record, someone must hound journal editors or institutions until they take action. Around forty thousand papers have received comments on PubPeer, and, for the vast majority, “there’s absolutely no response,” Boris Barbour, a neuroscientist in Paris who is a volunteer organizer for PubPeer, told me. “Even when somebody is clearly guilty of a career of cheating, it’s quite hard to see any justice done,” he said. “The scales are clearly tilted in the other direction.” Some journals are actively complicit in generating spurious papers; a former journal editor I spoke with described working at a highly profitable, low-tier publication that routinely accepted “unbelievably bad” manuscripts, which were riddled with plagiarism and blatantly faked images. Editors asked authors to supply alternative images, then published the studies after heavy editing. “I think what she’s showing is the tip of the iceberg,” the ex-editor said, of Bik.

Some university research-integrity officers point out, with chagrin, that whistle-blowing about research misconduct on social media can tip off the scientists involved, allowing them to destroy evidence ahead of an investigation. But Bik and other watchdogs find that posting to social media creates more pressure for journals and institutions to respond. Some observers worry that the airing of dirty laundry risks undermining public faith in science. Bik believes that most research is trustworthy, and regards her work as a necessary part of science’s self-correcting mechanism; universities, she told me, may be loath to investigate faculty members who bring in grant money, and publishers may hesitate to retract bad articles, since every cited paper increases a journal’s citation ranking. (In recent years, some researchers have also sued journals over retractions.) She is appalled at how editors routinely accept weak excuses for image manipulation—it’s like “the dog ate my homework,” she said. Last year, she tweeted about a study in which she’d found more than ten problematic images; the researchers supplied substitute images, and the paper received a correction. “Ugh,” she wrote. “It is like finding doping in the urine of an athlete who just won the race, and then accepting a clean urine sample 2 weeks later.”

Last year, Bik’s friend Jon Cousins, a software entrepreneur, made a computer game called Dupesy, inspired by her work. One night, after Thai takeout, we tried a beta version of the game at her computer. Bik’s husband, Gerard, went first, clicking a link titled “Cat Faces.”

A four-by-four panel of feline mugshots filled the screen. Some cats looked bug-eyed, others peeved. Instructions read, “Click the two unexpectedly similar images.” Gerard easily spotted the duplicates in the first few rounds, then hit a more challenging panel and sighed.

“I see it, I see it,” Bik sang quietly.

Finally, Gerard clicked the winning pair. He tried a few more Dupesy puzzle categories: a grid of rock-studded concrete walls, then “Coarse Fur,” “London Map,” and “Tokyo Buildings.”

When my turn came, I started with “Coffee Beans.” On one panel of dark-roasted beans, it took me thirty-one seconds to find the matching pair; on the next, six seconds. A few panels later, I was stuck. My eyes felt crossed. A nearby clock ticked loudly.

“Should I say when I see it?” Bik asked. “Or is that annoying?”

“No, no, no,” Gerard said.

“Just tell me when it’s annoying, because I don’t always know,” she said.

“Absolutely. You’re annoying,” he replied.

On her turn, Bik cruised swiftly through several rounds of “Coarse Fur,” then checked out other puzzle links. Some panels were “much harder than my normal work,” she said. The next day, Cousins e-mailed us with results: Bik’s median time for solving the puzzles was twelve seconds, versus about twenty seconds for her husband and me.
