
Addressing the replication crisis in computer science

By SHIRLENE JOHN | February 21, 2024


MICHAEL HIMBEAULT / CC BY 2.0

Jessica Sorrell highlighted the different reasons why machine learning algorithms might be difficult to replicate.

The Department of Computer Science hosted Jessica Sorrell, a postdoctoral researcher at the University of Pennsylvania’s School of Engineering and Applied Science, for its seminar series on Feb. 15. In her talk, titled “Replicability in Machine Learning,” Sorrell presented a new approach to formalizing a definition of replicability for machine learning algorithms.

Replicability — the ability to repeat a research study and obtain the same results — is essential to validating scientific experiments. Sorrell highlighted how the replication crisis of the 2010s brought scrutiny to how researchers conduct scientific experiments. She cited a 2016 study that surveyed 1,576 researchers in the natural sciences: 70% of respondents had failed to replicate another researcher’s experiment, and more than half had failed to replicate one of their own.

“We might hope that, as computer scientists and machine learning researchers, we're running our experiments in very controlled environments,” she said. “But there's increasing concern about replication within machine learning as well.”

Sorrell then explained where difficulties may arise in replicating machine learning studies. Often, source code is not made public, so a replication effort must also reverse engineer the experimental code. Access to computing resources is also increasingly unequal.

“Frequently, we'll see experiments published by groups like OpenAI or Google that have access to computing resources that the vast majority of junior [machine] learning research labs certainly do not have. It's just impossible to then even conduct this experiment,” Sorrell stated.

There are also privacy concerns arising from sharing data that may include personally identifiable information. Sorrell, however, chose to focus on the ambiguity in what counts as a replication.

She gave an example from a reinforcement learning setting in which an agent must learn to walk. One experiment produced an upright figure balanced on two legs. The other produced the same figure, but slightly unstable and not fully upright. She questioned whether the second result should even count as a replication. Sorrell’s research seeks to formalize a universal understanding of replicability.

“When we talk about replicability, what we mean is that another team of researchers should be able to obtain the same results that are published in a particular paper using their own data,” she said. “When trying to formalize what it means for an algorithm to be replicable, we started with this idea in mind.”

Sorrell then presented the definition of replicability that she developed in conjunction with several collaborators.

She started with randomized algorithms — algorithms that use internal randomness, as is common in training machine learning models. Under her definition, replicability over a distribution is tested by drawing two independent samples from that distribution and feeding them to the algorithm in two separate runs.

“Then I give my algorithm randomness,” she said. “We're then fixing the randomness between both [runs], but we’re resampling the input data. We'll say that the [algorithm] is replicable with high probability if we do this, and we get the exact same output.”
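To make the definition concrete, here is a minimal sketch in Python. It is not the algorithm from the talk: the function replicable_mean, the grid width and the Gaussian data are all illustrative. It rounds its answer onto a randomly offset grid — a standard trick in this literature — so that fixing the algorithm’s randomness fixes the grid, and two runs on fresh samples from the same distribution then produce the exact same output with high probability.

```python
import numpy as np

def replicable_mean(sample, shared_rng, grid_width=0.1):
    # Round the empirical mean onto a grid whose offset comes from the
    # SHARED randomness. Fixing that randomness across runs fixes the grid;
    # resampling the data only moves the mean slightly, so both runs land
    # in the same grid cell (and return the same value) with high probability.
    offset = shared_rng.uniform(0.0, grid_width)
    return np.floor((sample.mean() - offset) / grid_width) * grid_width + offset

SEED = 7  # plays the role of the shared random string

def fresh_sample(data_seed, n=100_000):
    # An independent sample from the same fixed distribution, N(1, 1).
    return np.random.default_rng(data_seed).normal(loc=1.0, scale=1.0, size=n)

# Two independent samples, same algorithmic randomness.
out1 = replicable_mean(fresh_sample(1), np.random.default_rng(SEED))
out2 = replicable_mean(fresh_sample(2), np.random.default_rng(SEED))
print(out1, out2, out1 == out2)  # identical outputs with high probability
```

The rounding step is what buys replicability here: the raw empirical means of the two samples differ slightly, but snapping them to a coarse shared grid erases that sampling noise unless the two means happen to straddle a cell boundary.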

According to Sorrell, this work has produced replicable algorithms for several statistical tasks, including finding heavy hitters and approximating medians of distributions. She also highlighted that she drew a lot of inspiration from the literature on differential privacy.

Differential privacy is a mathematical framework that bounds how much an algorithm’s output can reveal about any one individual’s data in a dataset. She explained the major difference between replicability and differential privacy.

“For differential privacy, we're not saying anything about a distribution or where this data is coming from,” Sorrell said. “Whereas for replicability, we're making a distributional assumption because we're saying something about two samples drawn from the same distributions.”

While replicability and differential privacy do not mean the same thing on paper, differential privacy’s logical framework and background theory can help researchers better understand how to approach questions of replicability.
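In standard notation (a sketch of the two common textbook definitions, not notation specific to the talk), the contrast looks like this. An algorithm $A$ is differentially private if, for all pairs of datasets $S$ and $S'$ differing in one entry and all events $E$,

$$\Pr[A(S) \in E] \;\le\; e^{\varepsilon}\,\Pr[A(S') \in E] + \delta,$$

with no assumption about where the data comes from. By contrast, $A$ is $\rho$-replicable if, for every distribution $\mathcal{D}$,

$$\Pr_{S_1, S_2 \sim \mathcal{D}^n,\; r}\!\left[A(S_1; r) = A(S_2; r)\right] \;\ge\; 1 - \rho,$$

where $r$ is the shared internal randomness. The quantifier over distributions is exactly the distributional assumption Sorrell described.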

Sorrell concluded her talk by detailing her broader interests in machine learning, especially in privacy and fairness. She highlighted how many current machine learning models exhibit systematic biases. One example is underdiagnosis bias in artificial intelligence algorithms used to classify chest radiographs.

“[Something] troubling is the use of predictive models in child welfare cases. There’s this tool that's called the Allegheny Family Screening Tool, [which is] used to guide decision-making around when to follow up with children's child welfare cases. It's intended to predict how likely it is that a child needs to be removed from the home,” she said. “It's being questioned by the Department of Justice because they've shown that the tool might be biased against folks of color and disabled folks.” 

In the future, Sorrell hopes her research can help develop tools that allow existing regulatory bodies to investigate and audit machine learning models. 

“I'm interested in questions such as what notions of algorithmic fairness actually enable efficient audits and repair,” she said. “I also want to understand when we can allow our regulatory bodies to propose changes to models without potentially making these models exploitable.” 

Ama Koranteng, a fourth-year doctoral student in Computer Science, said she attended the talk to learn more about how Sorrell studies replicability through a theoretical lens.

“One of my big takeaways is there are a lot of questions that need to be answered about the replicability of these [machine learning] algorithms,” she said in an interview with The News-Letter. “I was almost intimidated by the number of issues that seemed to exist, but it's cool seeing that there are theoretically based approaches to answering these questions and ensuring their workability.”

