Sloppy Use of Machine Learning is Causing a ‘Reproducibility Crisis’ in Science

Kapoor and Narayanan organized a workshop late last month to draw attention to what they call a “reproducibility crisis” in science that makes use of machine learning. They were hoping for 30 or so attendees but received registrations from over 1,500 people, a surprise that they say suggests issues with machine learning in science are widespread.

During the event, invited speakers recounted numerous examples of situations where AI had been misused, from fields including medicine and social science. Michael Roberts, a senior research associate at Cambridge University, discussed problems with dozens of papers claiming to use machine learning to fight Covid-19, including cases where data was skewed because it came from a variety of different imaging machines. Jessica Hullman, an associate professor at Northwestern University, compared problems with studies using machine learning to the phenomenon of major results in psychology proving impossible to replicate. In both cases, Hullman says, researchers are prone to using too little data and misreading the statistical significance of results.

Momin Malik, a data scientist at the Mayo Clinic, was invited to speak about his own work tracking down problematic uses of machine learning in science. Besides common errors in implementation of the technique, he says, researchers sometimes apply machine learning when it is the wrong tool for the job.

Malik points to a prominent example of machine learning producing misleading results: Google Flu Trends, a tool developed by the search company in 2008 that aimed to use machine learning to identify flu outbreaks more quickly from logs of search queries typed by web users. Google won positive publicity for the project, but it failed spectacularly to predict the course of the 2013 flu season. An independent study would later conclude that the model had latched onto seasonal terms that have nothing to do with the prevalence of influenza. “You couldn’t just throw it all into a big machine-learning model and see what comes out,” Malik says.
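The failure mode Malik describes can be sketched with a toy experiment. The code below uses entirely synthetic, illustrative data (it has no connection to Google's actual model or query logs): a regression trained on a search term that merely co-varies with winter tracks the season faithfully, then misses an out-of-season outbreak, because it learned the calendar rather than the disease.

```python
# Illustrative sketch with synthetic data: a model fit on a seasonal proxy
# learns the season, not the disease, and misses an off-season outbreak.
import numpy as np

rng = np.random.default_rng(0)
weeks = np.arange(104)                      # two years of weekly data
season = np.cos(2 * np.pi * weeks / 52)     # peaks each winter

# Training years: flu incidence and a winter-only search term both
# follow the same seasonal cycle, so they correlate strongly.
flu_train = 100 + 80 * season + rng.normal(0, 5, weeks.size)
proxy_train = 50 + 40 * season + rng.normal(0, 5, weeks.size)

# Ordinary least squares: predict flu from the proxy term alone.
X = np.column_stack([np.ones(weeks.size), proxy_train])
coef, *_ = np.linalg.lstsq(X, flu_train, rcond=None)

# Test year: a summer outbreak breaks the seasonal correlation.
test_weeks = np.arange(52)
test_season = np.cos(2 * np.pi * test_weeks / 52)
flu_test = 100 + 80 * test_season
flu_test[20:30] += 120                      # off-season spike
proxy_test = 50 + 40 * test_season          # proxy stays on the calendar

pred = coef[0] + coef[1] * proxy_test
summer_error = np.abs(pred[20:30] - flu_test[20:30]).mean()
winter_error = np.abs(pred[:10] - flu_test[:10]).mean()
# The summer (outbreak) error dwarfs the winter error: the model was
# never tracking flu, only its seasonal correlates.
```

The point of the sketch is the one Malik makes: no amount of data volume fixes a predictor whose features are correlated with the target only through a confounder, here the season itself.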

Some workshop attendees say it may not be possible for all scientists to become masters of machine learning, especially given the complexity of some of the issues highlighted. Amy Winecoff, a data scientist at Princeton's Center for Information Technology Policy, says that while it is important for scientists to learn good software engineering principles, master statistical techniques, and put time into maintaining data sets, this shouldn't come at the expense of domain knowledge. "We do not, for example, want schizophrenia researchers knowing a lot about software engineering" but little about the causes of the disorder, she says. Winecoff suggests more collaboration between scientists and computer scientists could help strike the right balance.

While misuse of machine learning in science is a problem in itself, it can also be seen as an indicator that similar issues are likely common in corporate or government AI projects that are less open to outside scrutiny.

Malik says he is most worried about the prospect of misapplied AI algorithms causing real-world consequences, such as unfairly denying someone medical care or unjustly advising against parole. “The general lesson is that it is not appropriate to approach everything with machine learning,” he says. “Despite the rhetoric, the hype, the successes and hopes, it is a limited approach.”

Kapoor of Princeton says it is vital that scientific communities start thinking about the issue. “Machine-learning-based science is still in its infancy,” he says. “But this is urgent—it can have really harmful, long-term consequences.”


