Billions of dollars are spent on medical megaprojects that collect vast stores of data, yet one thing limits how much sense can be made of all that information: scientists' abilities.

There is the White House's Cancer Moonshot, which seeks to achieve a decade's worth of progress in cancer research in half the time; the Precision Medicine Initiative, which aims to sift through the health data of a million Americans for hints about health and disease; and the International Human Cell Atlas Initiative, which aims to identify and describe all human cell types.

"It's not just that any one data repository is growing exponentially, the number of data repositories is growing exponentially," said Dr. Atul Butte, who leads the Institute for Computational Health Sciences at the University of California.

The federal government is also pushing doctors and hospitals to shift to electronic records, spending more than USD28 billion in federal incentives alone to get hospitals, doctors and others to adopt such systems.

These investments create vast data repositories that can be mined for clues about health and disease - much like the way websites mine an individual's preferences to generate personalised ads. However, data scientists at Google and Facebook have algorithms to systematically analyse the information in their records. Medical researchers do not.

Not an easy task with 2.5 million scientific papers published annually

Sifting through the data for hints about health and disease is not easy, and the raw data is neither robust nor reliable. Different digital platforms are also a problem, as electronic medical record systems are not compatible with one another.

Some details that could lead to a breakthrough may have been recorded as freeform notes, which are hard to extract and interpret. Errors are also common in these records. Data from scientific studies are not entirely trustworthy either.

"So many articles that are published today are going to be wrong in 10 years," said Greg Simon, who leads the Cancer Moonshot. "That's just the history of scientific research, and the question is you just don't know which ones are going to be wrong."

Scientists are desperately trying to figure out how to analyse the vast amount of data available - 2.5 million scientific papers are published annually. This also poses a problem for doctors and other healthcare professionals, who risk either missing vital information or being exposed to false information.

"In a world when anything is possible because you have so much data, how do you figure out who has done the math right?" asked Food and Drug Administration Commissioner Robert Califf.

AI a solution to the science overload?

Doctors commonly use sites such as MedCalc and UpToDate to consult diagnostic criteria and confirm treatment guidelines. But the science overload still needs to be addressed, and some believe that artificial intelligence (AI) can be a solution.

Machine learning assistants can be programmed to read incoming papers, distil information and highlight any relevant findings. In October, software company Iris developed the first version of such an assistant. Its machine reads an abstract of a paper, maps out the key concepts and finds similar papers that are relevant to those concepts.

It provides a quick way to conceptualise a scientific landscape for a particular topic, especially when the exact keywords of the research do not come to mind. It allows easier navigation through literature, especially for scientists doing interdisciplinary research.
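The article does not say how Iris does this internally. As a rough illustration only, the general idea of reading an abstract, picking out its key concepts and finding similar papers can be sketched with a standard text-similarity technique (TF-IDF weighting plus cosine similarity). The function names, the tiny stop-word list and the sample abstracts below are all made up for this example and are not Iris's method or data.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for this sketch.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "with", "on", "is", "are"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Turn each document into a {term: weight} dict using smoothed TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def key_concepts(vector, k=3):
    """Return the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def rank_similar(query, corpus):
    """Return corpus indices ordered from most to least similar to the query."""
    vectors = tfidf_vectors([query] + corpus)
    q = vectors[0]
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors[1:])]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    # Invented one-line "abstracts", for illustration only.
    corpus = [
        "Immunotherapy response in lung cancer patients",
        "Electronic health records and data interoperability",
        "Deep learning for tuberculosis drug resistance",
    ]
    query = "Predicting lung cancer response to immunotherapy"
    print(rank_similar(query, corpus))
```

A word-overlap approach like this only matches papers that share vocabulary with the abstract. A tool meant to help when "the exact keywords of the research do not come to mind" would presumably need something richer, such as concept or synonym matching, which this sketch does not attempt.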

“One of the problems is getting research out of the dusty digital drawers and into the hands of people who can implement it,” says Anita Schjøll Brede, the CEO of Iris.

In the near future, the company plans to develop a proactive version that records the papers an individual has read and suggests new ones based on the project description. In a decade, Brede hopes the AI will be powerful enough to discover new concepts through its reading and understanding of the literature, all on its own - a literal assistant.

Other companies, such as IBM, are using AI technology to home in on problems in medicine. IBM has taken on the field of cancer treatment with Watson for Oncology, which draws information from papers, patient data and clinical trials to help oncologists keep up with the latest developments in the field.

Better teamwork between hospitals and technologists needed first

But that is just it: IBM's AI does not extend to other medical fields, and Iris' machine is just a digital librarian. For doctors, finding the proper research is not enough - interpretation of the literature is needed.

“This is a huge problem,” says Setareh Alipour, a medical resident in New York. “Scientific data is becoming so vast that even specialised doctors can’t know everything that is being discovered about their field. And I’m talking about larger studies, not small and unreliable data.”

Perhaps a future AI physician's assistant could plug into a universally accessible electronic health record and cross-reference a patient's symptoms and medical history with the most up-to-date recommendations to guide treatment choice. But a common limitation is that the people who develop AI do not understand hospitals. So collaboration between hospitals and technologists may need to improve before an AI physician's assistant can be fully developed. MIMS
