Immunology researchers have launched a computational tool to improve pandemic preparedness by comparing data from different experiments. This algorithm uses machine learning to discover patterns in data sets to gain a deeper understanding of immune responses. It promises significant advances in vaccine design and immunological research, and has broad application potential in various biological contexts.
Computational biologists use machine learning to understand immune system data
Immune system researchers have designed a computational tool to improve pandemic preparedness. Scientists can use the new algorithm to compare data from vastly different experiments and better predict how individuals are likely to respond to disease.
"We are trying to understand how individuals defend themselves against different viruses, but the beauty of our approach is that you can generalize it to other biological settings, such as comparing different drugs or different cancer cell lines," said Tal Einav, Ph.D., assistant professor at the La Jolla Institute for Immunology (LJI) and co-lead of the new study in the journal Cell Reports Methods.
The work addresses a major challenge in medical research. Laboratories that study infectious diseases, even those studying the same viruses, collect vastly different kinds of data. "Each data set is an island of its own," Einav said.
Some researchers may study animal models, others may study human patients. Some labs focus on children, others collect samples from older adults who are immunocompromised. Location is also important. Cells taken from Australian patients may respond differently to the virus than cells taken from the German patient population, simply based on past exposure to the virus in those areas.
"The biology adds another layer of complexity. Viruses are always evolving, which also changes the data," Einav said. "Even if two labs test the same patients in the same year, their test results may be slightly different."
Unified calculation method
Einav worked closely with Dr. Rong Ma, a postdoctoral scholar at Stanford University, to develop an algorithm to help compare large data sets. Einav was inspired by his background in physics, a discipline in which scientists can be confident that the data will conform to the known laws of physics, no matter how innovative an experiment. E always equals mc².
"As a physicist, what I like to do is bring everything together and find the unifying principle," Einav said.
The new computational method does not require precise knowledge of where or how each data set was obtained. Instead, Einav and Ma used machine learning to determine which data sets followed the same basic patterns.
"You don't have to tell me that certain data comes from children, adults or teenagers. We just need to ask the machine, 'How similar are these data to each other?' and then combine similar data sets into a superset to train better algorithms," Einav said. Over time, these comparisons may reveal consistent principles in immune responses, principles that are difficult to find in immunology's vast and fragmented data sets.
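The idea of asking "how similar are these data to each other?" and pooling the similar ones can be illustrated with a minimal sketch. This is not the algorithm from the study. It assumes, purely for illustration, that every data set measures the same three virus strains, that similarity is the Pearson correlation between average response profiles, and that 0.8 is a reasonable cutoff. The names `pearson`, `mean_profile`, `build_superset` and the toy studies are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_profile(dataset):
    """Average each column (one column per virus strain) over all samples."""
    return [sum(col) / len(col) for col in zip(*dataset)]

def build_superset(target, candidates, threshold=0.8):
    """Pool the target with every candidate whose profile resembles it.

    The threshold is an illustrative assumption, not a value from the study.
    """
    superset = list(target)
    target_profile = mean_profile(target)
    for data in candidates.values():
        if pearson(target_profile, mean_profile(data)) >= threshold:
            superset.extend(data)
    return superset

# Toy measurements: rows are samples, columns are three virus strains.
study_a = [[5.0, 3.0, 1.0], [6.0, 3.5, 1.5]]
study_b = [[5.5, 3.2, 1.1], [5.8, 2.9, 0.9]]   # same pattern as study_a
study_c = [[1.0, 3.0, 6.0], [1.5, 2.5, 5.5]]   # opposite pattern

pooled = build_superset(study_a, {"study_b": study_b, "study_c": study_c})
```

In this toy case the pooled set contains the samples from `study_a` and `study_b` only, because `study_c` follows the opposite pattern. No label such as "children" or "adults" is consulted at any point.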
Potential implications for vaccine design and immunology
For example, researchers can design better vaccines by understanding exactly how human antibodies target viral proteins. This is where biology gets complicated again. The problem is that humans can make about five million unique antibodies. Meanwhile, a viral protein may have more mutations than there are atoms in the universe.
"That's why people are collecting larger and larger data sets to try to explore the nearly infinite playground that is biology," Einav said.
But scientists don't have unlimited time, so they need ways to predict the large amounts of data that they cannot realistically collect. Einav and Ma have shown that their new computational method can help scientists fill in these gaps. They demonstrated that their method of comparing large data sets could reveal countless new immunological rules, which could then be applied to other data sets to predict what missing data should look like.
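Filling in measurements that were never collected can be sketched with a simple low-rank model. The example below is a stand-in technique, not the method from the study: it assumes each measurement is roughly the product of a per-sample factor and a per-virus factor (a rank-1 model) and fits those factors by alternating least squares. The function name `complete_rank1` and the toy table are hypothetical, and the sketch assumes every row and column has at least one observed value.

```python
def complete_rank1(matrix, iterations=100):
    """Fill None entries assuming matrix[i][j] is close to u[i] * v[j]."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows   # one factor per sample (row)
    v = [1.0] * n_cols   # one factor per virus (column)
    for _ in range(iterations):
        # Update each row factor using only that row's observed entries.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            u[i] = (sum(matrix[i][j] * v[j] for j in obs)
                    / sum(v[j] ** 2 for j in obs))
        # Update each column factor using only that column's observed entries.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            v[j] = (sum(matrix[i][j] * u[i] for i in obs)
                    / sum(u[i] ** 2 for i in obs))
    # Keep observed values; replace gaps with the model's prediction.
    return [[matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
             for j in range(n_cols)]
            for i in range(n_rows)]

# Toy table: rows are samples, columns are viruses, None marks a gap.
measured = [
    [2.0, 4.0, 1.0],
    [4.0, 8.0, None],   # this entry was never measured
    [6.0, 12.0, 3.0],
]
filled = complete_rank1(measured)
```

The observed rows here are exact multiples of one another, so the fitted model recovers a value very close to 2.0 for the gap. Real immunological data would be noisier and would likely need a richer model.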
The new method is also thorough enough to give scientists confidence in their predictions. In statistics, a "confidence interval" is a way of quantifying how certain a scientist is about a prediction.
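One common way to obtain a confidence interval, shown here only to make the term concrete, is the percentile bootstrap: resample the observed values with replacement many times and read off the range that contains the middle 95% of the resampled means. This is a generic statistical sketch, not how the study computes its intervals. The function name `bootstrap_ci` and the sample values are hypothetical.

```python
import random
from statistics import mean

def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    means = sorted(
        mean(rng.choices(values, k=len(values)))   # resample with replacement
        for _ in range(n_resamples)
    )
    lower_index = int((1 - level) / 2 * n_resamples)
    upper_index = int((1 + level) / 2 * n_resamples) - 1
    return means[lower_index], means[upper_index]

# Hypothetical repeated measurements of one quantity.
observations = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 1.0]
low, high = bootstrap_ci(observations)
```

A narrow interval between `low` and `high` signals a well-constrained estimate, while a wide one suggests that more measurements are needed.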
"These predictions work a bit like the Netflix algorithm, which predicts which movie you might enjoy watching," Einav said. "The Netflix algorithm looks for patterns in the movies you've chosen in the past. The more movies (or data) you add to these prediction tools, the more accurate the predictions will be. We could never collect all the data, but we can do a lot with a small number of measurements. We can not only estimate the confidence of the prediction, but also tell you which further experiments will maximize the confidence. For me, the real victory has always been to gain a deeper understanding of a biological system, and this framework aims to do exactly that."
Future directions and cooperation
Einav recently joined the LJI faculty after completing postdoctoral training in the laboratory of Dr. Jesse Bloom at the Fred Hutch Cancer Center. As he continues his work at LJI, he plans to focus on using computational tools to learn more about human immune responses to a number of viruses, starting with influenza. He looks forward to collaborating with LJI's top immunologists and data scientists, including Professor Bjoern Peters, Ph.D., who is also a physicist by training.
"When you have people from different backgrounds, you get great synergies. With the right team, solving these big open problems is finally possible," Einav said.
Compiled source: ScitechDaily