
IMAGE: Researchers used DNA sequences from high-resolution experiments to train a neural network called BPNet, whose “black box” inner workings were then uncovered to reveal the sequence patterns and organizing principles of the …
Credit: Image courtesy of Mark Miller, Stowers Institute for Medical Research.
KANSAS CITY, MO – Researchers at the Stowers Institute for Medical Research, in collaboration with colleagues at Stanford University and the Technical University of Munich, have applied advanced artificial intelligence (AI) in a technical tour de force to decipher regulatory instructions encoded in DNA. In a report published online February 18, 2021, in Nature Genetics, the team found that a neural network trained on high-resolution maps of protein-DNA interactions can detect subtle DNA sequence patterns throughout the genome and provide a deeper understanding of how these sequences are organized to regulate genes.
Neural networks are powerful AI models that learn complex patterns from different types of data, such as images, speech signals, or text, to predict related features with impressive accuracy. However, many see these models as uninterpretable because the learned predictive patterns are difficult to extract from the model. This black-box nature has hindered the wide application of neural networks in biology, where interpreting predictive patterns is very important.
One of the major unsolved problems in biology is the second code of the genome: its regulatory code. DNA bases (usually represented by the letters A, C, G, and T) encode not only the instructions for how to build proteins, but also when and where to make these proteins in the organism. The regulatory code is read by proteins called transcription factors that bind to short stretches of DNA called motifs. However, how particular combinations and arrangements of motifs specify regulatory activity is a very complex problem that has been difficult to pin down.
Now, an interdisciplinary team of biologists and computational researchers led by Stowers Investigator Julia Zeitlinger, PhD, and Anshul Kundaje, PhD, from Stanford University, have designed a neural network, named BPNet for Base Pair Network, that can be interpreted to reveal the regulatory code by predicting transcription factor binding from DNA sequences with unprecedented accuracy. The key was to perform the transcription factor-DNA binding experiments and the computational modeling at the highest possible resolution, down to the level of individual DNA bases. This increased resolution allowed them to develop new interpretation tools to extract the key elemental sequence patterns, such as transcription factor binding motifs, and the combinatorial rules by which motifs work together as a regulatory code.
“This was very satisfying,” Zeitlinger says, “because the results fit perfectly with existing experimental results, and also revealed new insights that surprised us.”
For example, the neural network models allowed the researchers to discover a striking rule that governs the binding of the well-studied transcription factor called Nanog. They found that Nanog binds cooperatively to DNA when multiple copies of its motif are present in a periodic fashion, such that they appear on the same side of the spiraling DNA helix.
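To make the "same side of the helix" idea concrete, here is a minimal sketch, not taken from the study, that converts the spacing between two motifs into a rotation angle around the helix. It assumes the textbook approximation of about 10.5 base pairs per turn for B-form DNA; the function names and the 60-degree tolerance are hypothetical choices for illustration only.

```python
# Assumption: ~10.5 bp per helical turn (textbook value for B-form DNA),
# not a parameter reported in this article.
BP_PER_TURN = 10.5


def helical_phase_deg(spacing_bp, bp_per_turn=BP_PER_TURN):
    """Rotation (0-360 degrees) around the helix between two motifs
    whose start positions are spacing_bp base pairs apart."""
    return (spacing_bp / bp_per_turn * 360.0) % 360.0


def same_side(spacing_bp, tolerance_deg=60.0, bp_per_turn=BP_PER_TURN):
    """True if the two motifs face roughly the same side of the helix.
    tolerance_deg is an arbitrary illustrative cutoff."""
    phase = helical_phase_deg(spacing_bp, bp_per_turn)
    # Distance from a whole number of turns, measured in either direction.
    return min(phase, 360.0 - phase) <= tolerance_deg


if __name__ == "__main__":
    for spacing in (10, 16, 21):
        print(spacing, round(helical_phase_deg(spacing), 1), same_side(spacing))
```

Under this assumption, spacings near whole multiples of the helical repeat (such as 21 bp) place motifs on the same face, while spacings near half-turns (such as 16 bp) place them on opposite faces.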
“There has been a long trail of experimental evidence that such motif periodicity sometimes exists in the regulatory code,” Zeitlinger says. “However, the exact circumstances were elusive, and Nanog had not been a suspect. Discovering that Nanog has such a pattern, and seeing additional details of its interactions, was surprising because we did not specifically search for this pattern.”
“This is the key advantage of using neural networks for this task,” says Žiga Avsec, PhD, first author of the paper. Avsec and Kundaje created the first version of the model when Avsec visited Stanford during his doctoral studies in the laboratory of Julien Gagneur, PhD, at the Technical University of Munich, Germany.
“More traditional bioinformatics approaches model data using predefined rigid rules that are based on existing knowledge. However, biology is extremely rich and complex,” Avsec says. “By using neural networks, we can train much more flexible and nuanced models that learn complex patterns from scratch with no prior knowledge, thereby allowing new discoveries.”
The BPNet network architecture is similar to that of the neural networks used for face recognition in images. Such a network first detects edges in the pixels, then learns how edges form facial elements such as the eye, nose, or mouth, and finally learns how facial elements together form a face. Instead of learning from pixels, BPNet learns from the raw DNA sequence: it first learns to detect sequence motifs and eventually the higher-order rules by which these elements predict the base-resolution binding data.
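The first layer of such a network can be pictured as a filter sliding along a numerically encoded DNA sequence. The sketch below is a simplified illustration in plain Python, not BPNet's actual architecture or code: it one-hot encodes a sequence and scans a single hand-made filter across it, whereas a real network learns many filters from data and stacks further layers on top. The motif "TGCA" and all function names are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).
    Unknown characters such as N become all-zero rows."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def scan_filter(encoded, weights):
    """Slide a (width x 4) filter along the encoded sequence and return
    one score per window, as a 1D convolution without padding would."""
    width = len(weights)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        score = sum(
            x * w
            for row, weight_row in zip(window, weights)
            for x, w in zip(row, weight_row)
        )
        scores.append(score)
    return scores


if __name__ == "__main__":
    # Hypothetical filter that responds maximally to the motif "TGCA".
    motif_filter = one_hot("TGCA")
    scores = scan_filter(one_hot("AATGCATT"), motif_filter)
    print(scores)  # the highest score marks where the motif starts
```

In a trained network the filter weights are learned rather than written by hand, and deeper layers combine the outputs of many filters, which is how higher-order arrangements of motifs can be captured.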
Once the model is trained to be highly accurate, the learned patterns are extracted with interpretation tools. The output signal is traced back to the input sequences to reveal sequence motifs. The final step is to use the model as an oracle and systematically query it with specific DNA sequence designs, similar to what one would do to test a hypothesis experimentally, to reveal the rules by which sequence motifs function in a combinatorial manner.
“The beauty is that the model can predict far more sequence designs than we could test experimentally,” Zeitlinger says. “Furthermore, by predicting the outcome of experimental perturbations, we can identify the experiments that are most informative for testing the model.” Indeed, with the help of CRISPR gene editing techniques, the researchers confirmed experimentally that the model’s predictions were highly accurate.
Because the approach is flexible and applicable to a variety of data types and cell types, it promises to lead to a rapidly growing understanding of the regulatory code and of how genetic variation affects gene regulation. Both the Zeitlinger Lab and the Kundaje Lab are already using BPNet to identify binding motifs for other cell types, relate motifs to biochemical parameters, and learn other structural features in the genome such as those associated with DNA packaging. To enable other scientists to use BPNet and adapt it for their own needs, the researchers have made the entire software framework available with documentation and tutorials.
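The "oracle" step can be illustrated with in silico mutagenesis, a simple perturbation technique: change each base in turn and record how the model's prediction shifts. This is a different and cruder method than tracing the output signal back through the network, and the sketch below is only an assumption-laden toy. The function `toy_model` is a hypothetical stand-in for a trained model (it just counts occurrences of a made-up motif), not BPNet.

```python
def toy_model(seq, motif="TGCA"):
    """Hypothetical stand-in for a trained model: the 'predicted binding'
    is simply the number of times the motif occurs in the sequence."""
    width = len(motif)
    return float(sum(seq[i:i + width] == motif
                     for i in range(len(seq) - width + 1)))


def in_silico_mutagenesis(seq, model):
    """For each position, substitute every alternative base and record the
    change in the model's prediction relative to the unmodified sequence."""
    baseline = model(seq)
    effects = []
    for pos, ref in enumerate(seq):
        deltas = {}
        for alt in "ACGT":
            if alt != ref:
                mutated = seq[:pos] + alt + seq[pos + 1:]
                deltas[alt] = model(mutated) - baseline
        effects.append(deltas)
    return effects


def important_positions(effects):
    """Positions where at least one substitution changes the prediction."""
    return [pos for pos, deltas in enumerate(effects)
            if any(delta != 0 for delta in deltas.values())]


if __name__ == "__main__":
    effects = in_silico_mutagenesis("AATGCATT", toy_model)
    print(important_positions(effects))  # positions covered by the motif
```

The same pattern extends to designed sequences: insert two motifs at varying distances into a neutral background, ask the model for a prediction at each distance, and look for the spacing rules that emerge.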
###
Other contributors to the study included Melanie Weilert, Sabrina Krueger, PhD, Khyati Dalal, Robin Fropf, PhD, and Charles McAnany, PhD, from Stowers; and Avanti Shrikumar, PhD, and Amr Alexandari from Stanford University.
This work was supported in part by the Stowers Institute for Medical Research, the National Human Genome Research Institute (awards R01HG009674 and U01HG009431 to AK and R01HG010211 to JZ), and the National Institute of General Medical Sciences (DP2GM123485 to AK) of the National Institutes of Health (NIH). Additional support included the German Bundesministerium für Bildung und Forschung (MechML project 01IS18053F to ZA), the Stanford BioX Fellowship, and the Howard Hughes Medical Institute International Student Research Fellowship (to AS). Sequencing was performed at the Stowers Institute for Medical Research and the Genomics Core of the University of Kansas Medical Center, with support from NIH awards from the National Institute of Child Health and Human Development (U54HD090216), the Office of the Director (instrumentation award S10OD021743), and the National Institute of General Medical Sciences (COBRE P30GM122731). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Lay Summary of Conclusions
DNA is well known for encoding proteins. It also contains another code, a regulatory code, that directs when and where proteins should be made in an organism. In a report published online February 18, 2021, in Nature Genetics, researchers from the laboratory of Julia Zeitlinger, PhD, Investigator at the Stowers Institute for Medical Research, and colleagues from Stanford University and the Technical University of Munich describe how they used artificial intelligence to help decipher the genome’s regulatory code.
The researchers developed a neural network whose inner workings can be uncovered to reveal regulatory DNA sequence patterns and their higher-order organizing principles from high-resolution genomics data. The Zeitlinger Lab expects that the predictive models, rules, and maps generated with this type of approach will lead to a better understanding of natural and disease-associated genetic variation in regulatory DNA.
About the Stowers Institute for Medical Research
Founded in 1994 through the generosity of Jim Stowers, founder of American Century Investments, and his wife, Virginia, the Stowers Institute for Medical Research is a nonprofit biomedical research organization with a focus on foundational research. Its mission is to expand our understanding of the secrets of life and to improve quality of life through innovative approaches to the causes, treatment, and prevention of disease.
The Institute consists of twenty independent research programs. Of the approximately 500 members, more than 370 are scientific staff, including principal investigators, technology center leaders, postdoctoral scientists, graduate students, and technical support staff. Learn more about the Institute at http: // www.