Researchers at the Department of Energy’s Lawrence Berkeley National Laboratory have created a new tool that adapts machine learning algorithms to the needs of synthetic biology, dramatically reducing the time spent engineering drugs.
With this new technology, scientists will not have to spend years developing a meticulous understanding of each part of a cell and what it does in order to manipulate it. With a limited set of training data, the algorithms can predict how changes in a cell’s DNA or biochemistry will affect its behaviour, then make recommendations for the next engineering cycle.
Hector Garcia Martin, a researcher in the Lawrence Berkeley National Laboratory’s Biological Systems and Engineering (BSE) Division, said: “The possibilities are revolutionary. Right now, bioengineering is a very slow process. It took 150 person-years to create the anti-malarial drug, artemisinin. If you are able to create new cells to specification in a couple weeks or months instead of years, you could really revolutionise what you can do with bioengineering.”
The ART algorithm
Working with BSE data scientist Tijana Radivojevic and an international group of researchers, Garcia Martin's team developed and demonstrated a patent-pending algorithm called the Automated Recommendation Tool (ART). The algorithm is tailored to the particularities of the synthetic biology field: small training data sets, the need to quantify uncertainty, and recursive engineering cycles in which each round of experiments informs the next. The tool's capabilities were demonstrated with simulated data and with historical data from previous metabolic engineering projects.
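To make those three requirements concrete, here is a minimal Python sketch of one generic way to learn from a handful of measured designs, attach an uncertainty to each prediction, and rank candidates for the next cycle. It is not ART's implementation: it swaps in a bootstrap ensemble of ridge regressions and an upper-confidence-bound ranking (predicted mean plus `kappa` standard deviations) as stand-ins, and the three-gene design space and yield formula in the demo are invented for illustration.

```python
import itertools
import random
import statistics


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(xs, ys, lam=1e-2):
    """Fit y ~ w0 + w . x by ridge regression; returns [w0, w1, ...].

    The lam * I term keeps the normal equations solvable even when a
    bootstrap sample repeats the same few designs.
    """
    rows = [[1.0, *x] for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def recommend(xs, ys, candidates, n_models=50, kappa=1.0, seed=0):
    """Rank untested candidates by mean + kappa * std over a bootstrap ensemble.

    Returns (score, mean, std, design) tuples, best first.
    """
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]  # resample with replacement
        models.append(fit_ridge([xs[i] for i in idx], [ys[i] for i in idx]))
    scored = []
    for c in candidates:
        preds = [predict(w, c) for w in models]
        mu, sd = statistics.mean(preds), statistics.stdev(preds)
        scored.append((mu + kappa * sd, mu, sd, c))
    scored.sort(reverse=True)
    return scored


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical space: 3 genes x 3 promoter strengths = 27 designs.
    space = list(itertools.product(range(3), repeat=3))
    tested = rng.sample(space, 8)  # deliberately small training set
    # Invented "measurements"; a real project would use assay data here.
    yields = [1.0 + 0.8 * a + 0.5 * b - 0.3 * c + rng.gauss(0, 0.1)
              for a, b, c in tested]
    untested = [d for d in space if d not in tested]
    for score, mu, sd, design in recommend(tested, yields, untested)[:3]:
        print(design, f"pred={mu:.2f} +/- {sd:.2f}")
```

Raising `kappa` favours designs the ensemble disagrees about (exploration); setting it to zero simply picks the highest predicted yield (exploitation). In a recursive workflow, the recommended designs would be built and tested, their measurements appended to the training data, and the ensemble refitted before the next round.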
The team then used ART to guide the metabolic engineering process to increase the production of tryptophan in the yeast Saccharomyces cerevisiae. The project was led by Jie Zhang and Soren Petersen of the Novo Nordisk Foundation Center for Biosustainability at the Technical University of Denmark, in collaboration with scientists at Berkeley Lab and Teselagen, a San Francisco-based start-up company.
To conduct the experiment, they selected five genes, each of which could be controlled by different gene promoters and other mechanisms within the cell, yielding nearly 8,000 possible pathway combinations. The researchers then obtained experimental data on 250 of those pathway designs, just 3% of all possible combinations, and used that data to train the algorithm. The tool then used statistical inference to predict how each of the remaining 7,000-plus combinations would affect tryptophan production. The design it recommended increased tryptophan production by 106% over the reference strain and by 17% over the best design used to train the model.
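The figures above can be checked with a few lines of Python. The article says only "nearly 8,000" combinations; the sketch assumes six promoter options per gene, which gives 6^5 = 7,776 designs and is consistent with both that figure and the "7,000-plus" untested remainder. The random draw of 250 designs is a hypothetical placeholder for the strains actually built, and the last lines simply restate the two reported percentage gains relative to a reference strain set to 1.0.

```python
import itertools
import random

N_GENES = 5
N_PROMOTERS = 6  # assumption: six promoter choices per gene (not stated in the article)
N_TRAINING = 250

# Every design assigns one promoter index to each of the five genes.
design_space = list(itertools.product(range(N_PROMOTERS), repeat=N_GENES))

# Hypothetical stand-in for the experimentally characterised designs.
rng = random.Random(0)
training = rng.sample(design_space, N_TRAINING)
untested = set(design_space) - set(training)

print(len(design_space))                        # 7776
print(f"{N_TRAINING / len(design_space):.1%}")  # 3.2%
print(len(untested))                            # 7526

# Relative yields implied by the reported gains (reference strain = 1.0).
recommended = 1.0 * (1 + 1.06)             # 106% above the reference strain
best_training = recommended / (1 + 0.17)   # the recommendation was 17% above this
print(f"{recommended:.2f} {best_training:.2f}")  # 2.06 1.76
```

Read together, the two reported gains imply that the best design in the training set already produced roughly 1.76 times as much tryptophan as the reference strain, and the recommended design pushed that to about 2.06 times.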