Case Study: Applying AI to discover novel chaperones & increase biological production
Jun 07, 2022
We applied our Denovium™ Engine AI models to discover a novel molecular chaperone & optimize protein production. Learn how with Absci VP of AI Research Greg Hannum in this short video.
—
Gregory Hannum:
I’m Gregory Hannum, VP of AI Research. I co-founded the AI company Denovium™ and lead the AI team here at Absci. In this video, I’m going to talk about the history of Absci’s AI engine and provide a practical example of how we employed AI technology to identify novel chaperones, including one that helped nearly double the production titer of a hard-to-produce protein. Interestingly, this protein wasn’t characterized as a chaperone in public databases and shared less than 24% sequence identity with any of the canonical chaperones. The AI engine described here aims to predict the function of any protein sequence, not just its structure, and this case study describes a powerful application of that technology.
Gregory Hannum:
While discovering a novel therapeutic can be groundbreaking, manufacturing the biologic drug is often a major barrier to bringing it to market, especially for next-gen modalities that don’t exist in nature. Producing novel proteins at high titers and with high quality is a crucial challenge, and one that has long been Absci’s area of expertise.
Gregory Hannum:
Our SoluPro® E. coli strains can deliver on both titer and quality, though this often requires that the drug candidate be co-expressed with one or more appropriate chaperones to assist with expression and folding. About a year ago, Absci had a collection of chaperones that seemed to perform well, identified using traditional sequence homology approaches such as BLAST. However, these chaperones sometimes fell short of our expectations for certain hard-to-express proteins. We knew there was a real opportunity to uncover novel chaperones, so the question became one of screening all appropriate proteins and characterizing what we call the “chaperone universe.”
Gregory Hannum:
Tasked with this effort was Denovium™, an artificial intelligence company I co-founded. For three years, we built AI solutions focused on biological sequence and functional data, including DNA and proteins. In fact, the name Denovium™ came from the vision of designing proteins completely from scratch, i.e., de novo, using artificial intelligence.
Gregory Hannum:
Most relevant to this case study was our deep learning model of protein function, which is the most comprehensive model for determining a protein’s function directly from its sequence. It was trained on a massive set of more than 100 million proteins, each annotated for up to 30 distinct functional tasks drawing on more than 700,000 functional labels. These include functional ontologies, sequence homologies, structural information, enzymatic activity, taxonomy, transmembrane regions, signal peptides, subcellular location, and much more. The model allows real-time annotation of protein sequences, including those with unknown function and even those with no known sequence homologs. And thanks to the speed of deep learning inference, we are able to use this model to re-annotate the entire UniProt protein database over the course of a weekend on consumer-grade hardware, essentially distilling and enhancing decades of academic research.
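The multi-task, multi-label setup described above can be illustrated with a minimal sketch. It assumes a model that emits one raw score (logit) per label, grouped by task, and scores each label independently with a sigmoid. The task names, label names, scores, the `decode_annotations` helper, and the 0.5 threshold are all hypothetical placeholders, not details of the Denovium™ model.

```python
import math

# Hypothetical structure: task name -> label name -> raw logit for one protein.
Logits = dict[str, dict[str, float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe for large negative z (no overflow)
    return e / (1.0 + e)


def decode_annotations(logits: Logits, threshold: float = 0.5) -> dict[str, list[str]]:
    """Turn per-label logits into predicted labels for each task.

    Labels are scored independently (multi-label), so a protein can
    receive zero, one, or several labels within the same task.
    """
    return {
        task: sorted(label for label, z in scores.items() if sigmoid(z) >= threshold)
        for task, scores in logits.items()
    }


# Hypothetical model output for a single sequence.
example: Logits = {
    "molecular_function": {
        "peroxidase activity": 2.1,
        "unfolded protein binding": 0.4,
        "kinase activity": -3.0,
    },
    "subcellular_location": {"cytoplasm": 1.7, "periplasm": -0.8},
}
print(decode_annotations(example))
```

Raising the threshold trades recall for precision per label; in practice each label could get its own calibrated threshold.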
Gregory Hannum:
A key feature of our AI protein model is the transformation of protein sequence data into a high-dimensional functional representation, or embedding. You can think of an embedding as a numeric vector that summarizes a protein’s collective function. These embeddings are a mathematically powerful tool that can replace many traditional bioinformatics approaches while greatly improving the ability to generalize to novel proteins. An important example is the ability to search for novel functional homologs. This is done by first organizing the entire protein universe into the functional embedding space. The picture on the top right shows a 3D representation of this for half a million distinct proteins. Once the space is indexed, proteins of interest can be used as search queries to find novel functional homologs by searching for their neighbors in the functional space. This is similar to how other state-of-the-art search engines operate.
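A minimal sketch of functional-homolog search as neighbor lookup, assuming each protein has already been mapped to a fixed-length embedding vector and that cosine similarity is the neighbor metric. The random vectors, the 64-dimensional size, and the `build_index`/`search` helpers are hypothetical stand-ins; an index over millions of proteins would typically use an approximate nearest-neighbor library rather than this exhaustive scan.

```python
import numpy as np


def build_index(embeddings: np.ndarray) -> np.ndarray:
    """L2-normalize each row so that a dot product equals cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)


def search(index: np.ndarray, query: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    """Return (row, similarity) for the k indexed proteins closest to the query, best first."""
    q = query / max(float(np.linalg.norm(query)), 1e-12)
    sims = index @ q                           # cosine similarity to every indexed protein
    k = min(k, sims.shape[0])
    top = np.argpartition(-sims, k - 1)[:k]    # unordered top-k without a full sort
    top = top[np.argsort(-sims[top])]          # order the top-k by similarity
    return [(int(i), float(sims[i])) for i in top]


# Hypothetical data: 1,000 proteins with 64-dimensional functional embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))
index = build_index(embeddings)

# A query lying very close to protein 42 in functional space should retrieve it first.
query = embeddings[42] + 0.01 * rng.normal(size=64)
hits = search(index, query, k=3)
print(hits)
```

Normalizing once at index time keeps each query to a single matrix-vector product; swapping in an approximate index would change only the `search` step.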
Gregory Hannum:
Our AI-based search technology was used to characterize the chaperone universe and identify useful proteins to co-express in the SoluPro® cell line to achieve higher titers and quality. We did this by using a comprehensive list of known chaperones as queries in the functional homolog search. The resulting candidates were too numerous to synthesize and test individually, so a model was tasked with organizing the candidate proteins into 1,000 distinct functional groups. The most representative protein from each group was then synthesized for laboratory validation. We knew this approach of organizing hits functionally would cover the test space better and have a much higher chance of success than traditional ranking approaches.
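The source does not say which model grouped the candidates, so the sketch below swaps in plain k-means (Lloyd's algorithm with deterministic farthest-point initialization) over hypothetical embedding vectors. It then picks the real member closest to each cluster centroid as that group's representative. With the actual candidate set, `k` would be 1,000; the toy data uses three well-separated groups so the behavior is easy to check. The helper names are illustrative only.

```python
import numpy as np


def farthest_point_init(X: np.ndarray, k: int) -> np.ndarray:
    """Pick k spread-out starting centroids (maximin initialization)."""
    chosen = [0]
    dist = ((X - X[0]) ** 2).sum(axis=1)       # squared distance to nearest chosen point
    for _ in range(1, k):
        nxt = int(dist.argmax())
        chosen.append(nxt)
        dist = np.minimum(dist, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[chosen].copy()


def kmeans(X: np.ndarray, k: int, n_iter: int = 100) -> tuple[np.ndarray, np.ndarray]:
    """Lloyd's k-means; returns (cluster label per row, centroids)."""
    centroids = farthest_point_init(X, k)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment so labels match the returned centroids.
    labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centroids


def pick_representatives(X: np.ndarray, labels: np.ndarray, centroids: np.ndarray) -> list[int]:
    """For each non-empty cluster, return the index of the member nearest its centroid."""
    reps = []
    for j, c in enumerate(centroids):
        members = np.flatnonzero(labels == j)
        if members.size:
            reps.append(int(members[((X[members] - c) ** 2).sum(axis=1).argmin()]))
    return reps


# Hypothetical candidates: three tight groups of 20 embeddings each.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(20, 2)) for c in centers])

labels, centroids = kmeans(X, k=3)
reps = pick_representatives(X, labels, centroids)
print(reps)  # one candidate index per functional group
```

Returning the member nearest each centroid, rather than the centroid itself, matters here because only real sequences can be synthesized and tested.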
Gregory Hannum:
We used our proprietary ACE assay to screen chaperone candidates for the production of a very difficult-to-express Fab. We soon found an exciting hit, which we named XYZ. It was a putative alkyl hydroperoxide reductase C, originally discovered in a root bacterium, and had not previously been considered a chaperone by traditional screening approaches. When co-expressed with our protein of interest, it nearly doubled the titer and improved the quality of the product, meeting a key milestone for our partner. This case study serves as an exciting practical demonstration of our vision of deploying AI for protein and strain engineering.