Abstract: The design of proteins with specific functions has the potential to tackle biomedical, environmental, and industrial challenges in a biodegradable and cost-effective manner. Traditional protein design approaches have relied on finding the global energy minima of a multidimensional landscape defined with physicochemically based energy functions [1]. In this context, we recently developed Fuzzle [2,3], a database of protein fragments that Nature has reused over the course of evolution. These fragments are amenable to large-scale chimeragenesis with Protlego [4], thus laying the groundwork for designing novel functions by combining protein blocks in a Lego-like manner. In recent years, however, we have witnessed an explosion of Artificial Intelligence (AI) methods that are impacting virtually all areas of research and our daily lives. Natural Language Processing (NLP) is producing models capable of translating, understanding, and generating text with human-like capabilities. Given the many similarities between human languages and protein sequences [5], using NLP methods for protein research opens a new, unexploited door for protein design. Recently, inspired by the GPT-x language models, we trained ProtGPT2, a deep unsupervised language model that learned the protein language by training on the entire protein space [6]. ProtGPT2 is capable of generating protein sequences in unseen regions of the protein space while preserving natural-like properties. The inclusion of annotation tags during training will allow the directed generation of specific functions. Coupling the generation process to methods like high-throughput molecular dynamics [7,8] will enable variant selection before experimentation. Recent developments in AI methods and their possible impact on protein design will be discussed.

Venue: Hybrid seminar: Sala d'Actes de la FiB and Zoom, with required registration

Bioinfo4Women seminars / BSC Life Session

Venue: Barcelona

Date: 04/07/2022

Time: 12:00 CEST

Host: Alfonso Valencia

Towards controlled protein design with deep unsupervised models


Noelia Ferruz

Beatriu de Pinós Fellow at the Institute of Informatics and Applications at the University of Girona
