Virtual BSC RS/Life Session: A Multi-Objective Genetic Algorithm to Find Active Modules in Multiplex Biological Networks (MOGAMUN) and Sex differences in genetic architecture in UK Biobank

Date: 08 Jul 2021 Time: 12:00

Title: A Multi-Objective Genetic Algorithm to Find Active Modules in Multiplex Biological Networks

Speaker: Elva Novoa, Postdoc at Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France.

Abstract: One of the most challenging tasks in computational biology is the integration of complementary biological data produced from different experimental sources. Our goal here is to combine expression data and biological networks to identify “active modules”, i.e. subnetworks of interacting genes/proteins associated with expression changes in different biological contexts. We developed MOGAMUN, a multi-objective genetic algorithm that finds dense subnetworks with an overall deregulation. We compared the performance of MOGAMUN with three state-of-the-art methods (jActiveModules [3], COSINE [4] and PinnacleZ [5]) on simulated expression datasets, where MOGAMUN showed the best performance. We also applied MOGAMUN to identify active modules for a rare monogenic disease, Facioscapulohumeral muscular dystrophy (FSHD). We found active modules that represent both known and new cellular processes associated with the hallmarks of the FSHD disorder. MOGAMUN is available as a Bioconductor package.

Title: Sex differences in genetic architecture in UK Biobank

Speaker: Elena Bernabéu, PhD student at The Roslin Institute, University of Edinburgh.

Abstract: Sex is arguably the most important differentiating characteristic in most mammalian species, separating populations into different groups, with varying behaviors, morphologies, and physiologies based on their complement of sex chromosomes, amongst other factors. In humans, despite males and females sharing nearly identical genomes, there are differences between the sexes in complex traits and in the risk of a wide array of diseases. Gene by sex interactions (GxS) are thought to account for some of these differences. However, the extent and basis of these interactions are poorly understood.

Here we provide insights into both the scope and mechanism of GxS across the genome of circa 450,000 individuals of European ancestry and 530 complex traits in the UK Biobank. We found small yet widespread differences in genetic architecture across traits through the calculation of sex-specific heritability, genetic correlations, and sex-stratified genome-wide association studies (GWAS). We also found that, in some cases, sex-agnostic GWAS efforts might be missing loci of interest, and looked into possible improvements in the prediction of high-level phenotypes. Finally, we studied the potential functional role of the differences observed through sex-biased eQTL and gene-level analyses.

This study marks a broad examination of the genetics of sex differences. Our findings parallel previous reports, suggesting the presence of sex genetic heterogeneity across complex traits of generally modest magnitude. Our results suggest the need to consider sex-stratified analyses in future studies to shed light on possible sex-specific molecular mechanisms.

Virtual BSC RS/Life Session: Harnessing Conformational Dynamics to Engineer New Enzymes

Date: 13 May 2021 Time: 15:00

Speaker: Lynn Kamerlin, Department of Chemistry – BMC, Uppsala University

Abstract: Understanding how new enzyme functions evolve, either on existing scaffolds or completely de novo on previously non-catalytic scaffolds, is of great interest from both a fundamental biochemistry perspective and a biotechnological perspective. Several hypotheses have been put forward to rationalize enzyme evolution, one of which is that conformational dynamics plays an important role in facilitating the emergence of new enzyme functions [1-3]. My team and I have invested substantial research effort into understanding enzyme multifunctionality in catalytically promiscuous enzymes [4-8], as well as the structure-function-dynamics relationships shaping the evolution of new enzyme functions, in both natural and engineered active sites [9-13]. In this talk, I will discuss recent progress in this area, and illustrate how we have engineered conformational dynamics to generate a de novo active site capable of catalysing a non-natural reaction [10], and then subsequently enhanced this activity using a simple computational approach, reaching catalytic efficiency comparable to that of naturally occurring enzymes.

BSC RS/BSC Life Session: Network-based approaches for examining disease and developmental processes

Date: 29/Apr/2021 Time: 16:00

Speaker: Sushmita Roy, Associate Professor at the Biostatistics and Medical Informatics Department and a faculty member at the Wisconsin Institute for Discovery, University of Wisconsin, Madison

Abstract: Central to how living systems function are molecular networks defining connections among different types of components such as mRNA, proteins and metabolites. Network-based approaches offer a powerful suite of tools to understand different disease and normal processes and can be grouped into two main classes: (a) methods for network reconstruction that aim to infer the structure of the network, and (b) methods for network-based interpretation that use a network as a backbone for integrating and interpreting high-throughput omic datasets. In the first part of this talk, I will present some recent work from our group for the “network reconstruction” problem in the context of mammalian gene regulatory networks. Genome-scale regulatory network inference is a long-standing problem in gene regulation and is a key ingredient for building predictive models of organism state. I will present computational methods used to infer genome-scale regulatory networks by integrating publicly available gene expression datasets with other auxiliary datasets that provide prior support for a regulatory connection. Using our approaches we have inferred regulatory networks for early mammalian development and have used these networks to prioritize important regulatory nodes and edges that we experimentally validated. In the second part of the talk I will present network-based approaches for understanding three-dimensional organization of the genome and its role in phenotypic variation. I will present some case studies of how these approaches can be used to study genome organization in cancer as well as link regulatory variants identified in different genome-wide association studies to downstream pathways.

Virtual BSC RS/BSC Life Session: Enter the matrix: modeling tumor cell and immune cell interactions at the single-cell resolution

Date: 18/Feb/2021 Time: 15:00

Speaker: Elana J. Fertig, Associate Professor of Oncology, Assistant Director of @HopkinsRPQS SKCCC, Johns Hopkins

Abstract: Tumors employ complex, multi-scale cellular and molecular interactions that evolve over the course of therapeutic response. The changes in these pathways enable tumors to overcome therapeutic regimens, and ultimately acquire resistance. New molecular profiling technologies, notably including single-cell technologies, provide an unprecedented opportunity to characterize these molecular relationships. However, interpreting the specific cellular and molecular pathways in therapeutic response requires complementary computational analysis methods. We developed an unsupervised learning method, CoGAPS, that employs Bayesian non-negative matrix factorization to disentangle distinct biological processes from high-throughput molecular data. Notably, this algorithm discovers dynamic compensatory signaling in acquired therapeutic resistance from time course bulk RNA-seq data and novel NK cell activation in anti-CTLA4 response from post-treatment scRNA-seq data. To further demonstrate that the inferred pathways are biological rather than computational artifacts, we developed a complementary transfer learning method to relate learned patterns between datasets. We demonstrate that this approach identifies robust molecular processes between model systems and human tumors and enables multi-platform data integration to delineate the drivers of therapeutic response and resistance.

Virtual BSC RS/BSC Life Session: Block by block: building a data science infrastructure for cancer research on the cloud

Date: 04/Feb/2021 Time: 15:00

Speaker: Dr. Christina Yung leads a team of software engineers, infrastructure specialists and bioinformaticians at OICR to build tools that empower and accelerate cancer research discoveries.

Abstract: With decreasing cost of genomic sequencing, cancer research groups are generating large multi-omics and single cell data for various cancer types along with histopathology and radiological images. Researchers face major challenges in secure data management, efficient data analysis and responsible data sharing. At the Ontario Institute for Cancer Research (OICR), the Genome Informatics team has developed such software solutions for the International Cancer Genome Consortium (ICGC), and has made them available to researchers in an open-source software suite called Overture. In addition, we have built a compute cloud called the Cancer Genome Collaboratory enabling researchers to perform analyses on over one petabyte of cancer genomic data. We are continuing to build out the infrastructure of the genomics cloud in close collaboration with the cancer informatics community particularly in the ICGC Accelerating Research in Genomic Oncology (ARGO) initiative and the European-Canadian Cancer Network (EUCANCan).

Virtual BSC RS/BSC Life Session: Multi-omics data integration methods to study rare genetic diseases

Date: 21/Jan/2021 Time: 12:00

Speaker: Anaïs Baudot created the “Networks and Systems Biology for Diseases” team at the Marseille Medical Genetics Unit in 2018.

Abstract: Technological advances and the accumulation of biomedical datasets are yielding unprecedented opportunities to better understand genetic diseases, but they require proper exploration and integration methods to reveal a complete picture of biological systems. I will discuss the computational strategies we recently developed, using i) multilayer networks to integrate a large range of interactions, with associated exploration algorithms, and ii) dimensionality reduction to extract biological knowledge simultaneously from multiple omics. On the application side, I will discuss the analysis of rare genetic diseases, which raise various challenges: many patients are undiagnosed, phenotypes can be highly heterogeneous, and only a few treatments exist.

Place: Virtual seminar via Zoom, with required registration

Virtual BSC RS: A novel liquid biopsy platform utilizes gene-gene fusions for high-grade glioma patients

Date: 02 Dec 2020 Time: 11:00 (CET)

Speaker: Milana Frenkel-Morgenstern, Head of the “Cancer Genomics and BioComputing in Complex Diseases” group in the Azrieli faculty of Medicine, Bar-Ilan University

Abstract: GBM is characterized by intratumoral heterogeneity. Tumor heterogeneity, clonal diversity and mutation acquisition hamper the ability to tailor personalized therapy for GBM. Tumor sampling has limited ability to accurately capture the molecular landscape of the tumor and to disclose acquired molecular aberrations. Mutation analysis of cfDNA is a non-invasive procedure which may overcome these limitations as it may reflect the real composition of the tumor and track the molecular evolution. We sequenced cfDNA of GBM patients and assessed mutation patterns and fusion genes. METHODS: We collected blood and respective tumor samples from 27 GBM patients and blood samples from 14 healthy controls. Tumor DNA, cfDNA and WBC DNA were sequenced using deep sequencing procedures. The data were analyzed for detection of single nucleotide polymorphisms (SNPs) and gene-gene fusions. RESULTS: GBM cfDNA concentrations were significantly elevated (median: 23.63 ng/mL; range 12.6–137) compared to healthy controls (median 2.06; range 1.68–7.62) (p < 0.0001). We identified unique SNPs in each glioma patient’s cfDNA and the corresponding tumor DNA including the top-10 most frequently mutated genes in GBM. For example, mutation of TP53

Virtual BSC RS 2020: The multidimensional problem of protein-protein interaction and protein phase separation: machine learning based solutions at the Bologna Biocomputing group

Date: 17 November, 2020 Time: 11:00 (CET)

Speaker: Rita Casadio, Honorary and Contract Professor at the Bologna University, Italy and Associate Researcher at IBIOM-CNR, Bari, Italy, the Italian central node of ELIXIR.

Abstract: In cells, the ensemble of billions of reactions in a living organism takes place in heterogeneous and crowded environments that influence the efficiency of the reactivity and the density distribution of participating macromolecules in biological processes and metabolic pathways. Besides the complexity of the inner membrane compartments in Eukaryotic cells, recent advancements in microscopy and liquid phase separation make it possible to highlight some dynamical aspects of open macromolecular assemblies, referred to as membraneless organelles, that are common to several types of cells working under physiological conditions (1, 2). Results support the notion that condensation mechanisms are driven by collective protein-protein and protein-nucleic acid interactions, in dynamic equilibria with the surroundings, and that phase separation phenomena may indeed link microscopic to mesoscopic structural and functional characteristics of the cell milieu. In this scenario, it is even more urgent to understand which proteins can undergo the single to droplet phase transition for describing and modelling the emergent properties of the complex cell interior. I will sum up our present source of information for protein-protein interactions and briefly describe the never-ending process of generating algorithms in our (ISPRED4) and other groups capable of extracting information from valuable data, with the aim of transferring knowledge by computing properties of never-before-seen examples (3, 4). Finally, I will focus on the interesting finding that, when considering the membraneless Cajal body proteins, predicted interaction patches correlate well with the number of experimentally determined interactors when the interaction patches include residues with an inherent flexibility (4).

VIRTUAL BSC RS 2020: Unravelling the Electrocardiogram for Cardiovascular Risk Prediction

Date: 9 September, 2020 Time: 11.00 (CET)

Speaker: Julia Ramírez, Marie Curie Research Fellow, Lecturer in Cardiovascular Data Science, Queen Mary University of London (QMUL)

Abstract: Cardiovascular death is the main cause of mortality in developed countries. Current diagnostic and predictive tools are still insufficient due to low cost-accuracy ratios. The electrocardiogram (ECG) is a widely available and cheap tool that reflects the electrical activity of the heart. In this talk, I will give an introduction to the ECG, describe how it can be used to non-invasively quantify cardiovascular risk, and show how genetic analyses can unravel key biological mechanisms reflected in the ECG.

Place: Zoom, with required registration

VIRTUAL BSC RS 2020: Gender Bias and Natural Language Processing

Date: 29th June 2020 Time: 11.30 (CEST)

Speaker: Marta R. Costa-jussà is a Ramón y Cajal Researcher at the Universitat Politècnica de Catalunya (UPC, Barcelona).

Abstract: Demographic biases widely affect artificial intelligence. In particular, gender bias is widespread in natural language processing applications, ranging from stereotyped translations to poorer speech recognition for women than for men. In this talk, I will give an overview of the research and challenges that are currently emerging on the way towards fairer natural language processing in terms of gender.

SORS: Annotating clinical text produced in Chile

Date: 28/Feb/2020 Time: 11:00

Place: Sala d’actes de la FiB (Campus Nord)

Speaker: Dr. Jocelyn Dunstan, Center for Mathematical Modeling and Center for Medical Informatics, University of Chile

Abstract: Public hospitals in Chile have waiting lists for specialty consultations that are both numerous and subject to long waiting times. The reason for referral is recorded as unstructured text, and it is therefore hard for authorities to know which diseases patients are being referred for. As a way to automate this process and promote the secondary use of information, we have started to annotate these referrals with the following entities: clinical findings, abbreviations, body parts, medications and family members. This talk will present preliminary results of this process.

SORS: Systematic discovery of germline cancer predisposition genes through large-scale cancer genomics

Date: 21/Feb/2020 Time: 11:00

Place: Aula de Teleensenyament (B3 Building, Campus Nord)

Speaker: Solip Park, Computational Cancer Genomics Group Leader, CNIO

Abstract: The genetic causes of cancer include both somatic mutations and inherited germline variants. Large-scale tumor sequencing has revolutionized the identification of somatic driver alterations but has had limited impact on the identification of cancer predisposition genes (CPGs). Here we present a statistical method, ALFRED, that tests Knudson’s two-hit hypothesis to systematically identify CPGs from cancer genome data. Applied to ~10,000 tumor exomes, the approach identifies known and putative CPGs – including the chromatin modifier NSD1 – that contribute to cancer through a combination of rare germline variants and somatic loss-of-heterozygosity (LOH). Rare germline variants in these genes contribute substantially to cancer risk, including to ~14% of ovarian carcinomas, ~7% of breast tumors, ~4% of uterine corpus endometrial carcinomas, and to a median of 2% of tumors across 17 cancer types.


SORS: Characterization of regulatory variants in promoters with enhancer activity and their relation with human diseases

Date: 26 November 2019 Time: 11:00

Speakers: Alejandra Medina, Junior Faculty at the International Laboratory for Human Genome Research at the National Autonomous University of Mexico

Venue: Sala d’actes de la FiB (Campus Nord)

Abstract: Gene regulation is driven by the interaction of regulatory sequences, commonly categorized as either enhancers or promoters. Recently, using a modification of the STARR-seq assay, we identified sets of promoters with enhancer potential (ePromoters). Given that the majority of genetic variants associated with human diseases and traits (93.7%) have been found to be located in non-coding DNA, in this follow-up analysis we set out to characterize regulatory variants in ePromoters. Using genetic variants associated with traits and disease (GWAS catalog), we found a significant enrichment of GWAS variants associated with hematological measurements in ePromoters found in HeLa.

We hypothesize that genetic variants within ePromoters are likely to affect transcription factor (TF) binding. Therefore, we aimed to identify the relevant TFs interacting with these regulatory regions and to look for variants disrupting TF binding. In particular, we found variants affecting the binding of TFs associated with the inflammatory response.

Understanding ePromoters and the regulatory mechanisms that affect their dual function will help identify the causes of human diseases and traits.

SORS: Data-driven approach to cardiovascular disease: Deep phenotyping, omics and machine learning

Date: 25 November 2019 Time: 09:30

Speakers: Dr. Sánchez-Cabo is the Head of the Bioinformatics Unit of CNIC.

Venue: Sala d’actes de la FiB (Campus Nord)

Abstract: Most cardiovascular (CV) risk scores used in clinical practice predict the probability of CV events using information on the seven traditional cardiovascular risk factors: age, gender, hypertension, dyslipidemia, obesity, smoking and diabetes. These scores, however, fail to identify young, healthy individuals potentially at risk based on the extent or progression of subclinical atherosclerosis (SA), mainly characterized using imaging techniques. By means of deep phenotyping and omics data analyzed with machine learning methods, we aim to develop new risk scores to refine the prediction of 10-year cardiovascular risk in young, asymptomatic individuals. Moreover, this data-driven approach to cardiovascular disease (CVD) is improving our understanding of how the molecular profile and a variety of psychosocial, lifestyle, dietary and demographic variables affect the genesis of the disease and its progression and, eventually, how and when SA will lead to cardiovascular events.

SORS: Found In Translation: a machine learning model for mouse-to-human inference

Date: 31 October 2019 Time: 11:00

Speakers: Rachelly Normand, Shen-Orr lab at Technion – Israel Institute of Technology

Venue: Aula de Teleensenyament (B3 Building, Campus Nord)

Abstract: Cross-species differences form barriers to translational research that ultimately hinder the success of clinical trials, yet knowledge of species differences has yet to be systematically incorporated in the interpretation of animal models. We developed Found In Translation (FIT), a machine learning model that leverages human and mouse public gene expression data to extrapolate the results of a new mouse experiment to expression changes in the equivalent human condition. We applied FIT to data from mouse models of 28 different human diseases and show that it is able to identify 20-50% more human-relevant differentially expressed genes. FIT predicted novel disease-associated genes, an example of which we validated experimentally in Crohn’s patients. FIT highlights signals that may otherwise be missed and reduces false leads with no experimental cost. It is available both as an R package and as a web tool.

SORS: Chromatin 3D organization revealed by chromatin networks: gene-regulation, replication, and beyond

Date: Wednesday, 10 July, 2019 – 11:00

Speakers: Vera Pancaldi, Epigenomics and network modelling of heterogeneity in immuno-oncology lab leader at CRCT

Venue: Aula de Teleensenyament, B3 Building

Abstract: Recent technological advances have allowed us to map chromatin conformation and uncover the spatial organization of the genome inside the nucleus. These experiments have revealed the complexities of genome folding, characterized by the presence of loops and domains at different scales which can change across development and cell types.

Different types of approaches have been employed to describe 3D genome organization, which can be broadly divided into polymer physics models, constraint based models and statistical approaches.

An increasingly popular representation of chromatin is given by networks, in which genomic fragments are the nodes and connections represent experimentally observed spatial proximity of two genomically distant regions. This formalism, applied to promoter-centered chromatin interaction networks generated by promoter capture HiC, has allowed us to consider a variety of chromatin features in association with the 3D structure, leading to novel biological insight into gene regulation (Pancaldi et al. Genome Biology 2016).

We thus propose network representation as the tool to bridge the different scales of chromatin organization and have developed an online chromatin network interaction viewer and an R package building on this framework.

In the present work, we characterize DNA replication in a 3D chromatin context, generating novel maps of replication origins in mouse embryonic stem cells under normal conditions and during DNA replication stress. These origins are then contextualized by projection onto a promoter-centred chromatin contact network defined at a resolution of a few kb. We find that replication origins with similar efficiency interact with each other preferentially, suggesting that DNA replication takes place in the context of hierarchical multi-scale structures spanning tens of megabases and even bridging chromosomes. More specifically, origins that interact with others tend to replicate earlier and with higher efficiency. The changes of origin activation patterns in normal and stressed conditions support a stochastic model of activation in which both local and global chromatin properties modulate efficiency.

SORS: Multiomics and Third Generation Sequencing, at the forefront of genomics research

Date: Tuesday, 11 June, 2019

Speakers: Ana Conesa, PhD, UF Preeminence Professor Bioinformatics, Genetics Institute and Microbiology and Cell Science Department, University of Florida, USA

Venue: Sala d’actes de la FiB, Campus Nord

Abstract: The development of new sequencing platforms and the combination of omics assays creates exciting opportunities for formulating and answering new scientific questions that were previously difficult to address. In this seminar I will present new methods and bioinformatics tools for the integration of multiomics data to infer multi-layered systems biology models, with application to the modeling of autoimmune disease progression. I will also present the Functional Iso-transcriptomics (FIT) framework (SQANTI, IsoAnnot and tappAS), that combines third-generation sequencing technologies with high-throughput positional function prediction and novel statistical methods to unravel the functional impact of the post-transcriptional regulation of gene expression.

SORS: Multidisciplinary Qualities of Systems Medicine. The pleasure of working with bioinformaticians, medical doctors and philosophers

Date: Tuesday, 30 April, 2019 – 11:00

Speakers: Astrid Lægreid, Professor in Functional Genomics at the Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim.

Venue: Sala d’actes de la FiB

Abstract: The biological sciences are producing impressive amounts of information about how cells function in normal and diseased states. It is now becoming possible to use computers to accurately model the behavior of cells and predict how they will respond to changes in the environment, or to drugs.

The NTNU project DrugLogics aims to develop and integrate computational, experimental and analytical approaches to predict and validate anti-cancer drug combinations and produce an integrated pipeline for rational screening of synergistic drugs and for clinical decision support in precision medicine.

Scientists from many different backgrounds work together to develop the technologies and approaches that must be integrated to make efficient use of what we know about specific cells when designing computer models that can mimic cells’ and tumours’ behaviour and assist hospital doctors in selecting therapies tailored to individual patients. The project reflects on the developing ecosystem of publicly available knowledge and databases – the Knowledge Commons, for which systems medicine is a key visionary driver.

SORS: Network-guided integration of multi-omics data: towards a comprehensive view of cancer

Date: Tuesday, 09 April, 2019 – 11:00

Speakers: Laura Cantini, CNRS Research Scientist (Chargé de Recherches) at IBENS.

Venue: Sala d’actes de la FiB (Campus Nord)

Abstract: Cancer is a global health issue with a mortality rate that is expected to rise by about 70% over the next 2 decades (World Health Organization). Despite the significant breakthroughs in its understanding, prevention, and treatment, cancer’s complexity slows the quest for its cure. The advent of high-throughput technologies has provided the possibility to gather a comprehensive molecular picture of this disease by allowing the detailed characterization of thousands of tumors at multiple molecular levels (“multi-omics”). The current main challenge is to translate this wealth of information into actionable knowledge about the pathogenesis of this disease.

In this talk I will give three examples of how the integration of multiple omics data can reveal new insights about cancer. First, I will show that by combining microRNA and mRNA expression data we can identify new regulatory mechanisms underlying colorectal cancer subtypes. Second, I will show that multiplex networks, a trending topic in network theory, are very well suited for the joint integration of multi-omics data. Third, I will show that by combining the results of matrix factorization across 14 independent transcriptomic datasets we can reconstruct the landscape of the pathways involved in the different subtypes of colorectal cancer.

SORS: Building from scratch: de novo gene birth

Date: Tuesday, 26 March, 2019 – 11:00

Speakers: Mar Albà, ICREA Researcher at the Research Program on Biomedical Informatics (GRIB) from Hospital del Mar Research Institute (IMIM) and Universitat Pompeu Fabra (UPF).

Venue: Sala d’actes de la FiB (Campus Nord)

Abstract: During evolution genes are continuously gained and lost, contributing to the adaptation of the organism to a changing environment. The best understood mechanism for the birth of new genes is the duplication and modification of already existing genes. However, in recent years, evidence has been gathered that some genes are born de novo from previously non-coding genomic sequences. The proteins encoded by de novo genes bear no resemblance to other proteins and thus represent radical innovations.

Over the past 10 years we have investigated the mechanisms underlying de novo gene birth, including the emergence of novel transcripts, the translation of small ORFs and the functions of recently originated proteins. Whereas some of these processes are now better understood, many mysteries remain. The talk will present a summary of the research in the area and the challenges ahead.

SORS: The Mutational Landscape of a Prion-like Domain

Date: Tuesday, 05 February, 2019 – 11:00

Speakers: Benedetta Bolognesi, Junior Group Leader, IBEC.

Venue: Sala d’actes de la FiB (Campus Nord)

Abstract: At least 70 human RNA-binding proteins contain a prion-like domain (PrLD). PrLDs are low-complexity domains which resemble the infectious yeast prions in composition. Mutations in PrLDs are associated with the onset of many neurodegenerative conditions, such as Amyotrophic Lateral Sclerosis (ALS). PrLDs are able to populate multiple physical states: diffuse, liquid de-mixed, and insoluble amyloid. Pathological mutations affect these equilibria in ways we cannot yet fully understand or predict. The TAR DNA binding protein TDP-43 contains a 140 aa long PrLD and forms cytoplasmic aggregates in most cases of ALS. We use Deep Mutational Scanning to understand how sequence determines the toxicity of TDP-43 in a yeast model. I will present the first “genotype-to-phenotype” map of TDP-43, in which we quantify the effect of all possible amino acid substitutions in the PrLD on cellular fitness. While allowing us to understand the impact of mutations within low-complexity regions, these data provide the basis for understanding the mechanism by which protein inclusions drive pathogenesis.


SORS: A novel MMR pathway in prokaryotes

Date: Tuesday, 23 October, 2018 – 11:00

Speakers: Ana Rojas, group leader of the Bioinformatics and Computational Biology Group at the Andalusian Center for Developmental Biology (CSIC).

Venue: Sala de Teleensenyament, Building B3, Campus Nord

Abstract: The mismatch repair (MMR) pathway is essential to maintain genome stability. While MutS and MutL are essential for performing the initial steps of the route, they are missing in many Archaea, most Actinobacteria, and other prokaryotes. However, these organisms exhibit spontaneous mutation rates similar to those of organisms bearing the MMR proteins.

We have reported NucS as an endonuclease involved in mismatch repair (MMR) with no structural homology to known MMR factors. By genetic screenings we found [1] that this protein is required for mutation avoidance and anti-recombination, hallmarks of the canonical MMR, in the surrogate model Mycobacterium smegmatis, which lacks classical MutS-MutL factors. Furthermore, phenotypic analysis of naturally occurring polymorphic NucS in a M. smegmatis surrogate model suggests the existence of M. tuberculosis mutator strains.

Structural bioinformatics coupled with evolutionary studies of NucS indicates a complex assembly of the pathway that involved at least two horizontal gene transfers, leading to a dispersed distribution pattern in prokaryotes. Together, these findings indicate that distinct pathways for MMR have evolved at least twice in nature. Strikingly, the absence of any MMR protein (MutS/L or NucS) in a few microorganisms indicates that additional pathways are yet to be found. The analyses of these findings in the evolutionary context of the classical MMR proteins open novel and intriguing questions about the emergence of the MMR systems.

SORS: Genomic analysis pipeline: overview, challenges, and proposed solutions

Date: Monday, 15 October, 2018 – 12:00

Speakers: Idoia Ochoa, Assistant Professor, University of Illinois at Urbana-Champaign (UIUC), IL, USA.

Venue: Campus Nord, Building C6, Room E-106

Abstract: In this talk we will give an overview of the genomic analysis pipeline, from data generation to its analysis. In doing so, we will identify the main challenges arising in the genomic setting. These include dealing with errors introduced during the sequencing process, designing state-of-the-art specialized compressors to deal with the ever-growing amount of genomic data being generated, as well as improving the accuracy of the current tools used for the analysis.

We will emphasize some of the efforts being carried out by the international community to design a standard under the International Organization for Standardization (ISO), denoted MPEG-G, for genomic information representation. We will also introduce a new filtering tool intended to improve the accuracy of variant calling, the last step of the genomic analysis pipeline, whose output is generally the starting point for analysis in the personalized medicine paradigm. We will conclude the talk with some thoughts on where the community is going and the challenges that we will face in the near future.

SORS: Dissecting the Molecular Mechanisms of Complex Diseases Through a Pathway and Network Oriented Analysis of -omics Data

Date: Thursday, 06 September, 2018 – 11:00

Speakers: Burcu Bakır-Güngör, Ph.D., Assistant Professor, Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University.

Venue: Campus Nord, Building C6, Room E-106

Abstract: The tremendous boost in next-generation sequencing technologies and in the “omics” technologies makes it possible to look for coordinated behavior among different levels of biochemical activity. In contrast to analyses of isolated molecules, network- and pathway-oriented analyses are thought to better capture pathological perturbations and hence better explain predisposition to disease. Especially in complex diseases, which are intrinsically multifactorial, there are no strong associations with any single factor. In this regard, we have recently proposed a new methodology to analyze -omics data in a network-related context to identify pathways that are involved in disease development mechanisms. In this seminar, I will introduce our approach and discuss its applications to different Genome-wide Association Study (GWAS) datasets and -omics datasets. I will also present how this approach can help us to identify disease-associated pathway markers across different populations and discuss how these pathway markers can help us to understand individual disease development mechanisms in terms of the determination of individual targets for treatments, hence bridging the gap between -omics data and personalized medicine.

Briefly, PANOGA (Pathway and Network-Oriented GWAS Analysis) combines nominally significant evidence of genetic association with current knowledge of biochemical pathways, protein–protein interaction networks, and functional information on selected single nucleotide polymorphisms (SNPs). Given its multifactorial basis, we have shown on four complex diseases that PANOGA has good potential to decipher the combination of biological processes underlying disease. Then, by comparing GWASs of two different populations, we have shown that the few SNPs identified in GWAS and their associated genes mostly target the same pathway combinations, and that these biological pathways show higher conservation across populations. If the combination of these pathways does not function properly, a specific disease may develop.

Although PANOGA was originally developed to identify disease-associated pathways through further analysis of GWAS data, it was later shown to work well on different -omics datasets, including transcriptomics, proteomics, and epigenomics studies. Using different -omics datasets, our group is currently developing methodologies to extend this approach to the individual level, to identify specific modifications occurring in the genes within these identified pathways. Dissecting individual disease development mechanisms will provide valuable insight for discovering individualized therapy targets and will pave the way towards personalized medicine applications. This approach would enable biomedical researchers to identify affected pathways and function-altering factors within these pathways. For diagnostic purposes, the identification of disease-related pathways is also instrumental in the determination of biomarkers at different levels (e.g., SNPs, gene expression levels, protein levels in serum, miRNA levels, metabolite concentration).

SORS: Mutations and Variations in Health and Disease: Protein Interaction Networks and 3D Structure Information

Date: Monday, 02 July, 2018 – 11:00

Speakers: Prof. Franca Fraternali, Randall Centre for Molecular and Cellular Biology, King’s College London, UK.

Venue: Sala d’actes de la FiB, (B6 Building) Campus Nord UPC

Abstract: In recent years Systems Biology has provided frameworks to integrate high-throughput biological and clinical data, providing significant insights into some of the fundamental roles of genes and proteins in maintaining a functional cellular state. However, it is still challenging to employ quantitative methods to identify important disease-related relationships between proteins harbouring mutations in their structural domains. In our approach we zoom in from a macroscopic view of PPI networks, and review how protein structural information can play a pivotal role in interpreting genetic variants in a PPI context. By mapping variants onto experimental structures or predicted models of protein complexes, one can offer a physico-chemical explanation of the functional impact of these variants; this may help to unravel the molecular basis of a particular disease. We then zoom out to look at how PPI data annotation and integration is essential to gain a deeper understanding of the effect of variants on PPI communication and miscommunication. We conclude that it is necessary to acquire a multidimensional view of the interaction network in order to fully understand the role of genetic variants in health and disease.

We observe clear differences in the distribution of mutation types in different 3D-structure regions, with complementary patterns distinguishing between pathogenic and common variants, suggesting that these properties can be used as input for prediction tools. More generally, we show that 3D PPIN analysis can also help biologists to effectively search for possible targets for disease treatment.

SORS: Integrative data approaches towards a personalized prevention of cancer: The epidemiological vision

Date: Tuesday, 26 June, 2018 – 11:00

Speakers: Núria Malats, MD, MPH, PhD Genetic & Molecular Epidemiology Group Spanish National Cancer Research Centre (CNIO) Madrid, Spain.

Venue: Sala d’Actes, FIB Building (B6), Campus Nord, Barcelona

Abstract: Disease prevention can benefit greatly from a personalized medicine approach through the accurate discrimination of individuals at high risk of developing a specific disease from those at moderate and low risk. To this end, precise risk prediction models need to be built. This endeavour requires a precise characterization of the individual exposome, genome, and phenome. Massive molecular omics data representing the different layers of the biological processes of the host and the non-host will make it possible to build more accurate risk prediction models. Epidemiologists aim to integrate omics data with important information coming from other sources (questionnaires, candidate markers) that has been proven to be relevant in the risk assessment of complex diseases.

The vast majority of pancreatic cancer cases are termed sporadic because they do not aggregate within families, and their aetiology is complex. Both genetic and non-genetic factors have been associated with sporadic pancreatic cancer, though the magnitude of the risk they confer is small to moderate. Therefore, cost-efficient primary and secondary prevention programs for sporadic pancreatic cancer should be based on multifactorial integrative scores to define high-risk populations. Steps towards the integration of omics and non-omics factors selected through an appropriate methodology are ongoing using the PanGenEU study resources. However, integrative models in large-scale epidemiologic research still face numerous challenges, some of them at the analytical stage. I will comment on the efforts we are making to better characterize pancreatic cancer risk factors and the strategies we plan to apply to build integrative predictive risk scores.

SORS: Constraints and variability of complementarity determining regions in antibodies

Date: Thursday, 10 May, 2018 – 15:00

Speakers: Alba Lepore, postdoctoral researcher at the Biozentrum, University of Basel & SIB Swiss Institute of Bioinformatics.

Venue: Aula de Teleensenyament

Abstract: Antibodies have been extensively investigated for decades and therefore much is known about their sequence-structure-function relationship. The ability to accurately model the structure of antibodies stems from the recognition that the hypervariable loops only exhibit a limited number of main-chain conformations called “canonical structures”. Most sequence variations in five of these loops only modify the surface generated by the side chains on a canonical main-chain structure. The third loop of the heavy chain (H3) behaves differently, and has proven very difficult to model given its high variability in both length and structure. We applied a machine-learning approach combining sequence- and structure-related features to identify candidate loops as templates to build the structure of a target H3 loop. Models are subsequently ranked based on a score reflecting the likelihood of the presence or absence of specific interactions between the H3 residues and their structural environment. The method has led to a significant improvement in the prediction of the H3 region and the overall antigen-binding site.

We next analyzed how differences between antigen-binding sites might be linked to their specificity. To this end, we developed a superposition-free method for comparing the surfaces of antibody binding sites based on shape descriptors. We showed that similar antigen-binding sites could be better detected based on shape descriptors than using traditional structure similarity metrics. Finally, we showed that a classification procedure based on this approach could be applied to derive information about the recognized antigen, representing a step towards the very elusive goal of predicting antibody specificity.

SORS: Mining the Integrated Connectedness of Biomedical Systems

Date: Wednesday, 18 April, 2018 – 11:00

Speakers: Natasa Przulj, Professor of Biomedical Data Science at University College London (UCL) Computer Science Department.

Venue: Aula de Teleensenyament

Abstract: We are faced with a flood of molecular and clinical data. Various biomolecules interact in a cell to perform biological functions, forming large, complex systems. Large-scale patient-specific omics datasets are increasingly becoming available, providing heterogeneous but complementary information about cells, tissues and diseases. The challenge is how to mine these interacting, complex, complementary data systems to answer fundamental biological and medical questions. Dealing with them is nontrivial, because many of the questions we seek to answer with them fall into the category of computationally intractable problems, necessitating the development of heuristic methods for finding approximate solutions.

We develop methods for extracting new biomedical knowledge from the wiring patterns of systems-level, heterogeneous, networked biomedical data. Our methods link the patterns in molecular networks and the multi-scale network organization with biological function. In this way, we translate the information hidden in the wiring patterns into domain-specific knowledge. In addition, we introduce a versatile data fusion (integration) framework that can effectively integrate the information obtained from mining molecular networks with patient-specific somatic mutation data and drug chemical data to address key challenges in precision medicine: stratification of patients, prediction of driver genes in cancer, and re-purposing of approved drugs for particular patients and patient groups. Our new methods stem from novel network science approaches coupled with graph-regularized non-negative matrix tri-factorization, a machine learning technique for dimensionality reduction and co-clustering of heterogeneous datasets. We utilize our new framework to develop methodologies for performing other related tasks, including disease re-classification from modern, heterogeneous molecular-level data, inferring new Gene Ontology relationships, and aligning multiple molecular networks.

SORS: 3D Lineage-Specific Genome Architecture Links Regulatory Elements and Non-coding Disease Variants to Target Gene Promoter

Date: Friday, 06 April, 2018 – 11:00

Speakers: Biola M. Javierre, PhD, Josep Carreras Leukaemia Research Institute.

Venue: C6 Building – Room E106

Abstract: Each human cell carries in its nucleus about 2 meters of DNA containing the genes that shape the organism, and the manner in which this DNA is packed regulates its function. Long-range physical interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of these interactions are uncharted, constituting a major missing link in understanding genome control. Genome-wide association studies have identified thousands of single nucleotide polymorphisms (SNPs) associated with common disorders, but most of them lie in non-coding regions, which makes them difficult to interpret. Interestingly, these non-coding SNPs cluster at DNase I hypersensitive sites, a hallmark of regulatory elements, pointing to a potential role of these genetic variants in the deregulation of target genes. For all these reasons, a new technique called promoter capture Hi-C (PCHi-C) has been implemented. PCHi-C allows the genome-wide systematic identification of the regions that are in physical contact with 31,253 human promoters. Applying this cutting-edge technology to 17 human primary hematopoietic cell types, it has been shown that promoter interactions are highly cell-type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of the genes they contact, highlighting their functional role. With this approach, non-coding disease variants have been connected to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. These results demonstrate the power of primary cell promoter interactomes to reveal insights into the genomic regulatory mechanisms underlying common diseases.

SORS: Diagnosis of low burden tumors using circulating cell-free DNA

Date: Thursday, 05 April, 2018 – 11:00

Speakers: Milana Frenkel-Morgenstern, Principal Investigator and Senior Lecturer at Azrieli Faculty of Medicine, Bar-Ilan University & Data Science Institute and BSC Senior Researcher.

Venue: C6 Building – Room E106

Abstract: Gliomas are the most frequent brain tumors worldwide. They make up about 30% of all brain and central nervous system tumors, and 80% of all malignant brain tumors. Diagnosing the glioma type and its grade is an essential step in suggesting the right treatment for glioma patients. We present a comprehensive study of different types of tumors with a low burden in plasma, matched with the cfDNA extracted from the plasma of a clinical cohort of patients, in order to find unique tumor mutations as biomarkers. We successfully detected glioma-specific mutations in frequently mutated genes such as IDH2, PDGFRA, NOTCH1 and PIK3R1, as well as 30 other genes. We identified the particular mutations in cfDNA isolated from the plasma of glioma patients using DNA sequencing followed by our predictive bioinformatics analysis. We have collected the matched tumor and cfDNA mutations to uncover the tumor grade as well as its heterogeneity, using our own measure of mutation coverage by DNA-seq reads. Moreover, we used our previously published methods to uncover unique fusions in glioma patients and their alterations of protein-protein interaction networks to understand tumor prognosis. To the best of our knowledge, our study is the most advanced in the field of liquid biopsy for brain tumors, and it will provide a quick and safe non-invasive diagnostic method for glioma patients, as it uncovers tumor sub-types using unique biomarkers. This will enable the best personalized treatment for this highly complicated disease and will eventually bypass the existing “wait-and-see” method for prognosis.

SORS: Mining networks to study rare and common diseases

Date: Wednesday, 31 January, 2018 – 09:00

Venue: Aula de Teleensenyament

Abstract: Networks are scaling up the analysis of gene and protein functions, hence offering new avenues to study the diseases in which these genes and proteins are involved. I will discuss the exploration of biological networks containing thousands of physical and functional interactions between proteins. In particular, we now focus on multiplex networks, i.e., networks composed of layers containing the same nodes but different interaction categories, such as protein-protein interactions, molecular complexes or co-expression. We have developed partitioning algorithms to recover communities – or functional modules – from these more complex and data-rich networks, and use them to study the cellular functions of genes and proteins of interest. Recently, we also adopted a random walk strategy to navigate multiplex networks and extract information about genes and proteins implicated in rare genetic diseases associated with a premature aging phenotype. Finally, I will show ongoing work dedicated to the disease-contextualization of biological networks through the integration of protein or RNA expression data.