Bioinformatics Webinars

The next webinar is on the 6th of November at 2 pm

Speaker: Dewei Hu, from University of Copenhagen

SPACE: STRING proteins as complementary embeddings

Representation learning has revolutionized sequence-based prediction of protein function and subcellular localization. Protein networks are an important source of information complementary to sequences, but the use of protein networks has proven to be challenging in the context of machine learning, especially in a cross-species setting. To address this, we leveraged the STRING database of protein networks and orthology relations for 1,322 eukaryotes to generate network-based cross-species protein embeddings. We did this by first creating species-specific network embeddings and subsequently aligning them based on orthology relations to facilitate direct cross-species comparisons. We show that these aligned network embeddings ensure consistency across species without sacrificing quality compared to species-specific network embeddings. We also show that the aligned network embeddings are complementary to sequence embedding techniques, despite the use of sequence-based orthology relations in the alignment process. Finally, we demonstrate the utility and quality of the embeddings by using them for two well-established tasks: subcellular localization prediction and protein function prediction. Training logistic regression classifiers on aligned network embeddings and sequence embeddings improved the accuracy over using sequence alone, reaching performance numbers close to state-of-the-art deep-learning methods. A set of precomputed cross-species network embeddings and ProtT5 embeddings for all eukaryotic proteins has been included in STRING version 12.0.

Upcoming Webinars:

January 8th 2026: Dennis Voelkl, University of Bergen. Topic: HIDE: hierarchical cell-type deconvolution

Past webinars

Arber Qoku from German Cancer Research Center (DKFZ)

MOFA-FLEX: A Factor Model Language for Integrating Omics Data with Prior Knowledge

Alexandru Tomescu from the University of Helsinki

Decomposing a weighted graph into few paths for solving multi-assembly problems

Abstract: A multi-assembly problem usually asks to decompose a weighted graph into a small number of weighted paths. For example, in the RNA transcript assembly problem, the graph nodes are gene exons, the weights are read coverages, and the edges indicate connections between exons observed in the RNA-seq reads. The weighted paths are then the RNA transcripts (together with their expression levels), that need to be identified from the RNA-seq reads. Such problems easily become NP-hard, and thus many RNA assembly tools solve them heuristically. In this talk I will give an overview of our recent results on extremely fast and exact solvers for a range of such decomposition problems. These are based on Mixed Integer Linear Programming, and benefit from additional “safety optimizations” based on insight into the “simple” parts of the graph structure. These results are currently being incorporated into a Python package, so that developers of future multi-assembly tools can easily use exact and accurate decomposition methods.
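To make the decomposition problem concrete, here is a minimal Python sketch of a common heuristic baseline, greedy-width decomposition. It is not the MILP-based exact method from the talk: it repeatedly removes the widest source-to-sink path and is not guaranteed to find the smallest number of paths. The function names and the toy exon graph are assumptions made for illustration.

```python
from collections import defaultdict


def _widest_path(flow, source, sink):
    """Return the source-to-sink path with the largest bottleneck, or None."""
    succ = defaultdict(list)
    for (u, v), f in flow.items():
        if f > 0:
            succ[u].append(v)
    best = {}  # node -> (bottleneck from node to sink, next node on that path)

    def visit(u):
        if u == sink:
            return float("inf")
        if u in best:
            return best[u][0]
        result = (0, None)
        for v in succ[u]:
            width = min(flow[(u, v)], visit(v))
            if width > result[0]:
                result = (width, v)
        best[u] = result
        return result[0]

    if visit(source) == 0:
        return None
    path = [source]
    while path[-1] != sink:
        path.append(best[path[-1]][1])
    return path


def greedy_width_decomposition(edges, source, sink):
    """Heuristically decompose a flow on a DAG into weighted paths.

    edges maps (u, v) to a non-negative integer flow that is assumed to
    satisfy flow conservation at every node except source and sink.
    """
    flow = dict(edges)
    paths = []
    while True:
        path = _widest_path(flow, source, sink)
        if path is None:
            return paths
        weight = min(flow[(u, v)] for u, v in zip(path, path[1:]))
        for u, v in zip(path, path[1:]):
            flow[(u, v)] -= weight
        paths.append((path, weight))


# Toy splice graph (hypothetical): nodes are exons, weights are read coverages.
toy_edges = {
    ("s", "e1"): 7,
    ("e1", "e2"): 4,
    ("e1", "e3"): 3,
    ("e2", "e3"): 4,
    ("e3", "t"): 7,
}
transcripts = greedy_width_decomposition(toy_edges, "s", "t")
```

On this toy graph the heuristic returns two paths, one including exon e2 with weight 4 and one skipping it with weight 3, which corresponds to two transcripts with their expression levels.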

Dr. Jing Tang from the University of Helsinki

Advanced computational and experimental methods to tackle drug resistance in cancer

Abstract: Despite scientific advances, options for treating cancer patients are still limited, especially after relapse. To achieve more sustainable treatment efficacy, patients will need novel drug targets and combination therapies. However, the lack of understanding of cancer-drug interactions prevents biomarker discovery and ultimately leads to high attrition in clinical trials. In this talk, I will introduce multiple systems medicine approaches for the prediction of drug targets, the prediction of drug sensitivity, and the prediction of drug combinations. I will explore: 1) whether loss-of-function genetic screen technologies such as CRISPR or RNAi can help identify cancer dependency maps; 2) whether the integration of loss-of-function genetic and drug sensitivity screening data could help identify the mechanisms of action of drugs; and 3) whether drug-target interaction and drug sensitivity data can help predict synergistic drug combinations. Finally, I will show the potential of lineage tracing data to inform evidence-based treatment decisions.

Chengbo Fu from Aalto University

KMAP: Kmer Manifold Approximation and Projection for visualizing DNA sequences

Abstract: Identifying and illustrating patterns in DNA sequences is a crucial task in various biological data analyses. In this task, patterns are often represented by sets of kmers, the fundamental building blocks of DNA sequences. To visually unveil these patterns, we could project each kmer onto a point in two-dimensional (2D) space. However, this projection poses challenges due to the high-dimensional nature of kmers and their unique mathematical properties. Here, we established a mathematical system to address the peculiarities of the kmer manifold. Leveraging this kmer manifold theory, we developed a statistical method named KMAP for detecting kmer patterns and visualizing them in 2D space. We applied KMAP to three distinct datasets to showcase its utility. KMAP achieved performance comparable to the classical method MEME, with approximately 90% similarity in motif discovery from HT-SELEX data. In the analysis of H3K27ac ChIP-seq data from Ewing Sarcoma (EWS), we found that BACH1, OTX2 and ERG1 might affect EWS prognosis by binding to promoter and enhancer regions across the genome. We also found that FLI1 bound to the enhancer regions after ETV6 degradation, which showed the competitive binding between ETV6 and FLI1. Moreover, KMAP identified four prevalent patterns in gene editing data of the AAVS1 locus, aligning with findings reported in the literature. These applications underscore that KMAP could be a valuable tool across various biological contexts.
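As a small illustration of what "a pattern represented by a set of kmers" can look like in practice, the Python sketch below counts kmers in a few sequences and collects those within a given Hamming distance of a chosen kmer. This is not the KMAP method itself: the helper names, the use of Hamming distance as the similarity measure, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping kmer of length k across the sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def hamming(a, b):
    """Number of mismatching positions between two equal-length kmers."""
    if len(a) != len(b):
        raise ValueError("kmers must have the same length")
    return sum(x != y for x, y in zip(a, b))


def kmers_near(counts, center, max_dist):
    """Counted kmers within max_dist mismatches of the center kmer."""
    return {
        kmer: n for kmer, n in counts.items()
        if hamming(kmer, center) <= max_dist
    }


# Toy input (hypothetical reads, not from any dataset in the talk).
toy_counts = count_kmers(["ACGTAC", "ACGAAC"], k=4)
motif_like = kmers_near(toy_counts, "ACGT", max_dist=1)
```

Here `motif_like` contains ACGT and its one-mismatch neighbour ACGA, a tiny example of a kmer neighbourhood around a candidate motif.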

Thaddeus Wu: James Cook University, Australia

Behind the Transcriptomic Complexity

Abstract: Recent advances in single-cell long-read sequencing have revolutionised our understanding of transcriptomic complexity. However, traditional analysis pipelines, designed for short-read sequencing, fail to capture the full complexity of isoform-level information. In this talk, I will introduce our ongoing project, a novel computational framework that addresses these challenges. Using early blood development as a model system, we present findings and insights that help to understand transcriptomic complexity.

Luca Musella from Julio Vera’s lab at University Hospital of Erlangen, Germany

ENQUIRE reconstructs and expands context-specific co-occurrence networks from biomedical literature

Abstract: The accelerating growth of scientific literature overwhelms our capacity to manually distill complex phenomena like molecular networks linked to diseases. Moreover, biases in biomedical research and database annotation limit our interpretation of facts and generation of hypotheses. ENQUIRE (Expanding Networks by Querying Unexpectedly Inter-Related Entities) offers a time- and resource-efficient alternative to manual literature curation and database mining. ENQUIRE reconstructs and expands co-occurrence networks of genes and biomedical ontologies from user-selected input corpora and network-inferred PubMed queries. The integration of text mining, automatic querying, and network-based statistics mitigating literature biases makes ENQUIRE unique in its broad-scope applications. For example, ENQUIRE can generate co-occurrence gene networks that reflect high-confidence, functional networks. When tested on case studies spanning cancer, cell differentiation, and immunity, ENQUIRE identified interlinked genes and enriched pathways unique to each topic, thereby preserving their underlying diversity. ENQUIRE supports biomedical researchers by easing literature annotation, boosting hypothesis formulation, and facilitating the identification of molecular targets for subsequent experimentation.
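The basic idea of a co-occurrence network can be sketched in a few lines of Python: entities mentioned in the same document are linked, and the edge weight counts the documents they share. This is only an illustration of that idea, not ENQUIRE's pipeline, which adds text mining, automatic querying, and network-based statistics. The function name, the `min_count` threshold, and the toy documents are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(doc_entities, min_count=1):
    """Build weighted edges between entities that share documents.

    doc_entities is an iterable of entity collections, one per document.
    Returns a dict mapping a sorted (entity_a, entity_b) pair to the number
    of documents mentioning both, keeping pairs seen at least min_count times.
    """
    edge_counts = Counter()
    for entities in doc_entities:
        # Deduplicate within a document and sort so each pair has one key.
        for a, b in combinations(sorted(set(entities)), 2):
            edge_counts[(a, b)] += 1
    return {edge: n for edge, n in edge_counts.items() if n >= min_count}


# Toy corpus (made up for illustration): entities extracted per abstract.
toy_docs = [
    {"TP53", "MDM2", "apoptosis"},
    {"TP53", "MDM2"},
    {"MDM2", "apoptosis"},
]
network = cooccurrence_network(toy_docs, min_count=2)
```

With `min_count=2`, only the pairs that co-occur in at least two toy documents are kept as edges.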