Center for Molecular and Biomolecular Informatics
The Center for Molecular and Biomolecular Informatics (CMBI) does research and education, and provides services in bioinformatics and cheminformatics.
Mission & vision
Our mission is to add value to personal health data by their translation into integrative knowledge and actionable information.
- CMBI develops bioinformatics approaches that contribute to the understanding of disease mechanisms, personalized therapies and interventions, and a learning health care system
- CMBI is committed to the reusability of its data, tools, and services
- CMBI provides bioinformatics researchers in Radboudumc with a platform for exchange of knowledge and expertise
- CMBI contributes to the education of BMW, MLW, and MMD students so that they can apply and understand the principles behind (existing) bioinformatics tools
Researchers at the CMBI contribute to several courses at both the Faculty of Science and the Medical Faculty. We provide courses in Structural Bioinformatics, Comparative Genomics, Data Analysis, and programming courses such as Java. We specifically focus on the Molecular Life Science students, Biomedical Sciences students and participants in the master Molecular Mechanisms of Disease, although some of our courses can be chosen by Biology, Chemistry or Medical students as well.
Our mission is to provide bachelor students with a basic understanding of bioinformatics principles, as this has been shown to be beneficial for those who want to pursue a career in Life Sciences. Follow-up courses are available for those who want to gain greater insight into our field.
For MLS/Biology students it is even possible to follow the B-track by choosing a combination of (master) Bioinformatics courses and internships at Bioinformatics departments.
Our researchers also teach in special interest courses and summer schools here at the Radboudumc, the Radboud University and elsewhere.
For more information, you can contact dr. Hanka Venselaar, education coordinator at the CMBI.
Both bachelor and master students from studies such as Molecular Life Sciences, Biomedical Sciences, Chemistry and MMD are welcome. In general, we are flexible in terms of internship length, type of internship and type of research.
Below you can find our internship projects:
Rationale and objective:
A significant proportion of right-sided colonic precursor lesions of colorectal cancer (CRC) are still missed during colonoscopy because they are often flat and non-ulcerated (sessile serrated lesions, SSLs). The high miss rate of SSLs, in combination with their distinct and rapid progression to highly aggressive CRC, calls for significant improvements in detection. Almost all adenomas and carcinomas (89%) of the right-sided colon have solid bacterial biofilms. We postulate that these biofilms might be a good surrogate marker for precursor lesions and could be used to detect adenomas and early cancers. In this project we want to identify extracellular microbial target peptides specific for right-sided precursor lesions using a discovery-based metagenomics and proteomics approach.
Metagenomics: DNA of archived tissue material of right-sided SSLs (n=20) and CAAs (n=20) collected between 2010-2018 has been extracted and sequenced and will be compared with metagenomics sequencing of normal right-sided biopsies of control population from the BaCo-study of patients without neoplastic lesions (n=35). Proteomics: Adult patients that are scheduled for an EMR-procedure (n=20) for removal of colonic right-sided precursor lesions >10 mm in size, will be asked to participate. Proteomics will be performed on surface-shaved samples of precursor lesions and corresponding normal biopsies.
Next-generation library preparation and sequencing will be performed by Novogene using Illumina NovaSeq technology, generating an average of 4 million paired-end reads with an average length of 150 bases. Data will be combined with the sequencing data that we have generated for the metagenomes of biofilm-positive/negative normal tissues of 35 healthy controls following the same protocol (pipeline established by Daniel Garza). First, short sequencing reads will be filtered to remove tags, low-quality reads, and human DNA. Remaining reads will be assembled into longer contigs, and these contigs will be binned into metagenome-assembled genomes (MAGs). The quality and completeness of these genomes will be assessed by the distribution of single-copy ubiquitous genes using the CheckM method. The open reading frames (ORFs) of the assembled MAGs, and of high-quality contigs that were not binned into a MAG, will be used to generate a database of metagenome-predicted proteins. The proteins in this database will be extensively annotated by mapping them to a reference database consisting of 10 million protein sequences of microorganisms (bacteria, archaea, viruses, protists, and fungi) that have been previously found in the human gut by metagenomic studies across the world.
Proteomics: The intact precursor lesions will be exposed to tryptic digestion to cleave all available protein targets on the outside of the lesion. Peptides will be analyzed combining liquid chromatography with tandem mass spectrometry at the Radboud Technology Center for Mass Spectrometry, and identified using metagenomic sequence libraries of biofilm positive cases. Bioinformatic analysis of the candidate marker proteins will be performed to identify potential targets that can be stained and visualized during endoscopy. True membrane bound peptides will be selected based on annotated protein domains.
Protein identification specific for precursor lesions: All identified proteins will be annotated by their taxonomic origin, biological function, protein domains, and cellular localization. A specific protocol named Inmembrane for cellular localization will be used for protein sequences of bacterial origin. This protocol is based on integrating different tools and using specific references for gram-negative and gram-positive bacteria to predict cellular localization, identifying outer membrane proteins, cell-wall proteins, extracellular proteins and proteins that are known to be shed and attached to the cell surface. A similar search strategy will be used for non-bacterial proteins, using Pfam.
The predicted abundance of the identified microbial proteins will be used for target discovery. For this purpose we will first compare proteins individually, by modeling the data as a negative binomial distribution, using the DESeq protocol. Next, we will perform variable selection using Elastic Net (EN) and partial least squares discriminant analysis (PLS-DA) coupled with variable importance selection. Both models will be applied with 10-fold cross-validation to identify consistent combinations of proteins that best explain the difference between normal/biofilm-negative and precursor lesion/biofilm-positive tissues. Specificity will be tested by comparing the predictive value of our selected proteins on other metagenomes from the same biological material that we have previously generated, including biofilm-positive normal tissues of healthy controls, Lynch syndrome and inflammatory bowel disease samples. Finally, we will preselect proteins following three criteria: (i) significant FDR-corrected p-values from the DESeq analysis; (ii) high consistency scores found by the variable selection procedures (EN and PLS-DA); and (iii) high specificity to biofilm-positive precursor lesions compared to healthy tissue of Lynch syndrome and inflammatory bowel disease patients. From these preselected proteins, we will further select a group of proteins that are highly expressed on the precursor lesion/biofilm-positive tissues and are predicted to have cell-wall, secretion, membrane, or extracellular localization domains. These proteins will be selected as potential targets for the visualization of biofilms during colonoscopy.
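Criterion (i) above relies on FDR-corrected p-values. As a minimal sketch of what such a correction does, the following applies the Benjamini-Hochberg procedure, a common choice for FDR control; whether the project uses exactly this procedure is an assumption, and the protein names and p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def preselect(protein_pvalues, alpha=0.05):
    """Keep proteins whose adjusted p-value is at or below alpha."""
    names = list(protein_pvalues)
    adjusted = benjamini_hochberg([protein_pvalues[n] for n in names])
    return {n: q for n, q in zip(names, adjusted) if q <= alpha}


# Hypothetical per-protein p-values from a differential abundance test.
example = {"protA": 0.001, "protB": 0.02, "protC": 0.03, "protD": 0.5}
print(preselect(example))
```

In practice the adjusted p-values would come directly from the differential abundance software; the sketch only makes the selection step explicit.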
- processing metagenomics and proteomics sequencing reads/profiles
- annotation of protein targets
- selection of membranous protein targets
- differential expression between precursor lesions and controls
Bioinformatics (MSc: MLS/BMS/BIO/MMD)
Basic programming skills are required (Python, Bash, R)
Length: at least 5-6 months, can always be prolonged
A deeper understanding of the immune system's intricacies has led to clinical breakthroughs in which personalized cancer vaccines eliminated tumors in advanced-stage cancer patients [1-4]. Formulated with fragments from a patient's tumor DNA, cancer vaccines train a patient's own immune system to recognize the patient's mutated cancer proteins as 'foreign' and wage a lethal attack against tumors (see key concepts in Box 1). The major puzzle in this field is: which of a patient's hundreds of tumor mutations can trigger the immune system to attack tumors? Complementary to costly and time-consuming wet-lab screenings (e.g., Sipuleucel-T was priced at $93,000 [5]), predictive algorithms that can quickly pinpoint neoantigens from a patient's tumor DNA are urgently needed if personalized cancer vaccines are to be applied on a large scale.
Box 1. The TCR:peptide:MHC complex, neoantigens, and their pivotal role in the immune surveillance system and T-cell-mediated immune attacks on tumor cells [6]. Cells constantly break down proteins into peptides. The major histocompatibility complex (MHC) proteins present some of these peptides on the cell surface. T cells are activated when their T-cell receptor (TCR) recognizes tumor-specific peptides presented on the tumor cell surface by MHC proteins, forming the TCR:peptide:MHC (TCR:pMHC) complex [1,2] (Figure 1). MHC class I is present on the surface of every cell, while MHC II is present only on specific immune cells, e.g., dendritic cells. Tumor peptides presented by MHC-I can activate CD8+ T cells, which can directly kill tumor cells that present the peptides on their surface. Peptides presented by MHC-II can activate CD4+ T cells, which stimulate the production of antibodies and can provide help to CD8+ T cells. Such tumor-mutation-derived peptides that are recognized by T cells as 'foreign' (i.e., immunogenic) are called neoantigens [7]. MHC epitopes: MHC-binding peptides. TCR epitopes: peptides that bind both MHC and TCR.
Figure 1. TCR nomenclature and the TCR:pMHC complex. A TCR has two chains (
Our overall aim is to improve the efficacy, safety and development time of existing T cell based cancer vaccine approaches.
You will learn:
- Advanced 3D modelling techniques for protein-protein complexes
- 3D convolutional networks (ConvNets) on protein structures
- Basic knowledge of T cell based immunotherapy
- Deep learning on these elegant molecules, through joining our lab meetings and discussions with our collaborators, who are world-class deep learning experts
- Present your work
- Summarize your work in manuscript (and potential opportunities to be included in our publications)
- Contribute to our integrative modelling software PANDORA for building 3D models for peptide-MHC
- Generate 3D models for peptide-MHC complexes using our PANDORA software
- Design and train a 3D ConvNet in DeepRank (our deep learning framework for data mining protein complexes, accepted for Nature Communications) [10] on these 3D models to predict vaccine candidates
- Basic structural biology knowledge is needed
- Good programming skills (ideally Python) are needed
- Basic deep learning knowledge is needed
Time to start: As early as possible
Duration: 6-12 months
Contact: Li Xue: Li.Xue@radboudumc.nl
- Ott, P. A. et al. An immunogenic personal neoantigen vaccine for patients with melanoma. Nature 547, 217–221 (2017).
- Sahin, U. et al. Personalized RNA mutanome vaccines mobilize poly-specific therapeutic immunity against cancer. Nature 547, 222–226 (2017).
- Harjes, U. Personal training by vaccination. Nature Reviews Cancer 17, 451
- Sahin, U. & Türeci, Ö. Personalized vaccines for cancer immunotherapy. Science 359, 1355–1360 (2018).
- Jaroslawski, S., Drugs, T. S.-T. B.2015. Autopsy of an innovative paradigm change in cancer treatment: Why a single-product Biotech company failed to capitalize on its breakthrough Invention.
- Neefjes, J., Jongsma, M. L. M., Paul, P. & Bakke, O. Towards a systems understanding of MHC class I and MHC class II antigen presentation. Nature Reviews Immunology 11, 823–836 (2011).
- Schumacher, T. N. & Schreiber, R. D. Neoantigens in cancer immunotherapy. Science 348, 69–74 (2015).
- Leem, J., de Oliveira, S. H. P., Krawczyk, K. & Deane, C. M. STCRDab: the structural T-cell receptor database. Nucleic Acids Res. 46, D406–D412 (2018).
- La Gruta, N. L., Gras, S., Daley, S. R., Thomas, P. G. & Rossjohn, J. Understanding the drivers of MHC restriction of T cell receptors. Nature Reviews Immunology 18, 467–478 (2018).
- Renaud, N. et al. DeepRank: A deep learning framework for data mining 3D protein-protein interfaces. bioRxiv 2021.01.29.425727 (2021). doi:10.1101/2021.01.29.425727
For information please visit: https://www.ai-for-health.nl/vacancies/md_drug_repurposing/
A plethora of cellular processes depend on membrane shape dynamics, typically brought about by membrane-protein interactions. Many such interactions involve special motifs, amphipathic in character, that fold into helices upon contact with the lipid bilayer membrane. These amphipathic helices, defined by regularly interspersed hydrophobic and polar faces, participate in diverse cellular roles (such as intracellular vesicular transport, endocytosis, and protection from stress) that are fine-tuned by their sequence composition and length.
Considering the wealth of knowledge on membrane-sculpting amphipathic helices in eukaryotic, viral, and bacterial protein repertoire, it is worth noting that computational methods to characterize amphipathic helical regions are scarce.
Scope of the project
The student will participate in the design and development of a computational tool to predict amphipathic helices in a protein using a guided, multipronged sequence-based approach. In particular, the focus will be on amino acid compositions that reflect modulation of membrane curvature.
The student will also attempt to study molecular and mechanistic details of amphipathic helix-membrane interactions, through modelling. For select cases, molecular dynamics simulations will be used to model such interactions and study changes in membrane curvature.
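One classic sequence-based signal for amphipathic helices is the helical hydrophobic moment, which measures how strongly hydrophobicity is concentrated on one face of an idealized helix. The sketch below is only an illustration of that idea, not the tool to be developed in the project: the choice of the Kyte-Doolittle scale, the 100 degree rotation per residue, the 18-residue window, and the test peptide are all assumptions.

```python
import math

# Kyte-Doolittle hydropathy values (one possible hydrophobicity scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, angle_deg=100.0):
    """Mean hydrophobic moment of a segment, assuming an ideal alpha helix.

    Each residue is treated as a vector pointing away from the helix axis,
    rotated by angle_deg relative to its predecessor and scaled by its
    hydropathy. The length of the summed vector is divided by segment length.
    """
    angle = math.radians(angle_deg)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(i * angle)
                  for i, aa in enumerate(segment))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(i * angle)
                  for i, aa in enumerate(segment))
    return math.hypot(sin_sum, cos_sum) / len(segment)


def scan(sequence, window=18):
    """Slide a window along the sequence; return (start, moment) pairs."""
    return [(start, hydrophobic_moment(sequence[start:start + window]))
            for start in range(len(sequence) - window + 1)]


# Hypothetical peptides: an alternating Leu/Lys design and a uniform stretch.
amphipathic = "LKKLLKKLLKKLLKKLLK"
uniform = "L" * 18
print(hydrophobic_moment(amphipathic), hydrophobic_moment(uniform))
```

A uniformly hydrophobic stretch scores close to zero because its residue vectors cancel around the helix, whereas a segment with a distinct hydrophobic face scores high. A real predictor would combine such a score with other features, for example composition and helix propensity.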
With this project, the student will not only have a chance to gain experience in sequence analyses and membrane biophysics, but also to contribute towards understanding fundamentals of membrane biology.
- Masters in Bioinformatics
- Basic programming skills: Python, Bash
Duration: 6 months
Internship supervisor: Prof. Martijn Huijnen
Daily supervisor: Dr. Gayatri Ramakrishnan
- Giménez-Andrés, M., Čopič, A., and Antonny, B. (2018). The Many Faces of Amphipathic Helices. Biomolecules, 8(3), 45. https://doi.org/10.3390/biom8030045
Gulsevin A, Meiler J (2021) Prediction of amphipathic helix—membrane interactions with Rosetta. PLOS Computational Biology 17(3): e1008818. https://doi.org/10.1371/journal.pcbi.1008818
- How preserved is the splicing between Weri-Rb cells and PPCs compared to the retina?
- Optional: validation of splicing events with RT-PCR
Inherited retinal diseases (IRDs) are the most common cause of blindness in adults. IRDs are often caused by variants that alter splicing. To identify splice altering variants in IRD patients, splice assays are used. At the moment, splice assays are performed in human embryonic kidney (HEK293T) cells. The problem is that splicing is a tissue specific process. To correctly identify effects of possibly splice altering variants on the retina we need a better model to test the splice assays. Possible options to replace the HEK293T cells with more retina like cells are Weri-Rb cells and photoreceptor precursor cells (PPCs). Weri-Rb cells are human retinoblastoma cells. Photoreceptor precursor cells are post-mitotic precursor cells which can differentiate into cones or rods.
New long-read sequencing techniques have brought new insights into transcriptomics. By comparing PacBio long-read RNA sequencing data for retina, Weri-Rb cells and PPCs, we want to identify splicing events that are preserved. If the splicing of the cells is similar to the splicing in the retina, the cells can be used to replace HEK293T cells in splice assays. After identifying splicing events in silico, it is possible to validate the findings in the lab with RT-PCR.
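One simple way to quantify how well splicing is preserved is to compare the sets of splice junctions detected in each sample. The following is a minimal sketch under the assumption that junctions have already been extracted from the long-read alignments as (chromosome, donor, acceptor, strand) tuples; the coordinates are invented for illustration.

```python
def preserved_fraction(reference, candidate):
    """Fraction of reference junctions that also occur in the candidate."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(reference & set(candidate)) / len(reference)


def jaccard(a, b):
    """Jaccard similarity of two junction sets (shared / union)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical junctions: (chromosome, donor, acceptor, strand).
retina = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"),
          ("chr1", 100, 400, "+"), ("chr2", 50, 90, "-")}
weri_rb = {("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"),
           ("chr2", 50, 90, "-")}
ppc = {("chr1", 100, 200, "+"), ("chr3", 10, 20, "+")}

for name, junctions in [("Weri-Rb", weri_rb), ("PPC", ppc)]:
    print(name, preserved_fraction(retina, junctions), jaccard(retina, junctions))
```

Set overlap ignores junction usage levels; a fuller comparison would also weigh each junction by its read support and restrict the analysis to genes expressed in all samples.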
- Sangermano R, Khan M, Cornelis SS, Richelle V, Albert S, Garanto A, Elmelik D, Qamar R, Lugtenberg D, van den Born LI, Collin RWJ, Cremers FPM. ABCA4 midigenes reveal the full splice spectrum of all reported noncanonical splice site variants in Stargardt disease. Genome Res. 2018 Jan;28(1):100-110. doi: 10.1101/gr.226621.117. Epub 2017 Nov 21. PMID: 29162642; PMCID: PMC5749174.
- Albert S, Garanto A, Sangermano R, Khan M, Bax NM, Hoyng CB, Zernant J, Lee W, Allikmets R, Collin RWJ, Cremers FPM. Identification and Rescue of Splice Defects Caused by Two Neighboring Deep-Intronic ABCA4 Mutations Underlying Stargardt Disease. Am J Hum Genet. 2018 Apr 5;102(4):517-527. doi: 10.1016/j.ajhg.2018.02.008. Epub 2018 Mar 8. PMID: 29526278; PMCID: PMC5985352.
Supervisors: Tabea Riepe / Peter-Bram ‘t Hoen / Frans Cremers
Proposed duration: 5 months or more
Preferred background: MSc: MLS/BMS/BIO/MMD
Requirements: basic programming skills (python preferred), command line
Peter-Bram ‘t Hoen and colleagues
The international rare disease research consortium (IRDiRC) set an ambitious goal for 2027: the development of 1000 novel therapies for patients with rare diseases, the majority of which for diseases without any current treatment options. Drug repurposing, the reuse of approved drugs for new indications, is a key element in this strategy, because the clinical development trajectory is usually much shorter and the expenses for their development are lower. In this project, you will work on novel strategies to prioritize candidate drugs based on molecular -omics (RNA-seq, proteomics, metabolomics) data and knowledge on the similarity of symptoms. You will focus on diseases for which we have collected a lot of data, e.g. myotonic dystrophy and Koolen-de Vries syndrome. You will develop innovative approaches, for example using network inference methods and deep learning.
- You will create multimodal knowledge networks that combine information on disease symptoms, molecular signatures of diseases and drugs, and existing indications for these drugs.
- You will utilize these knowledge networks for semi-supervised learning and for predicting drug repurposing candidates. You will explore the use of deep learning frameworks like graph convolutional networks.
- Studying for a Master's degree in Bioinformatics, Computer Sciences, Artificial Intelligence, Molecular Life Sciences or similar
- Student should be available for at least 5 months
- Basic R or Python programming experience
- An internship in an environment with a strong focus on computational method development and many collaborations with biomedical and clinical scientists
- Experience of being embedded in a research group with related activities such as journal clubs and social events
- Weekly and ad-hoc meetings with supervisors
- Ample opportunities to present your work in internal and external meetings
The human protein usherin, encoded by the USH2A gene, is involved in hearing and vision as a member of the USH2 complex. This complex forms the stereociliary ankle link complex at the apical surface of the sensory cells of the inner ear, the cochlear hair cells, and is required for the proper formation of the developing hair bundles. In retina photoreceptors, the USH2 complex is present at the periciliary membrane and is thought to play a role in regulating intracellular protein transport.
Mutations in USH2A lead either to Usher syndrome, a genetically and clinically heterogeneous condition characterized by progressive vision loss as a consequence of retinitis pigmentosa combined with congenital sensorineural hearing impairment, or to non-syndromic retinitis pigmentosa. The hearing impairment can be partially compensated by fitting hearing aids or cochlear implants. For the loss of vision, however, no treatment options currently exist.
The group of dr. Erwin van Wijk at the department of Otorhinolaryngology focuses on unraveling the pathogenic mechanisms underlying Usher syndrome and developing genetic therapies for this condition. One of their successful approaches consists of an RNA-based antisense oligonucleotide (AON) therapy which induces the in-frame skipping of a mutated exon during pre-mRNA splicing. The 3D structure of the resulting protein indicates that a well-chosen exon skip can result in a functional, albeit slightly shortened, protein. The first AON for USH2A-associated disease is currently being evaluated in a phase 1/2 clinical trial.
In this internship, you will study and identify the requirements for a successful exon-skipping approach. You will take into account the available information on, amongst others, protein type, domain content, 3D structure, exon boundaries, splice-modulating factors, evolutionary conservation, and protein-protein interactions. The goal is to generate a bioinformatics pipeline that enables the prediction of successful targets for the development of exon-skipping therapies.
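A first, necessary (but by no means sufficient) requirement for an exon skip is that it preserves the reading frame: the total coding length of the skipped exons must be a multiple of three nucleotides. The sketch below illustrates only this check; the exon lengths are invented and the function names are hypothetical, not part of any existing pipeline.

```python
def is_in_frame_skip(exon_lengths, skipped):
    """True if removing the given exons keeps the downstream reading frame.

    exon_lengths: coding length (nt) of each exon, in transcript order.
    skipped: indices of the exons that are skipped together.
    """
    removed = sum(exon_lengths[i] for i in skipped)
    return removed % 3 == 0


def candidate_single_exon_skips(exon_lengths):
    """Indices of internal exons whose individual skip is in-frame.

    The first and last exon are excluded because they normally carry the
    start and stop codon.
    """
    return [i for i in range(1, len(exon_lengths) - 1)
            if is_in_frame_skip(exon_lengths, [i])]


# Hypothetical coding lengths of six exons.
lengths = [120, 90, 100, 642, 77, 150]
print(candidate_single_exon_skips(lengths))
print(is_in_frame_skip(lengths, [2, 4]))
```

Exons that pass this filter would then be evaluated on the other criteria listed above, such as domain boundaries, 3D structure and conservation.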
Supervisors: Erwin van Wijk (Otorhinolaryngology) / Hanka Venselaar (CMBI)
Proposed duration: 5 months or more
Preferred background: BMW-master (course BMS39), MLS (courses MOL066)
Requirements: Experience with 3D-visualisation software, genomic databases
Plasmodium genomes are being sequenced at a high rate (e.g. https://www.malariagen.net/projects/pf3k), and these data can be exploited to better understand sequence variation. Specific questions that we want to answer are: which proteins show the highest level of variation, and is this variation linked to whether they are, for example, immunogenic, expressed in specific stages of Plasmodium development, exposed to the outside of the cell, or encoded in specific regions of the chromosome? The project combines the gathering of relevant data (e.g. about immunogenicity, or about developmental-stage-dependent expression of the protein), the analysis of the relationships between those data, and critical thinking about causal relationships versus correlations.
Programming skills: An ambition to work with large datasets is required and basic programming skills (R, python) are strongly preferred.
Preferred background: MMD student project, or MLW
The metagenomes of bacterial species living in and on humans have uncovered a new angle to predicting and understanding human health and disease.
Single-molecule Molecular Inversion Probes (smMIPs) allow us to derive the composition of a metagenome at unprecedented phylogenetic depth and at affordable cost, facilitating large-scale analyses. Designing smMIPs for large numbers of genomes, using conserved sequences for the probes, and analysing the results require an understanding of the experimental techniques being used, programming skills, and biological knowledge to interpret the results: typical skills of a bioinformatician. In the project the student would first analyze the data that we have already obtained about human metagenomes, using in-house developed software, and would then go on to design new smMIPs to increase the number of species and strains that we can detect.
Programming skills: An ambition to work with large sequence datasets is required and basic programming skills (e.g. python) are strongly preferred.
Duration: 0.5 year, or more
The world around us is teeming with microscopic life. The natural habitat of these so-called microbiota, for example in the human intestines or on the skin, is called the microbiome. These bacterial consortia are currently analyzed mostly by sequencing-based approaches, either by marker gene sequencing (MGS) metataxonomics, or by whole-genome sequencing (WGS) shotgun metagenomics. The advantage of WGS over MGS is that it provides insight into gene and metabolic function potential of a microbiome because all free DNA in a sample is sequenced, whereas MGS focuses on one gene only (usually the 16S rRNA gene).
MGS is relatively cheap, its data analysis is straightforward, and it is still the standard in the field. In contrast, WGS is relatively expensive and computationally very intensive. Nevertheless, more and more WGS datasets are being generated, as WGS is slowly becoming more affordable. At the CMBI, we currently lack a dedicated pipeline for the analysis of metagenomics data. To be prepared for future projects, we want to invest now in setting up a WGS analysis pipeline.
A plethora of tools is currently available for analysis of WGS data. With these existing tools the student will build an analysis pipeline for metagenomics data. This means that the student will first have to study the different available options in depth. Thereafter, a combination of tools will be used to build a comprehensive pipeline. Note that there is no need for the student to write bioinformatics tools or algorithms. Depending on preferences and needs, this pipeline can in time be expanded with other tools and functionalities. Therefore, documentation (e.g. GitHub), portability (e.g. Docker or Jupyter) and structured programming are important in this project. Also, from a QC point of view, the monitoring and detailed reporting of all pipeline steps is crucial. In the final phase of the project, the student will validate the pipeline with available metagenomics datasets of different human microbiome niches, as reported in high-quality literature studies. In conclusion, we offer a lot of diversity and challenges for a highly motivated and enthusiastic student, in a flexible, supportive and professional academic environment.
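The emphasis on monitoring and reporting every pipeline step can be illustrated with a small step runner. This is only a sketch of the idea: the two toy steps stand in for external tools (which a real pipeline would typically call through a workflow manager or subprocesses), and all function and step names are hypothetical.

```python
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("wgs_pipeline")


def run_pipeline(steps, data):
    """Run (name, function) steps in order and collect a per-step report."""
    report = []
    for name, step in steps:
        started = time.perf_counter()
        records_in = len(data)
        try:
            result = step(data)
        except Exception as exc:
            # Record the failure and stop: later steps depend on this one.
            report.append({"step": name, "status": "failed", "detail": str(exc)})
            log.error("step %s failed: %s", name, exc)
            break
        seconds = time.perf_counter() - started
        report.append({"step": name, "status": "ok",
                       "records_in": records_in,
                       "records_out": len(result),
                       "seconds": seconds})
        log.info("step %s: %d -> %d records", name, records_in, len(result))
        data = result
    return data, report


# Toy stand-ins for real tools (quality filtering, deduplication).
def drop_short_reads(reads, min_length=5):
    return [r for r in reads if len(r) >= min_length]


def deduplicate(reads):
    return list(dict.fromkeys(reads))  # keeps first occurrence, preserves order


reads = ["ACGTACGT", "ACG", "ACGTACGT", "TTGACCA"]
final, report = run_pipeline(
    [("length_filter", drop_short_reads), ("deduplicate", deduplicate)], reads)
print(final)
```

Writing the collected report to a file alongside the results gives the kind of per-step QC record described above.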
Preferred background: Bioinformatics (BSc: HLO bioinformatics)
Basic programming skills are required (Python, Bash, R).
Length: at least 5-6 months, can always be prolonged
Contact: Tom.Ederveen@radboudumc.nl; Daniel.Garza@radboudumc.nl
The world around us is teeming with microscopic life. The natural habitat of these so-called microbiota, for example in the human intestines or on the skin, is called the microbiome. These bacterial consortia are currently analyzed mostly by sequencing-based approaches, either by marker gene sequencing (MGS) metataxonomics, or by whole-genome sequencing (WGS) shotgun metagenomics. Recently, a novel molecular / sequencing method called smMIP was developed for profiling of genes with a very high sensitivity. Within the Radboudumc, this smMIP method, for single-molecule molecular inversion probe, is currently being adapted for in-depth and high resolution profiling of microbiota.
Recent studies show that smMIP is a highly sensitive method for profiling genes or RNA transcripts. However, its application for studying microbiomes has not been explored yet. Multiple departments within the Radboudumc are currently working together on the application and validation of smMIP for microbiota profiling (i.e. the departments of Biochemistry, CMBI Bioinformatics and Dermatology). Pilot sequencing data have been generated, but several other important validations are yet to be performed; one of these is an extensive in silico validation.
The student will be working with a panel of smMIPs, developed in our institute, that is designed to target the vaginal microbiome. Using existing smMIP analysis software, we want to perform an in silico validation on a published metagenome data set of vaginome samples, to see whether theoretical smMIP profiles match the microbiota profiles found by metagenome analysis of the same data. As metagenomics is not trivial, the student will work closely together with another student who will be fully dedicated to building an analysis pipeline for metagenomics data. Furthermore, the smMIP software was not built for the intended purpose of in silico profiling of metagenome sequencing reads, so adapting the smMIP bioinformatics tool to our specific needs will ask for creativity from the student. In parallel, another pilot study will be performed in which 16S vaginome profiles are compared to those retrieved by the smMIP approach. In conclusion, we offer a lot of diversity and challenges for a highly motivated and enthusiastic student, in a flexible, supportive and professional academic environment.
Preferred background: Bioinformatics (MSc: MLS/BMS/BIO/MMD)
Basic programming skills are required (Python, Bash, R).
Length: at least 5-6 months, can always be prolonged
Contact: Tom.Ederveen@radboudumc.nl; William.Leenders@radboudumc.nl; Karolina.Andralojc@radboudumc.nl
HOPE is our own in-house server for the prediction of mutational effects. This server is specifically aimed at the medical scientist and produces a report that is clear and understandable for everyone without a background in structural bioinformatics. HOPE has been running for several years and is being used by researchers all over the world. In order to keep this server up to date, we perform small tests in which we select mutations that were described in high-end journals such as The American Journal of Human Genetics and Nature Genetics. We compare the effect predictions made in these articles with those made by our HOPE server (and eventually also with predictions made by other widely used automatic online servers). The results will be used to improve the predictions made by HOPE.
Preferred background: MLS/BMS, possibly BIO
Length: flexible, between 1-6 months.
Programming skills: not necessary
When proteins are cleaved into short peptides, some of these peptides bind MHC molecules, which deliver them to the cell surface for presentation to immune cells. If a foreign peptide is presented by the MHC molecule, the adaptive immune response is triggered. The main determining step of this antigen presentation is the binding between the peptide and the MHC molecule. Deciphering the mechanism underlying peptide:MHC interactions is valuable for designing vaccines for infectious diseases, for cancer immunotherapy, and for understanding the causes of autoimmune diseases. Three-dimensional (3D) structures of peptide:MHC complexes provide fundamental insights into interaction specificities, binding affinities and sensitivity to mutations.
Complementary to experimental methods, modelling provides a valuable tool for generating 3D models of peptide:MHC complexes.
Existing computational methods are mostly effective for MHC class I but not for MHC class II. Modelling of peptide:MHCII complexes is crucial, as there are few experiments on this type of interaction compared to peptide:MHCI interactions. MHC class II molecules interact with exogenous peptides and show a different type of interaction, mainly due to the intrinsically different binding groove of the MHC II molecule. With its open peptide-binding cleft, it tends to interact with longer peptides, and it has more and different anchor pockets.
So far, we have developed a framework for the modelling of peptide:MHC class I complexes in our lab. The internship project will consist of setting up the computational framework for 3D modelling of peptide:MHCII complexes, which would allow us to produce large quantities of structures that can be used for further analysis, crucial for vaccine development. Following the modelling of peptide:MHCII complexes, there is room to extend the project by scoring the models to select high-quality ones and to identify the specificity of immune responses.
The student will have the opportunity to:
- Learn to perform homology modelling
- Learn the basics of biophysics used for loop modelling
- Learn Git and GitHub
- Contribute to extending our modelling framework to peptide:MHC class II complexes
Preferred background: Bioinformatics (MSc: MLS/BIO/MMD)
Preferred: basic knowledge of structural biology or immunology
Basic programming skills are required (Python)
Anticipated start: fall/winter 2020
Length: at least 5-6 months
Internship supervisor: Assist. Prof. dr. Li Xue
Daily supervisor: Farzaneh MeimandiParizi, PhD student
Proteins can hardly ever function on their own. Many proteins are found in homo- and heteromeric complexes, and many others are part of a transient complex at least once during their lifecycle. The interactions made by these proteins can be disturbed by mutations, which often lead to a non-functional protein complex and eventually to disease. It would therefore be interesting to predict the effects of mutations that occur on protein surfaces. However, information about interacting residues on surfaces is still scarce, and only a few protein-protein interaction (PPI) prediction servers exist.
In collaboration with a research group in Amsterdam, we have linked their PPI-prediction server SERENDIP to a script that visualizes the predictions in a YASARA scene. A possible student project would be to test and analyse these predictions.
Preferred background: MLS/BMS/BIO/MMD
Length: at least 2 months
Programming skills: basics
To treat patients correctly, it helps to understand the molecular basis of their disease or syndrome. Mutations in the coding sequence of proteins may affect ligand binding, membrane anchoring, general stability and folding, interactions with other proteins, and more. Several online servers that can analyse these effects now exist. One of them is our own HOPE server, which analyses the protein structure in detail and provides an extensive report that is readable for the medical scientist. Other prediction servers often return only a binary answer (damaging or not) or a value (0 to 10). HOPE's results could be strengthened by combining its textual report with a consensus prediction derived from these other servers.
Preferred background: master MLS/Informatics/Data science
Length: 0.5-1 year
Programming skills: Good
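As an illustration of the consensus prediction mentioned for the HOPE project, the minimal sketch below combines binary answers and 0 to 10 values into a single call. It is not part of HOPE: the class and function names, the server labels, the assumption that a higher value means "more damaging", and the 0.5 threshold are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Prediction:
    """One server's answer for a single variant (hypothetical structure)."""
    server: str                      # placeholder label, not a real server name
    damaging: Optional[bool] = None  # set when the server gives a binary answer
    score: Optional[float] = None    # set when the server gives a 0-10 value


def to_unit_score(p: Prediction) -> float:
    """Map either answer type onto [0, 1], where 1 means 'damaging'.

    Assumption: on the 0-10 scale, higher values mean more damaging.
    """
    if p.score is not None:
        if not 0.0 <= p.score <= 10.0:
            raise ValueError(f"{p.server}: score {p.score} is outside 0-10")
        return p.score / 10.0
    if p.damaging is not None:
        return 1.0 if p.damaging else 0.0
    raise ValueError(f"{p.server}: no prediction given")


def consensus(preds: Sequence[Prediction],
              threshold: float = 0.5) -> Tuple[float, bool]:
    """Return the unweighted mean unit score and the resulting binary call."""
    if not preds:
        raise ValueError("at least one prediction is required")
    mean = sum(to_unit_score(p) for p in preds) / len(preds)
    return mean, mean >= threshold


if __name__ == "__main__":
    example = [
        Prediction("server_A", damaging=True),
        Prediction("server_B", score=8.0),
        Prediction("server_C", damaging=False),
    ]
    mean, call = consensus(example)
    print(f"mean score: {mean:.2f}, consensus damaging: {call}")
```

An unweighted mean treats every server as equally reliable. A real consensus would probably weight servers by their benchmarked performance, which is one of the things such a project could investigate.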
Our aim is to obtain a deeper understanding of the expression of the genes in the DM1 locus (Fig. 1), the DMPK and DM1-AS transcripts in particular: their expression during brain development, their regional expression patterns in the brain, and their expression levels in the different cell types present in the brain. These expression patterns can then be related to the brain abnormalities observed in DM1. In analogy to published work on DMD3, you will use public large-scale gene expression resources from human and mouse, including the Allen Brain Atlas, the FANTOM5 study and the 10x Genomics mouse brain single-cell dataset. You will analyze, compare and interpret the expression signatures present in these different resources.
Programming skills: An ambition to work with large datasets is required, and basic programming skills (R, Python) are strongly preferred.
Preferred background: MMD student project
Anticipated start: September 2021
Diagnostics-in-3D progress update 2021
About 7,000 rare hereditary diseases affect ±8% of the EU population, which translates to ±36 million people. Identifying the causative mutations of such diseases is therefore an essential step towards diagnosis as well as towards the development of treatments. At CMBI, together with a multidisciplinary team of experts from BioProdict and Vartion, we are working to predict (and eventually explain) the functional effects of variants of unknown clinical significance.
Our EFRO-funded project Diagnostics-in-3D uses an advanced deep learning framework known as DeepRank. We leverage information on the protein structural features surrounding missense variants, coupled with the evolutionary significance of the variant positions, and allow neural networks to learn from these variant environments. The result is a probability estimate of whether a variant is disease-causing or not.
We have tailored DeepRank's 3D-CNN framework to address our problem statement. One of the key aspects of this project is the diversity of the variant environments captured from protein structures, which differ depending on the protein family they belong to. Taking this aspect into account, we are now evaluating the performance of our tool.