
Center for Molecular and Biomolecular Informatics
The Center for Molecular and Biomolecular Informatics (CMBI) does research and education, and provides services in bioinformatics and cheminformatics.
Mission & vision
Our mission
Our mission is to add value to personal health data by their translation into integrative knowledge and actionable information.
Our vision
- CMBI develops bioinformatics approaches that contribute to the understanding of disease mechanisms, personalized therapies and interventions, and a learning health care system
- CMBI is committed to the reusability of its data, tools, and services
- CMBI provides bioinformatics researchers in Radboudumc with a platform for exchange of knowledge and expertise
- CMBI contributes to the education of BMW, MLW, and MMD students so that they can apply and understand the principles behind (existing) bioinformatics tools

Radboudumc Technology Center Bioinformatics
The Bioinformatics technology center aims to raise and solve biological questions using the most recent computer technologies and big data solutions.
Education
Researchers at the CMBI contribute to several courses at both the Faculty of Science and the Medical Faculty. We provide courses in Structural Bioinformatics, Comparative Genomics, Data Analysis, and programming courses such as Java. We specifically focus on the Molecular Life Science students, Biomedical Sciences students and participants in the master Molecular Mechanisms of Disease, although some of our courses can be chosen by Biology, Chemistry or Medical students as well.
Our mission is to provide bachelor students with a basic understanding of bioinformatics principles, as this has been shown to be beneficial for those who want to pursue a career in Life Sciences. Follow-up courses are available for those who want to gain greater insight into our field.
For MLS/Biology students it is even possible to follow the B-track by choosing a combination of (master) Bioinformatics courses and internships at Bioinformatics departments.
Our researchers also teach in special interest courses and summer schools here at the Radboudumc, the Radboud University and elsewhere.
For more information, you can contact dr. Hanka Venselaar, education coordinator at the CMBI.
Internships
Both bachelor and master students from studies such as Molecular Life Sciences, Biomedical Sciences, Chemistry and MMD are welcome. In general, we are flexible in terms of internship length, type of internship and type of research.
Below you can find our internship projects:
-
Description
The human protein usherin, encoded by the USH2A gene, is involved in hearing and vision as a member of the USH2 complex. This complex forms the stereociliary ankle link complex at the apical surface of the sensory cells of the inner ear, the cochlear hair cells, and is required for the proper formation of the developing hair bundles. In retina photoreceptors, the USH2 complex is present at the periciliary membrane and is thought to play a role in regulating intracellular protein transport.
Mutations in USH2A lead either to Usher syndrome, a genetically and clinically heterogeneous condition characterized by progressive vision loss as a consequence of retinitis pigmentosa combined with sensorineural congenital hearing impairment, or to non-syndromic retinitis pigmentosa. The hearing impairment can be partially compensated by fitting hearing aids or cochlear implants. However, no treatment options currently exist for the loss of vision.
Dr. Erwin van Wijk's group at the department of Otorhinolaryngology focuses on unraveling the pathogenic mechanisms underlying Usher syndrome and developing genetic therapies for this condition. One of their successful approaches consists of an RNA-based antisense oligonucleotide (AON) therapy which induces the in-frame skipping of a mutated exon during the process of pre-mRNA splicing. The 3D-structure of the resulting protein indicates that a well-chosen exon-skip can result in a functional, albeit slightly shortened protein. The first AON for USH2A-associated disease is currently being evaluated in a phase 1/2 clinical trial.
In this internship, you will be studying and identifying the requirements for a successful exon-skipping approach. You will take into account the available information on, amongst others, protein type, domain content, 3D-structure, exon boundaries, splice-modulating factors, evolutionary conservation, and protein-protein interactions. The goal is to generate a bioinformatics pipeline which enables the prediction of successful targets for the development of exon-skipping therapies.
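One of the simplest requirements mentioned above, that the skip must be in-frame, can be illustrated with a minimal sketch. It assumes that every exon in the list is fully coding, so that removing an exon preserves the reading frame exactly when its length is a multiple of three. The function names and the exon lengths are hypothetical and serve only as an illustration; they are not USH2A data.

```python
def skip_is_in_frame(exon_lengths, skipped_index):
    """Return True if removing one coding exon keeps the reading frame.

    exon_lengths:  coding lengths (in nucleotides) of consecutive exons.
    skipped_index: zero-based index of the exon to skip.
    """
    return exon_lengths[skipped_index] % 3 == 0


def in_frame_candidates(exon_lengths):
    """List the internal exons whose removal preserves the reading frame."""
    # Simplification: the first and last exons are excluded because they are
    # assumed to carry the start and stop codons.
    return [
        i for i in range(1, len(exon_lengths) - 1)
        if skip_is_in_frame(exon_lengths, i)
    ]


# Hypothetical coding-exon lengths, for illustration only.
example_lengths = [120, 642, 85, 150, 201, 97]
candidates = in_frame_candidates(example_lengths)
print(candidates)
```

A real pipeline would combine such a frame check with the other criteria listed in the description, such as domain content and 3D-structure.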
Supervisors: Erwin van Wijk (Otorhinolaryngology) / Hanka Venselaar (CMBI)
Proposed duration: 5 months or more
Preferred background: BMW-master (course BMS39), MLS (courses MOL066)
Requirements: Experience with 3D-visualisation software, genomic databases
-
Description
Liquid chromatography / mass spectrometry-based metabolomics data is highly prone to batch-to-batch variability. This variability includes shifts in retention time and variations in signal intensity. Additionally, trends within batches can often be observed. Quality control (QC) samples can be used to normalize intensities across batches. At the translational metabolic laboratory (TML), different types of QC samples are included in the next generation metabolic screening (NGMS) pipeline, which is used for clinical diagnostics of inborn errors of metabolism. Batch effects must be corrected if measurements from different batches are to be combined. Combining multiple batches would allow for larger study designs and could improve both metabolomics research and diagnostics. An initial comparison of basic batch effect correction approaches has been carried out on a data set comprising multiple batches of NGMS measurements. The goal of this internship project is to apply various methods and tools for batch effect correction and to assess their performance using quality criteria.
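As an illustration of how QC samples can be used to normalize intensities across batches, the following minimal sketch scales every batch so that its median QC intensity matches a common reference level. This is one basic approach chosen for illustration and is not necessarily the method used in the NGMS pipeline; the function name, the data layout and the intensity values are assumptions. It handles a single feature and does not correct trends within a batch.

```python
from statistics import median


def qc_median_normalize(samples):
    """Scale intensities so every batch has the same median QC intensity.

    samples: list of dicts with keys 'batch', 'is_qc' and 'intensity'
             (one feature, one measurement per dict).
    Returns a new list in which each dict has an added 'normalized' value.
    """
    qc_by_batch = {}
    for s in samples:
        if s["is_qc"]:
            qc_by_batch.setdefault(s["batch"], []).append(s["intensity"])

    # Reference level: median over the QC measurements of all batches.
    all_qc = [v for values in qc_by_batch.values() for v in values]
    reference = median(all_qc)

    normalized = []
    for s in samples:
        factor = reference / median(qc_by_batch[s["batch"]])
        normalized.append({**s, "normalized": s["intensity"] * factor})
    return normalized


# Hypothetical measurements: batch B reads about twice as high as batch A.
measurements = [
    {"batch": "A", "is_qc": True, "intensity": 100.0},
    {"batch": "A", "is_qc": True, "intensity": 110.0},
    {"batch": "A", "is_qc": False, "intensity": 50.0},
    {"batch": "B", "is_qc": True, "intensity": 200.0},
    {"batch": "B", "is_qc": True, "intensity": 220.0},
    {"batch": "B", "is_qc": False, "intensity": 100.0},
]
corrected = qc_median_normalize(measurements)
study_values = [s["normalized"] for s in corrected if not s["is_qc"]]
print(study_values)
```

After scaling, the two study samples end up at the same level, which is what one would expect if the difference between them was purely a batch effect.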
Supervisor: Anna Niehues (CMBI/TML)
Co-supervisor: Purva Kulkarni (TML)
Proposed duration: 5-6 months
Requirements: Basic knowledge of R is advantageous
Topics: bioinformatics, mass spectrometry, metabolomics, batch effect correction, normalization, programming
Further reading
Coene, K.L.M., Kluijtmans, L.A.J., van der Heeft, Ed et al. Next-generation metabolic screening: targeted and untargeted metabolomics for the diagnosis of inborn errors of metabolism in individual patients. J Inherit Metab Dis 41, 3, 337-353 (2018). https://doi.org/10.1007/s10545-017-0131-6
De Livera, A.M., Sysi-Aho, M., Jacob, L. et al. Statistical Methods for Handling Unwanted Variation in Metabolomics Data. Anal Chem 87, 7, 3606–3615 (2015). https://doi.org/10.1021/ac502439y
Li, B., Tang, J., Yang, Q. et al. Performance Evaluation and Online Realization of Data-driven Normalization Methods Used in LC/MS based Untargeted Metabolomics Analysis. Sci Rep 6, 38881 (2016). https://doi.org/10.1038/srep38881
Wehrens, R., Hageman, J.A., van Eeuwijk, F. et al. Improved batch correction in untargeted MS-based metabolomics. Metabolomics 12, 88 (2016). https://doi.org/10.1007/s11306-016-1015-8
-
Description:
Alternative splicing plays a major role in proteomic diversity (Liu et al., 2017). Long-read transcriptomics technology has allowed for the detection of many novel transcript structures, including intron retention and exon skipping. These events have been found extensively in various forms of cancer, for example (Dvinge & Bradley, 2015). However, intron retention events are widespread and also occur in normal biological settings to regulate gene expression (Braunschweig et al., n.d.; Wong, Au, Ritchie, & Rasko, 2016). We would like to know whether we can detect these events at the protein level, and which intron retention (IR) and exon skipping (ES) events, and which of their effects, occur in healthy cells, using the well-studied cell line NA12878.
Outline steps:
- Literature review.
- Use data available for NA12878 and/or other long read datasets with paired proteomics data.
- Identify intron retention and exon skipping events on the transcriptome level.
- Create a customized search dictionary to be able to identify these events.
- Quantify events/identify patterns or associations with other genes, and interpret biological/medical significance.
- Write up.
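The transcriptome-level identification step in the outline above can be sketched as a comparison of exon coordinates between a reference transcript and an observed long-read transcript. This is a simplified illustration, assuming both transcripts come from the same gene and strand and are given as sorted, half-open (start, end) intervals; the function names and coordinates are hypothetical, and dedicated tools handle many more cases.

```python
def introns(exons):
    """Return the introns implied by a sorted list of (start, end) exons."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def skipped_exons(reference, observed):
    """Internal reference exons that lie entirely inside an observed intron."""
    observed_introns = introns(observed)
    return [
        (start, end)
        for start, end in reference[1:-1]
        if any(i_start <= start and end <= i_end for i_start, i_end in observed_introns)
    ]


def retained_introns(reference, observed):
    """Reference introns that are fully covered by a single observed exon."""
    return [
        (i_start, i_end)
        for i_start, i_end in introns(reference)
        if any(start <= i_start and i_end <= end for start, end in observed)
    ]


# Hypothetical coordinates, for illustration only.
reference_tx = [(0, 100), (200, 300), (400, 500), (600, 700)]
exon_skip_tx = [(0, 100), (400, 500), (600, 700)]
intron_ret_tx = [(0, 100), (200, 500), (600, 700)]

print(skipped_exons(reference_tx, exon_skip_tx))
print(retained_introns(reference_tx, intron_ret_tx))
```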
Requirements:
- Master in computational biology, bioinformatics or similar.
- Basic programming skills (python preferred), knowledge of Linux command line.
- Solid statistical knowledge.
- Prior experience handling large datasets preferred.
- 6 months or longer available for internship.
References:
- Braunschweig, U., Barbosa-Morais, N. L., Pan, Q., Nachman, E. N., Alipanahi, B., Gonatopoulos-Pournatzis, T., … Blencowe, B. J. (n.d.). Widespread intron retention in mammals functionally tunes transcriptomes. https://doi.org/10.1101/gr.177790.114
- Dvinge, H., & Bradley, R. K. (2015). Widespread intron retention diversifies most cancer transcriptomes. Genome Medicine, 7(1), 1–13. https://doi.org/10.1186/s13073-015-0168-9
- Liu, Y., Gonzàlez-Porta, M., Santos, S., Brazma, A., Marioni, J. C., Aebersold, R., … Wickramasinghe, V. O. (2017). Impact of Alternative Splicing on the Human Proteome. Cell Reports, 20(5), 1229–1241. https://doi.org/10.1016/j.celrep.2017.07.025
- Wong, J. J.-L., Au, A. Y. M., Ritchie, W., & Rasko, J. E. J. (2016). Intron retention in mRNA: No longer nonsense. BioEssays, 38(1), 41–49. https://doi.org/10.1002/bies.201500117
Internship supervisor: Prof. dr. Peter-Bram ‘t Hoen
Daily supervisor: Renee Salz, PhD student (Renee.Salz@radboudumc.nl)
Starting date: September/October 2020
-
Project description:
Koolen-de Vries Syndrome (KdVS) is an intellectual disability syndrome caused by haploinsufficiency of KANSL1. The KANSL1 protein is part of the non-specific lethal (NSL) complex that is involved in regulation of gene expression. How exactly this affects the brain remains unknown. Also, there is no treatment available for KdVS. To study KdVS in vitro, we generated induced pluripotent stem cells (iPSCs) from fibroblasts from KdVS patients and healthy controls. From these iPSCs we generated neurons (iNeurons) and investigated transcriptional changes in these iNeurons using RNA-seq. The transcriptional profiles of KdVS iNeurons will be used to perform a computational screen to identify drug compounds that are predicted to rescue the KdVS phenotype. A large consortium, as part of the LINCS L1000 project, generated transcriptional profiles following drug perturbations with ~20,000 compounds [1]. The KdVS transcriptional profiles will be compared against this large database in order to identify drug compounds whose profiles anticorrelate with the gene expression changes observed in KdVS iNeurons, as these compounds are predicted to rescue the KdVS phenotype. Additionally, other approaches to identify relevant compounds will be explored, such as the use of DrugBank and ChEMBL to identify compounds that target genes affected in KdVS.
The internship project will consist of setting up the computational pipeline to select drugs of interest based on the transcriptional profiles in KdVS iNeurons. Data analysis will include hypothesis-driven drug identification analyses using correlations, overrepresentation analysis or other approaches. Basic knowledge of R programming and/or Python is preferred. This internship project is part of a larger project in which the drugs selected using this pipeline will be validated, to check for their ability to rescue the KdVS phenotype in iNeurons on a functional level.
References
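The idea of ranking compounds by anticorrelation can be illustrated with a minimal sketch that computes a Pearson correlation between a disease signature and each drug-induced signature and sorts the compounds from most negative to most positive. This is a simplification for illustration only and is not the scoring scheme used by the LINCS L1000 project; the gene names, compound names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_compounds(disease_signature, drug_signatures):
    """Sort compounds from most anticorrelated to most correlated."""
    genes = sorted(disease_signature)
    disease = [disease_signature[g] for g in genes]
    scores = {
        name: pearson(disease, [profile[g] for g in genes])
        for name, profile in drug_signatures.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical log fold changes, for illustration only.
disease_profile = {"G1": 2.0, "G2": -1.5, "G3": 0.5, "G4": -1.0}
drug_profiles = {
    "compound_A": {"G1": -2.0, "G2": 1.5, "G3": -0.5, "G4": 1.0},
    "compound_B": {"G1": 2.0, "G2": -1.5, "G3": 0.5, "G4": -1.0},
    "compound_C": {"G1": 0.1, "G2": 0.3, "G3": -0.2, "G4": 0.0},
}
ranked = rank_compounds(disease_profile, drug_profiles)
for name, score in ranked:
    print(name, round(score, 3))
```

Compounds at the top of the ranking reverse the disease signature most strongly and would be the first candidates for follow-up in such a screen.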
- Subramanian, A., et al., A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell, 2017. 171(6): p. 1437-1452.e17.
Internship supervisor: Prof. dr. Peter-Bram ‘t Hoen
Daily supervisor: Anouk Verboven, PhD student (anouk.verboven@radboudumc.nl)
Starting date: September/October 2020
-
Plasmodium genomes are being sequenced at a high rate (e.g. https://www.malariagen.net/projects/pf3k), and the data can be exploited to better understand the sequence variation. Specific questions that we want to answer are: which proteins show the highest level of variation, and is this variation linked to whether they are, for example, immunogenic, expressed in specific stages of Plasmodium development, exposed to the outside of the cell, or encoded in specific regions of the chromosome? The project combines the gathering of relevant data (e.g. about immunogenicity, or about developmental stage dependent expression of the protein), the analysis of the relationships between those data and critical thinking about causal relationships versus correlations.
Programming skills: An ambition to work with large datasets is required and basic programming skills (R, python) are strongly preferred.
Preferred background: MMD student project, or MLW
Anticipated start: end 2019/beginning 2020
Contact: Martijn.Huijnen@radboudumc.nl
-
The metagenomes of bacterial species living in and on humans have uncovered a new angle to predicting and understanding human health and disease.
Single molecule Molecular Inversion Probes (smMIPS) allow us to derive the composition of a metagenome at unprecedented phylogenetic depth at affordable costs, facilitating large scale analyses. Design of smMIPS for large numbers of genomes, using conserved sequences for the probes, and analysis of the results require understanding of the experimental techniques that are being used, programming skills and biological knowledge to interpret the results: typical skills of a bioinformatician. In the project the student would first analyze the data that we have already obtained about human metagenomes, using in-house developed software, and would then go on to design new smMIPS to increase the number of species and strains that we can detect.
Programming skills: An ambition to work with large sequence datasets is required and basic programming skills (e.g. python) are strongly preferred.
Anticipated start: summer 2019
Duration: 0.5 year, or more
Contact: Martijn.Huijnen@radboudumc.nl
-
Description
Analysis of gene expression in bulk tissues using RNA sequencing provides important information to understand the processes involved in disease development and progression. However, many tissues are heterogeneous and contain a mix of different cell types. This makes it difficult to identify gene expression levels specific to individual cell types, since the expression level of each gene can differ between cell types. Disease development and progression or the response to drug therapy is often accompanied by changes in cell composition. For example, an increase of malignant cells and tumor infiltrating cells compared with surrounding cells is an indicator of tumor growth and of the clinical outcome for patients. Thus, the cellular composition of a tissue is important for the understanding of many biological processes.
Rheumatoid arthritis (RA) is a disease characterized by inflammation of multiple joints, which, if not treated, leads to irreversible joint damage. During joint inflammation cells from the immune system infiltrate in the synovial membrane, leading to the formation of so-called pannus tissue. Synovial biopsies taken from the inflamed joints of RA patients are mainly composed of synovial fibroblasts, macrophages, neutrophils, and lymphocytes. Using publicly available RNAseq datasets, as well as single-cell sequencing data, you will elucidate the constituent cell types and their proportions in synovial biopsies from RA patients by computational deconvolution of gene expression data. Deconvolution is the identification of properties and the relative abundance of components from a mixture. This project will be carried out at the Department of Biomolecular Chemistry in close collaboration with the Centre for Molecular and Biomolecular Informatics with supervision from both departments.
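Computational deconvolution as described above can be illustrated with a minimal sketch that models the bulk expression as a non-negative mixture of reference cell-type signatures and estimates the proportions by non-negative least squares, here solved with plain projected gradient descent. This is one common formulation chosen for illustration, not necessarily the method that will be used in the project; the function name, the signature values and the cell-type order (fibroblast, macrophage, lymphocyte) are assumptions.

```python
def deconvolve(signatures, mixture, iterations=5000):
    """Estimate cell-type proportions by non-negative least squares.

    signatures: list of rows, one per gene; each row holds the expression
                of that gene in every reference cell type.
    mixture:    bulk expression of the same genes, in the same order.
    """
    n_types = len(signatures[0])
    # A step size of 1 / trace(S^T S) is small enough for convergence.
    step = 1.0 / sum(value * value for row in signatures for value in row)
    weights = [1.0 / n_types] * n_types

    for _ in range(iterations):
        residuals = [
            sum(s * w for s, w in zip(row, weights)) - m
            for row, m in zip(signatures, mixture)
        ]
        gradient = [
            sum(row[k] * r for row, r in zip(signatures, residuals))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto non-negative weights.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]

    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical signature matrix (4 genes x 3 cell types), illustration only.
reference_signatures = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 2.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_proportions = [0.6, 0.3, 0.1]
bulk = [
    sum(s * p for s, p in zip(row, true_proportions))
    for row in reference_signatures
]
estimated = deconvolve(reference_signatures, bulk)
print([round(p, 3) for p in estimated])
```

In this noise-free example the estimate recovers the proportions used to build the mixture; real bulk data is noisy, and the quality of the reference signatures largely determines the result.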
MSc student project
Preferred background: Bioinformatics, Computational Biology
Programming skills: work with large datasets, knowledge of R or Python is required
Anticipated start: ASAP
Length: 0.5-1 year
Contact: M.Dunaeva@ncmls.ru.nl; G.Pruijn@ncmls.ru.nl; Peter-Bram.tHoen@radboudumc.nl
-
Background
The world around us is teeming with microscopic life. The natural habitat of these so-called microbiota, for example in the human intestines or on the skin, is called the microbiome. These bacterial consortia are currently analyzed mostly by sequencing-based approaches, either by marker gene sequencing (MGS) metataxonomics, or by whole-genome sequencing (WGS) shotgun metagenomics. The advantage of WGS over MGS is that it provides insight into gene and metabolic function potential of a microbiome because all free DNA in a sample is sequenced, whereas MGS focuses on one gene only (usually the 16S rRNA gene).
Challenge
MGS is relatively cheap and data analysis is straightforward, and it is still the standard in the field. In contrast, WGS is relatively expensive and very computationally intensive. Nevertheless, more and more WGS datasets are being generated, as WGS is slowly becoming more affordable. At the CMBI, we currently lack a dedicated pipeline for analysis of metagenomics data. To be prepared for future projects, we want to invest now in setting up a WGS analysis pipeline.
Project description
A plethora of tools is currently available for analysis of WGS data. With these existing tools the student will have to build an analysis pipeline for metagenomics data. This means that the student will first have to study in depth the different options available. Thereafter, a combination of tools will be used to build a comprehensive pipeline. Note that there is no need for the student to write bioinformatics tools or algorithms themselves. Depending on preferences and needs, this pipeline can in time be expanded with other tools and functionalities. Therefore, documentation (e.g. GitHub), portability (e.g. Docker or Jupyter) and structured programming are important in this project. Also, from a QC point of view, the monitoring and detailed reporting of all pipeline steps is crucial. In the final phase of the project, the student will validate his/her pipeline with available metagenomics datasets of different human microbiome niches, as reported in high-quality literature studies. In conclusion, we offer a lot of diversity and challenges for a highly motivated and enthusiastic student, in a flexible, supportive and professional academic environment.
Preferred background: Bioinformatics (BSc: HLO bioinformatics)
Basic programming skills are required (Python, Bash, R).
Anticipated start: summer / fall 2019
Length: at least 5-6 months, can always be prolonged
Contact: Tom.Ederveen@radboudumc.nl; Daniel.Garza@radboudumc.nl
-
Background
The world around us is teeming with microscopic life. The natural habitat of these so-called microbiota, for example in the human intestines or on the skin, is called the microbiome. These bacterial consortia are currently analyzed mostly by sequencing-based approaches, either by marker gene sequencing (MGS) metataxonomics, or by whole-genome sequencing (WGS) shotgun metagenomics. Recently, a novel molecular / sequencing method called smMIP was developed for profiling of genes with a very high sensitivity. Within the Radboudumc, this smMIP method, for single-molecule molecular inversion probe, is currently being adapted for in-depth and high resolution profiling of microbiota.
Challenge
Recent studies show that smMIP is a highly sensitive method for profiling of genes or RNA transcripts. However, its application for studying microbiomes has not been explored yet. Multiple departments within the Radboudumc are currently working together on application and validation of smMIP for microbiota profiling (i.e. dept. of Biochemistry, CMBI Bioinformatics and Dermatology). Pilot sequencing data has been generated, but several other important validations are yet to be performed, one of which is an extensive in silico validation.
Project description
The student will be working with a panel of smMIPs that is designed to target the vaginal microbiome and was developed in our institute. Using existing smMIP analysis software we want to perform an in silico validation on a published metagenome data set of vaginome samples, to see if theoretical smMIP profiles match the microbiota profiles found by metagenome analysis of the same data. As metagenomics is not trivial, the student will be working closely together with another student who will be fully dedicated to building an analysis pipeline for metagenomics data. Furthermore, the smMIP software was not built for the purpose intended here, in silico profiling of metagenome sequencing reads, so the student will need creativity in adapting the smMIP bioinformatics tool to our specific needs. In parallel, another pilot study will be performed in which 16S vaginome profiles are compared to those retrieved by the smMIP approach. In conclusion, we offer a lot of diversity and challenges for a highly motivated and enthusiastic student, in a flexible, supportive and professional academic environment.
Preferred background: Bioinformatics (MSc: MLS/BMS/BIO/MMD)
Basic programming skills are required (Python, Bash, R).
Anticipated start: summer / fall 2019
Length: at least 5-6 months, can always be prolonged
Contact: Tom.Ederveen@radboudumc.nl; William.Leenders@radboudumc.nl; Karolina.Andralojc@radboudumc.nl
-
Description
HOPE is our own in-house server for the prediction of mutational effects. This server is specifically aimed at the medical scientist and produces a report that is clear and understandable for everyone without a background in structural bioinformatics. HOPE has been running for several years and is being used by researchers all over the world. In order to keep this server up to date we perform small tests in which we select mutations that were described in high-end journals such as The American Journal of Human Genetics and Nature Genetics. We compare the effect predictions made in these articles with those made by our HOPE server (and eventually also with predictions made by other widely-used automatic online servers). The results should be used to improve the predictions made by HOPE.
Preferred background: MLS/BMS, possibly BIO
Length: flexible, between 1-6 months.
Programming skills: not necessary
Contact: Hanka.Venselaar@radboudumc.nl
-
Upon the cleavage of proteins into short peptides, some peptides interact with MHC molecules and are delivered to the cell surface via the MHC molecule to be presented to the immune cells. If a foreign peptide is presented by the MHC molecule, the adaptive immune response will be triggered. The main determining step of this antigen presentation is the binding of the peptide to the MHC molecule. Deciphering the mechanism underlying peptide:MHC interactions is valuable in designing vaccines for infectious diseases, in cancer immunotherapy and in understanding the causes of autoimmune diseases. Three-dimensional (3D) structures of peptide:MHC complexes provide fundamental insights into interaction specificities, binding affinities and sensitivity to mutations.
Complementary to experimental methods, modelling provides a valuable tool for generating 3D models of peptide:MHC complexes. Existing computational research methods are mostly effective for MHC class I but not for MHC class II. The modelling of peptide:MHCII is crucial and there are few experiments on this type of interaction compared to peptide:MHCI interactions. MHC class II molecules interact with exogenous peptides and show a different type of interaction, mainly due to the intrinsically different binding groove of the MHC II molecule. With its open peptide-binding cleft it tends to interact with longer peptides, and it also has more and different anchor pockets.
We have developed a framework for the modelling of peptide:MHC class I complexes in our lab. The internship project will consist of setting up the computational framework for 3D modelling of peptide:MHCII complexes, which would allow us to produce large quantities of structures that can be used for further analysis, crucial for vaccine development. Following the modelling of peptide:MHCII complexes, there is room to extend the project by scoring the models in order to select high-quality models and to identify the specificity of immune responses.
The student will have the opportunities to:
- Learn to perform homology modelling
- Learn the basics of biophysics used for loop modelling
- Learn Git and GitHub
- Contribute to extend our modelling framework to peptide:MHC class II complexes
Preferred background: Bioinformatics (MSc: MLS /BIO/MMD)
Preferred: basic knowledge of structural biology or immunology
Basic programming skills are required (Python)
Anticipated start: fall/winter 2020
Length: at least 5-6 months
Internship supervisor: Assis. Prof. dr. Li Xue
Daily supervisor: Farzaneh MeimandiParizi, PhD student
Contact: f.meimandiparizi@radboudumc.nl, Li.Xue@radboudumc.nl
-
Description
Proteins can hardly ever function on their own. Many proteins are found in homo- and heteromeric complexes, and many others are found in a transient complex at least once during their lifecycle. Interactions made by these proteins can be disturbed by mutations, which often lead to a non-functional protein complex and eventually a disease. Therefore, it would be interesting to predict the effects of mutations that occur on protein surfaces. However, information about interacting residues on surfaces is still scarce and only a few protein-protein interaction servers exist.
In collaboration with a research group in Amsterdam, we have linked their PPI-prediction server SERENDIP to a script that visualizes the predictions in a YASARA scene. A possible student project would be to test and analyse these predictions.
Preferred background: MLS/BMS/BIO/MMD
Length: at least 2 months
Programming skills: basics
Contact: Hanka.Venselaar@radboudumc.nl
-
In order to correctly treat patients, it is beneficial to understand the molecular basis of their disease or syndrome. Mutations that occur in the coding sequence of proteins might have an effect on ligand binding, membrane anchoring, general stability/folding, interactions with other proteins, etc. Nowadays, several online servers exist that can analyse these effects. One of them is our own HOPE server, which can analyse the protein structure in detail and provides an extensive report readable for the medical scientist. Other prediction servers often provide a result that contains only a binary answer (damaging or not), or a value (0 to 10). HOPE's results could be strengthened by combining its textual report with a consensus prediction that would be a combination of these other servers.
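A consensus prediction of the kind described above could, for example, rescale each server's output to a common range and average the results. The sketch below is one possible scheme and not HOPE's implementation; the server names, the data layout and the threshold are hypothetical. It also assumes that a higher value always means "more damaging", which does not hold for every real prediction server and would need to be checked per tool.

```python
def to_unit_interval(prediction):
    """Map a server result onto [0, 1], where 1 means 'damaging'."""
    if prediction["kind"] == "binary":
        return 1.0 if prediction["value"] else 0.0
    low, high = prediction["range"]
    return (prediction["value"] - low) / (high - low)


def consensus(predictions, threshold=0.5):
    """Average the rescaled predictions and compare the mean to a threshold."""
    scores = [to_unit_interval(p) for p in predictions]
    mean_score = sum(scores) / len(scores)
    return mean_score, mean_score >= threshold


# Hypothetical server outputs for a single mutation, illustration only.
server_results = [
    {"server": "server_A", "kind": "binary", "value": True},
    {"server": "server_B", "kind": "score", "value": 8, "range": (0, 10)},
    {"server": "server_C", "kind": "binary", "value": False},
]
mean_score, is_damaging = consensus(server_results)
print(round(mean_score, 2), is_damaging)
```

A weighted average, with weights based on each server's measured accuracy, would be a natural refinement of this unweighted mean.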
Preferred background: master MLS/Informatics/Data science
Length: 0.5-1 year
Programming skills: Good
Contact: Hanka.Venselaar@radboudumc.nl
-
Our aim is to obtain a deeper understanding of the expression of the genes in the DM1 locus (Fig. 1), the DMPK and DM1-AS transcripts in particular, during brain development, their regional expression patterns in the brain and their expression levels in different cell types present in the brain. These expression patterns can subsequently be related to the brain abnormalities observed in DM1. In analogy to published work on DMD3, you will be using public large-scale gene expression resources from human and mouse, including the Allen Brain Atlas, the FANTOM5 study and the 10x genomics mouse brain single cell dataset. You will be analyzing, comparing and interpreting the expression signatures present in these different resources.
Programming skills: An ambition to work with large datasets is required and basic programming skills (R, python) are strongly preferred.
Preferred background: MMD student project
Anticipated start: September 2020
Contact: Peter-Bram.tHoen@radboudumc.nl
Bioinformatics Services
We maintain computational facilities, databases, and software packages in bioinformatics.
Organisation
Research groups CMBI
Discover some of our research that is related to molecular and biomolecular informatics.
Research themes
Affiliated institutes

Radboud Institute for Health Sciences
This department is affiliated with RIHS. The research at this institute aims to improve clinical practice and public health.