As of 1 January 2015, e-Biobanking has been merged with INFINITI (Information retrieval for information services) and Data2Semantics (From Data to Semantics for Scientific Data Publishers).
An eBiobank stores a large number of biological samples digitally for clinical research and for life science research in general. Because of modern technology, the volumes of experimental biological data are growing rapidly, and eBiobanks have grown significantly with them. The challenge is to manage a process of scientific discovery that increasingly relies on the analysis of large volumes of highly complex, heterogeneous data. Unlike in regular databases, the challenge lies not only in the number of records but also in the complex combination of different data types (e.g. images, measurements, survey data and sensor data) from disparate sources. This information explosion creates a need for better analysis and interpretation of the data in order to make accurate predictions from it.
The inspiration for the eBioBanks project comes from three main applications:
- the detection of biomarkers in human tissue images;
- the generation of knowledge from large human cohort studies, e.g. emotions from movies, heart disease, genome alterations, ageing processes; and
- bridging the gap between medical users and advanced ICT resources, e.g. user front-ends for biomedical research and biomedical experiments on distributed infrastructures.
The ICT challenge here is to develop the theoretical principles needed to scale inference and learning algorithms to massive data volumes. Large data volumes have a positive effect: Big Data amplifies the inferential power of algorithms. On the negative side, massive data may also amplify the noise inherent in any inferential algorithm (e.g. false positives, or relationships that seem causal but are in fact coincidental) to the point that it obscures the structures of interest. These downsides bring challenges in data management (How to manage the vast amounts of data efficiently and securely? How to handle high-dimensional data with small sample sizes? How to exploit the heterogeneity of the data in the analysis?), in discovery (How to robustly learn cause-effect relations from complex data?) and in accessibility (How to make the data analysis tools usable for scientists?).
Advanced computer methods that combine statistics with optimization are an essential tool for interpreting these data. We view building the tools needed to support the process of scientific exploration as the core target of our research in the COMMIT/ project.
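The noise amplification mentioned above can be made concrete with a small simulation. The sketch below is an illustration only, not a method used in the project: it draws purely random features for a small group of samples and reports the strongest correlation any of them happens to show with an equally random outcome. The function names, the sample size of 20 and the feature counts are all assumptions chosen for the example.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between a random outcome and random features.

    Hypothetical helper: every feature is independent noise, so any
    correlation that shows up is coincidental by construction.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


if __name__ == "__main__":
    # Same seed, so the 5 features are the first 5 of the 2000.
    few = strongest_spurious_correlation(n_samples=20, n_features=5)
    many = strongest_spurious_correlation(n_samples=20, n_features=2000)
    print(f"strongest correlation among    5 noise features: {few:.2f}")
    print(f"strongest correlation among 2000 noise features: {many:.2f}")
```

With only 20 samples, searching through thousands of unrelated features almost always turns up at least one that looks strongly related to the outcome. This is the high-dimensional, small-sample-size situation in which false positives can obscure the structures of interest.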
Biggest results so far
Mutalyzer (video at the end of this page)
We have developed a software tool, called the Variant Description Extractor, that rapidly compares one human genome with another in order to find small but crucial genetic differences. Our tool generates a complete description for a human genome in about four hours.
ICT science question: the main scientific challenge is twofold. First, how can short and unique descriptions be calculated from the long strings of letters that make up the genes? A gene can contain from thousands to many millions of these four letters. Second, how can this calculation be done in an acceptable, and ideally minimal, amount of computing time?
Involved COMMIT/ partners: LUMC
Rapidly finding variations between human genomes
We have developed a software tool, called the Variant Description Extractor, that rapidly compares one human genome with another in order to find small but crucial genetic differences. Our tool generates a complete description for a human genome in about four hours. The human genome contains twenty to twenty-five thousand genes distributed over a long molecule called DNA. Genes can be described by long strings of the four letters A, C, G and T, each of which stands for a simpler molecule in the DNA. On average, humans differ genetically from each other by only 0.1%. However, finding and understanding these small differences is crucial, especially for finding the causes of diseases and solutions to them.
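To give a feel for what "describing the differences" between two sequences means, here is a minimal sketch. It is not the Variant Description Extractor's algorithm: it uses Python's standard difflib as a stand-in, which does not guarantee the shortest possible description and is far too slow for whole genomes. The function name and the compact notation (loosely modelled on HGVS-style variant descriptions) are assumptions made for the example.

```python
from difflib import SequenceMatcher


def describe_variants(reference: str, sample: str) -> list[str]:
    """Describe how `sample` differs from `reference`.

    Hypothetical helper. Positions are 1-based and refer to the reference.
    Simplification: an insertion before the first letter is reported
    with position 0.
    """
    matcher = SequenceMatcher(None, reference, sample, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        start, end = i1 + 1, i2  # 1-based, inclusive span on the reference
        span = str(start) if start == end else f"{start}_{end}"
        new_letters = sample[j1:j2]
        if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            # Single-letter substitution, e.g. 5A>T
            variants.append(f"{start}{reference[i1]}>{new_letters}")
        elif tag == "replace":
            # A stretch replaced by a different stretch
            variants.append(f"{span}delins{new_letters}")
        elif tag == "delete":
            variants.append(f"{span}del")
        elif tag == "insert":
            # Inserted between reference positions i1 and i1 + 1
            variants.append(f"{i1}_{i1 + 1}ins{new_letters}")
    return variants


if __name__ == "__main__":
    reference = "ACGTACGT"
    for sample in ("ACGTTCGT", "ACGTCGT", "ACGTGGACGT"):
        print(sample, describe_variants(reference, sample))
```

The point of such a description is that it is much shorter than the sequences themselves: a handful of position-and-change items stands in for a string that, in a real gene, may run to millions of letters.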
Web-based tools for handling biomedical Big Data (video at the end of this page)
Biomedical research is facing Big Data challenges. At present, however, researchers lack user-friendly IT tools to handle these data. To solve this problem, we develop Science Gateways: easy-to-use, web-based and scalable tools that manage and integrate data, methods and infrastructure for scientific research. The goal is better, faster and cheaper biomedical research. Our Science Gateways enable researchers to handle their biomedical Big Data and harness the power of Big Computers without having to deal with the underlying IT complexity.
ICT science question: how can Science Gateways deal with the large number and variety of system components? How can they deal with requirements from scientists that are unknown or changing? Our research adopts principles of design science for information systems: we interactively build Science Gateways, validate them in the field, and derive methodologies and best practices for the construction of future gateways. Our approach is unique because our gateways are designed for, evaluated by, and adopted by researchers in daily practice. Furthermore, our methodology and technology enable the fast construction of new gateways across scientific domains.
Involved COMMIT/ partners: AMC, Universiteit van Amsterdam, Sci-Bus
Molecular biobanks unravel the secrets of breast cancer
This breast cancer project won the NGI Venture Challenge.
Visualization of molecular data for personalized, predictive, participatory and preventive medicine. Modern medicine increasingly tries to understand diseases by looking at their molecular fingerprint. This is done by molecular imaging of biological tissues. Molecular imaging can assist in the diagnosis and prognosis of diseases. It also enables the development of medicine tailored to the individual patient rather than to everybody, a booming research field called ‘personalized medicine’. Our demo presents a 3D visualization of molecular imaging data generated by Mass Spectrometry Imaging (MSI). MSI is a technique for the simultaneous detection and visualization of a large variety of molecules based on their molecular masses. Using our interactive tool, you can view and explore 3D molecular images of breast cancer tissue.
ICT science question: the main scientific challenge is to reduce, process, analyze and interpret huge datasets.
Involved COMMIT/ partners: Amolf, NKI-Antoni van Leeuwenhoek, PS-Tech