e-BIOBANKING (e-Biobanking with Imaging for Healthcare)

 

As of 1 January 2015, e-Biobanking has been merged with INFINITI (Information retrieval for information services) and Data2Semantics (From Data to Semantics for Scientific Data Publishers).

An eBiobank digitally stores a large number of biological samples for (clinical) research and for life science research in general. Thanks to modern technology, the volumes of experimental biological data keep growing, and the size of eBiobanks has increased significantly. It has become a challenge to manage the process of scientific discovery, which increasingly relies on the analysis of large volumes of highly complex and heterogeneous data. In contrast to regular databases, the challenge lies not only in the number of records but also in the complex combination of different data types (e.g. images, measurements, survey data and sensor data) from disparate sources. This information explosion creates the need for better analysis and interpretation of data in order to make accurate predictions from it.

The inspiration for the eBioBanks project comes from three main applications:

  • the detection of biomarkers in human tissue images;
  • the generation of knowledge from large human cohort studies, e.g. emotions from movies, heart disease, genome alterations and ageing processes; and
  • bridging the gap between medical users and advanced ICT resources, e.g. user front-ends for biomedical research and biomedical experiments on distributed infrastructures.
     

The ICT challenge here is to develop the theoretical principles needed to scale inference and learning algorithms to massive data sets. Large data volumes have a positive effect: Big Data amplifies the inferential power of algorithms. On the negative side, massive data may also amplify the noise that is inherent in any inferential algorithm (e.g. false positives, or relationships that seem causal but are in fact coincidental) up to the point where it obscures the structures of interest. These downsides come with challenges in data management (How to manage the vast amounts of data efficiently and securely? How to handle high-dimensional, small-sample-size data? How to exploit the heterogeneity of the data in the analysis?), in discovery (How to robustly learn cause-effect relations from complex data?) and in accessibility (How to make the data analysis tools usable for scientists?).
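The false-positive effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a method used in the project: the function names, sample sizes and seed are hypothetical, and all data are pure random noise. It measures the strongest correlation found between a random outcome and a growing number of random features measured on only a few samples.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def max_spurious_correlation(n_samples, n_features, seed=0):
    """Strongest |correlation| between a noise outcome and noise features.

    Every value is drawn independently, so any correlation found here
    is a false positive by construction.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    strongest = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        strongest = max(strongest, abs(pearson(feature, outcome)))
    return strongest


# Hypothetical setting: 20 samples, first with 10 and then with 2000 features.
few = max_spurious_correlation(n_samples=20, n_features=10, seed=1)
many = max_spurious_correlation(n_samples=20, n_features=2000, seed=1)
print(f"strongest spurious correlation, 10 features:   {few:.2f}")
print(f"strongest spurious correlation, 2000 features: {many:.2f}")
```

Because both runs use the same seed, the 10 features of the first run are also the first 10 features of the second, so the second value can never be smaller. With thousands of noise features and few samples, some feature usually appears strongly correlated with the outcome purely by chance.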

Advanced computer methods combining statistics with optimization are an essential tool to interpret these data. We view building the necessary tools to support the process of scientific exploration as the core target of our research in the COMMIT/project.


Biggest results so far

Mutalyzer (video at the end of this page)

We have developed a software tool, called the Variant Description Extractor, that rapidly compares one human genome with another in order to find small but crucial genetic differences. Our tool generates a complete description for the human genome in about four hours.

ICT science question: the main scientific challenge is twofold. First, how can short and unique descriptions be calculated from the long strings of letters that make up the genes? A gene can contain anywhere from thousands to many millions of these letters. Second, how can this calculation be done within an acceptable, and ideally minimal, amount of computation time?
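To make the first question concrete, the sketch below derives a short description of a single difference between two DNA strings. It is a simplified illustration and not the algorithm of the Variant Description Extractor: the function name is hypothetical, the output loosely follows HGVS-style notation as an assumption, and it handles only one contiguous change per pair of sequences, without the position normalisation that real tools apply in repeated regions or at the sequence ends.

```python
def describe_variant(reference, sample):
    """Describe one contiguous difference between two DNA strings.

    Trims the shared prefix and suffix and reports what is left as a
    single change, using 1-based positions on the reference.
    """
    if reference == sample:
        return "="

    limit = min(len(reference), len(sample))

    # Length of the common prefix.
    start = 0
    while start < limit and reference[start] == sample[start]:
        start += 1

    # Length of the common suffix, never overlapping the prefix.
    end = 0
    while (end < limit - start
           and reference[len(reference) - 1 - end] == sample[len(sample) - 1 - end]):
        end += 1

    deleted = reference[start:len(reference) - end]
    inserted = sample[start:len(sample) - end]
    first = start + 1
    last = start + len(deleted)

    if len(deleted) == 1 and len(inserted) == 1:
        return f"{first}{deleted}>{inserted}"      # substitution
    if not deleted:
        return f"{start}_{start + 1}ins{inserted}"  # insertion between two positions
    span = str(first) if first == last else f"{first}_{last}"
    if not inserted:
        return f"{span}del"                        # deletion
    return f"{span}delins{inserted}"               # deletion plus insertion


print(describe_variant("ACGTACGT", "ACGTTCGT"))  # 5A>T
print(describe_variant("ACGTACGT", "ACGCGT"))    # 4_5del
print(describe_variant("ACGT", "ACAAGT"))        # 2_3insAA
```

Trimming the shared prefix and suffix takes time linear in the sequence length, but it yields a single change only. Describing many scattered differences compactly, across millions of letters, is where the scientific challenge stated above lies.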

Involved COMMIT/partners: LUMC

Rapidly finding variations between human genomes

We have developed a software tool, called the Variant Description Extractor, that rapidly compares one human genome with another in order to find small but crucial genetic differences. Our tool generates a complete description for the human genome in about four hours. The human genome contains twenty to twenty-five thousand genes distributed over a long molecule called DNA. Genes can be described by long strings of the four letters A, C, G and T, each of which stands for one of the simpler molecules that make up DNA. On average, humans differ genetically from each other by only 0.1%. However, finding and understanding these small differences is crucial, especially for discovering the causes of diseases and their solutions.


Web-based tools for handling biomedical Big Data (video at the end of this page)

Biomedical research is facing Big Data challenges. At present, however, researchers lack user-friendly IT tools to handle these data. Science Gateways are being developed to solve this problem. They are easy-to-use, web-based and scalable tools that manage and integrate data, methods and infrastructure for scientific research. The aim is better, faster and cheaper biomedical research: our Science Gateways enable researchers to handle their biomedical Big Data and harness the power of Big Computers without having to deal with the underlying IT complexity.

ICT science question: how can Science Gateways deal with the bewildering number and variety of system components? How can they deal with requirements from scientists that are unknown or changing? Our research adopts principles of design science for information systems: we interactively build science gateways, validate them in the field, and derive methodologies and best practices for the construction of future gateways. Our approach is unique because our gateways are designed for, evaluated by, and adopted by researchers in daily practice. Furthermore, our methodology and technology enable fast construction of new gateways across scientific domains.

Involved COMMIT/partners: AMC, Universiteit van Amsterdam, Sci-Bus

Molecular biobanks unravel the secrets of breast cancer

Winning NGI Venture Challenge for this breast cancer project
Visualization of molecular data for personalized, predictive, participatory and preventive medicine. Modern medicine increasingly tries to understand diseases by looking at their molecular fingerprint. This is done by molecular imaging of biological tissues, which can assist in the diagnosis and prognosis of diseases. It also enables the development of medicine tailored to the individual patient rather than to everybody, a booming research field called ‘personalized medicine’. Our demo presents a 3D visualization of molecular imaging data generated by Mass Spectrometry Imaging (MSI). MSI is a technique for the simultaneous detection and visualization of a large variety of molecules based on their molecular masses. Using our interactive tool, you can view and explore 3D molecular images of breast cancer tissue.
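The idea of visualizing molecules by their masses can be sketched as follows. This is a minimal illustration under assumptions, not the project's tool: the data layout (a dictionary from pixel coordinates to lists of mass-to-charge and intensity pairs), the function name and all values are hypothetical. For one target mass, it sums the signal per pixel, which gives the image of a single molecule's distribution over the tissue.

```python
def ion_image(spectra, target_mz, tolerance):
    """Return {pixel: summed intensity} for peaks within target_mz +/- tolerance.

    `spectra` maps a pixel coordinate to a list of (m/z, intensity) peaks.
    """
    image = {}
    for pixel, peaks in spectra.items():
        image[pixel] = sum(
            intensity
            for mz, intensity in peaks
            if abs(mz - target_mz) <= tolerance
        )
    return image


# Hypothetical toy data: three pixels, each with a short list of peaks.
toy_spectra = {
    (0, 0): [(500.2, 10.0), (500.3, 5.0), (742.5, 3.0)],
    (0, 1): [(742.6, 8.0)],
    (1, 0): [],
}

# One image per molecule of interest, selected by its mass.
image_500 = ion_image(toy_spectra, target_mz=500.0, tolerance=0.5)
image_742 = ion_image(toy_spectra, target_mz=742.5, tolerance=0.2)
print(image_500)
print(image_742)
```

A real data set holds a full spectrum for every pixel of every tissue slice, so repeating this for many masses quickly produces very large volumes of data.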

ICT science question: the main scientific challenge is to reduce, process, analyze and interpret huge datasets.

Involved COMMIT/partners: AMOLF, NKI-Antoni van Leeuwenhoek, PS-Tech

Video: