IV-e (e-Infrastructure Virtualization for e-Science Applications)


What do climate modeling, astrophysics, and medicine have in common? Big Data.

Big Data requires ‘big’ computers as well: not a single machine, but a number of machines tied together. Managing all this data and making sense of it puts huge demands on the performance, scalability, and energy efficiency of infrastructures. First, these high capacity demands are in many cases addressed by including one or more types of accelerators, such as Graphics Processing Units (GPUs), to speed up calculations, leading to much more heterogeneous systems. Second, virtualization of compute, storage, and network resources, known as cloud computing, increases the flexibility to structure and arrange these resources; virtualization techniques are well known in scientific computing, but applying cloud computing to scientific computing still poses significant challenges. Third, in recent years industry has found ways to adopt and commercialize cloud computing, leading to different usage models for computer resources. Tools for scientific computing will therefore have to take into account that computer resources will be spread over a multitude of parties, computing environments, and interfaces.

The "Electronisch Patienten Dossier" is an important topic in this project.

The ICT challenge of this project is to ease the management of highly complicated scientific computing infrastructures by effectively shielding the user from their low-level complexity. The project will investigate how to design a programmable e-Science architecture, describing the infrastructure components and optimizing them for typical usage scenarios. It will also investigate efficient methods to program data-intensive applications on heterogeneous systems and to build workflow-based collaborative problem-solving environments (a minimal sketch of such a workflow follows below). By solving these challenging ICT problems around resource usage and optimization, this project will enable much easier access to e-Infrastructures, despite their growing complexity.
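To make "workflow-based problem solving" concrete, here is a minimal sketch of the kind of task dependency graph such environments coordinate. The tasks and their ordering below are hypothetical stand-ins for real e-Science pipeline steps, not the project's actual architecture.

```python
# Minimal workflow sketch: tasks with dependencies, executed in topological order.
# The tasks below are hypothetical; real e-Science workflows schedule such steps
# across clusters, clouds, and accelerators rather than running them in-process.
from graphlib import TopologicalSorter

def fetch():      print("fetch input data")
def preprocess(): print("preprocess data on a cluster node")
def simulate():   print("run the simulation (possibly on a GPU)")
def visualize():  print("render results for the scientist")

# Each task maps to the set of tasks it depends on.
workflow = {
    preprocess: {fetch},
    simulate:   {preprocess},
    visualize:  {simulate},
}

for task in TopologicalSorter(workflow).static_order():
    task()
```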

We primarily target our research at the scientific community. We work together with researchers from climate modeling, astrophysics, and medicine, who have given us much valuable feedback; this collaboration helps scientists bridge the gap to demanding e-Science applications.

The joint research with the COMMIT/project Data2Semantics on distributed reasoning resulted in a cum laude PhD thesis and a nationally recognized NWO-Veni award for Jacopo Urbani. The research on climate modelling resulted in an Enlighten Your Research Global award for an international team led by Prof. Henk Dijkstra (Utrecht University) and Dr. Frank Seinstra (Netherlands eScience Center). Another highlight is our two publications in the main conference of SC’13: "Exploring Portfolio Scheduling for Long-term Execution of Scientific Workloads in IaaS Clouds" and "Scalable Virtual Machine Deployment Using VM Image Caches". Furthermore, the project leader received the Euro-Par Achievement Award in appreciation of his outstanding and sustained contributions to parallel processing in the Netherlands and beyond, including his research on parallel programming environments and his work on the DAS infrastructure.

Partnership with KLM

In 2015, KLM joined this COMMIT/project to do research into cyber attacks. The work involves forming a group of internet service providers and associated business networks that share a mutual interest in minimizing the number and cost of cyber attacks and in sharing their knowledge.

According to COMMIT/ director Geleyn Meijer, "COMMIT is delighted to welcome KLM to the COMMIT/ partnership. It demonstrates that scientific research in ICT can be of immense benefit for meeting the challenges of today's society and businesses and that research and practice are not far removed from each other."


Biggest results so far

BTWorld: A Large-scale Experiment in Time-Based Analytics

These days, large amounts of data are collected about the operation of many important systems, for instance, traffic systems and the financial system. Extracting meaningful information is very challenging: big data must be processed in time and without error. At TU Delft, for the last four years, we have been collecting data about BitTorrent, a system used by hundreds of millions of people worldwide for sharing videos and other files. For example, musicians use it for the distribution of their work and software developers for the distribution of open-source software.

ICT science question: despite a large number of empirical and theoretical studies, observing the state of global information networks remains a grand challenge. The main question we set out to answer was how to reliably analyze large-scale, time-based datasets through different types of queries.
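To illustrate what a time-based query over such data looks like, here is a toy sketch. The record layout and numbers are hypothetical, and the real BTWorld dataset is processed at a vastly larger scale with distributed data-processing systems, which this in-memory example does not attempt to reproduce.

```python
# Toy time-based query over hypothetical BitTorrent tracker samples:
# total observed peers (seeders + leechers) per hour.
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical records: (unix timestamp, tracker, torrent hash, seeders, leechers).
samples = [
    (1356998400, "tracker-a", "hash1", 120, 30),
    (1356999000, "tracker-a", "hash2",  15, 85),
    (1357002000, "tracker-b", "hash1", 130, 25),
]

def peers_per_hour(records):
    """Aggregate the observed peer population into hourly buckets."""
    buckets = defaultdict(int)
    for ts, _tracker, _infohash, seeders, leechers in records:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).replace(
            minute=0, second=0, microsecond=0)
        buckets[hour] += seeders + leechers
    return dict(sorted(buckets.items()))

for hour, peers in peers_per_hour(samples).items():
    print(hour.isoformat(), peers)
```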

Involved COMMIT/partners: TU Delft.

Predicting the earth’s climate with Graphics Processing Units

To predict the earth’s climate, we need to understand the interaction between the atmosphere (air) and the oceans (water). Essential physical phenomena such as ocean eddies are only resolved in ocean models at resolutions finer than two kilometres. We develop ways in which climate modellers can use the enormous computing power they need for high-resolution, long-running simulations. As high-resolution climate models require great computational power, we use Graphics Processing Units (GPUs) to perform the computations.

ICT science question: how to optimize data transfers between hosts and GPUs? Real programs contain dozens of kernels, i.e. small programs that are executed on the GPU. The computational time of these individual kernels can often be optimized and reduced to virtually zero; at that point, the data transfers between these kernels become the next bottleneck. The problem is that there are many different transfer mechanisms, and the best mechanism depends on details of the algorithm. To solve this problem, we have developed a generic performance model that greatly helps in deciding which mechanism is optimal, thus avoiding the need to implement and measure all alternatives.
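The project's actual performance model is not reproduced here, but the following toy sketch illustrates the idea: given assumed per-transfer latency and bandwidth figures for each mechanism (all names and numbers below are illustrative assumptions, not measurements), an analytical model can predict the fastest mechanism for a given transfer size without benchmarking every alternative.

```python
# Toy analytical model for choosing a host-to-GPU transfer mechanism.
# Mechanism names and all figures are illustrative assumptions, not measurements.
MECHANISMS = {
    # name: (fixed per-transfer latency in seconds, effective bandwidth in bytes/s)
    "pageable_copy": (10e-6,  6e9),  # e.g. a plain copy from pageable host memory
    "pinned_copy":   (10e-6, 12e9),  # e.g. a copy from page-locked (pinned) memory
    "mapped_memory": ( 1e-6,  3e9),  # e.g. zero-copy access to mapped host memory
}

def transfer_time(mechanism, nbytes):
    """Predicted transfer time: fixed latency plus size over bandwidth."""
    latency, bandwidth = MECHANISMS[mechanism]
    return latency + nbytes / bandwidth

def best_mechanism(nbytes):
    """Mechanism the model predicts to be fastest for this transfer size."""
    return min(MECHANISMS, key=lambda m: transfer_time(m, nbytes))

for size in (4 * 1024, 1024**2, 256 * 1024**2):
    print(f"{size:>12} bytes -> {best_mechanism(size)}")
```

With these made-up numbers, the model favours the low-latency mapped-memory mechanism for tiny transfers and the high-bandwidth pinned copy for large ones, which illustrates why no single mechanism is best for every kernel.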

Involved COMMIT/partners: IMAU, eScience Center, VU Amsterdam.
