Datawarehouse Virtualization

TimeTrails (Spatiotemporal Data Warehouses for Trajectory Exploitation)

Objective

Design a distributed spatio-temporal warehouse geared toward system elasticity using DBMS virtualization.

Background

Virtualizing a database application in a cloud requires great care in deciding how many resources to dedicate, e.g., RAM, disk, and CPUs. The state of the art merely emulates a raw device on top of a hardware platform, where the virtualized system has to compete with others for resources. This easily leads to suboptimal performance in terms of resource and energy consumption compared to dedicated distributed systems.

Description of work

One of the benefits of a database system is that it has the knowledge to dynamically (re-)partition the database to minimize the number of virtual servers needed. The underlying technology was pioneered in the context of a centralized solution and is known in the literature as “database cracking” and “sharding”. A variation closely tied to hardware infrastructure has recently been studied in the context of the DataCyclotron [20], which promises significant throughput improvements for complex queries. All of these schemes innovate over existing commercial solutions by adapting the storage layout as a side effect of query processing. Research has shown that this approach carries a lot of potential. Together, these techniques aid in the creation of a fully self-organizing distributed version of MonetDB.
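To make the idea of adapting the storage layout during query processing more concrete, the following is a minimal single-node sketch of database cracking in Python. It is an illustration only, not the MonetDB implementation: the class `CrackedColumn` and its methods are hypothetical names, and a real system would partition in place rather than build temporary lists. Each range query partitions only the piece of the column that its bounds fall into, so the column becomes incrementally more ordered as queries arrive.

```python
import bisect


class CrackedColumn:
    """Toy cracker column (hypothetical): range queries reorganize the data."""

    def __init__(self, values):
        self.values = list(values)  # private copy, reorganized by queries
        self.pivots = []            # sorted pivot values seen so far
        self.positions = []         # positions[i]: index of first value >= pivots[i]

    def _crack(self, pivot):
        """Move values < pivot before values >= pivot; return the split index."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this pivot
        # Only the piece between the neighbouring pivots has to be touched.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.values)
        piece = self.values[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.values[lo:hi] = smaller + larger
        split = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def range_query(self, low, high):
        """Return all values v with low <= v < high, cracking on both bounds."""
        start = self._crack(low)
        end = self._crack(high)
        return self.values[start:end]


col = CrackedColumn([13, 4, 9, 2, 12, 7, 1, 19, 3, 14])
print(sorted(col.range_query(4, 13)))  # [4, 7, 9, 12]
print(col.pivots, col.positions)       # [4, 13] [3, 7]
```

After the first query the column consists of three pieces (values below 4, values from 4 up to 13, and values of 13 and above). A later query inside one of those pieces only reorganizes that piece, which is what makes the pieces natural candidates for distribution over servers.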

Key scientific challenges are

  • the level of database cracking/partitioning in relation to the time required to instantiate a virtual server or data access latency,
  • the number of replicas to minimize response time,
  • the scheduling of up/down time of virtual servers, and
  • the system hardware topology.

The technology will be evaluated against examples provided through WP-1, WP-3, and the Hyves infrastructure. Hyves already has a cluster of several thousand machines, which includes tens of MySQL database servers working in a sharded setup. This setup has clear limitations in terms of maintenance and is considered inadequate for business intelligence applications.

The approach taken here is to challenge the evolving data warehouse system with selected use cases encountered in the Hyves setting. The initial focus is on fast stream processing. This includes massive click streams with trajectory events that should become input for both malicious behavior detection, e.g., spam and DDoS, and user profiling.

Further evaluation will be performed in the context of astronomy, e.g., LSST, and commercial applications, e.g., Hyves and Nozhup BV, a client of MonetDB BV.