Information Publication, Integration and Enhancement

Data2Semantics (From Data to Semantics for Scientific Data Publishers)

Objective: Design, development and evaluation of techniques for publishing, integrating, enriching and reasoning over linked science data.

Scientific data is heterogeneous both syntactically (it is produced using a variety of tools) and semantically (it reflects a variety of processes and perspectives). Scientific data must therefore be translated to a unifying paradigm, enriched with appropriate provenance metadata, interconnected and aligned with shared ontologies, reconciled with data from other datasets, made available for reasoning and querying tasks, and embedded in scientific publications. This requires extending current techniques for linked data publishing and enrichment.

We will develop supporting infrastructure for linked data management in e-Science, including easy-to-use transformation generators to cope with legacy formats (in association with LATC), vocabulary-linking techniques to cope with semantic heterogeneity, and identity-detection components to cope with co-reference problems. The infrastructure will involve work on large-scale reasoning developed in P20 and LarKC. Requirements for these services will be provided by (non-)profit partners and through synergy with P6, P12, P20 and P26.
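
As an illustration only, the transformation and identity-detection steps described above might be sketched as follows. This is a minimal sketch and not the project's actual tooling: the example.org namespace, the function names, the sample data and the label-normalisation heuristic are all assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical base namespace, for illustration only.
BASE = "http://example.org/"

def rows_to_triples(csv_text, dataset):
    """Turn each row of a legacy CSV file into (subject, predicate, object) triples."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader):
        subject = f"{BASE}{dataset}/row{i}"
        for column, value in row.items():
            triples.append((subject, f"{BASE}prop/{column}", value))
    return triples

def normalise(label):
    """Crude matching key: lowercase, keep letters and digits only."""
    return "".join(ch for ch in label.lower() if ch.isalnum())

def coreference_candidates(triples, label_predicate):
    """Group subjects whose label values share the same normalised key."""
    groups = defaultdict(set)
    for s, p, o in triples:
        if p == label_predicate:
            groups[normalise(o)].add(s)
    # Only groups with more than one subject are co-reference candidates.
    return [sorted(g) for g in groups.values() if len(g) > 1]

# Two made-up datasets that describe the same compound with different spellings.
lab_a = "name,mass\nAspirin,180.16\nCaffeine,194.19\n"
lab_b = "name,mass\nasp-irin,180.2\n"

triples = rows_to_triples(lab_a, "labA") + rows_to_triples(lab_b, "labB")
candidates = coreference_candidates(triples, BASE + "prop/name")
print(candidates)
```

Here the two rows naming the same compound end up in one candidate group, which a later step could turn into an identity link. Real identity detection would need much more robust matching than string normalisation; the sketch only shows where such a component sits in the pipeline.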

