Information Content and Utility

Data2Semantics (From Data to Semantics for Scientific Data Publishers)

Objective: Develop theory and methods to measure the utility of a given query Q in a given context C over given data D.

Measures of information content and complexity can be used to optimize ‘matching’ and query utility under a given set of conditions, including the context and the data.

In WP1, complexity issues concerning queries on large datasets will be studied, building on the framework described in Adriaans (2009): given a system S of a certain complexity in the world (e.g. the human brain, the climate, DNA, an art style, or simply a railroad timetable) and a canonical measurement function, i.e. an information channel with certain characteristics that creates a data set D with information 'about' S, under what conditions may we assume that a query Q of a certain form on D indeed returns adequate information about S?
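
As a rough illustration of what such a measure could look like, here is a minimal sketch assuming compression (zlib) as a stand-in for Kolmogorov-style information content; the functions information_content and query_utility and the toy data are hypothetical illustrations, not the project's actual definitions:

    import zlib

    def information_content(data: bytes) -> int:
        # Crude proxy for information content: compressed size in bits.
        return 8 * len(zlib.compress(data, 9))

    def query_utility(result: bytes, dataset: bytes) -> float:
        # Information the result carries 'about' the dataset, per bit of result,
        # using the conditional-compression approximation K(D|R) ~ C(R+D) - C(R).
        k_d = information_content(dataset)
        k_d_given_r = 8 * len(zlib.compress(result + dataset, 9)) - information_content(result)
        captured = k_d - k_d_given_r
        return captured / max(1, information_content(result))

    # Toy data (hypothetical): a result overlapping D is informative, random bytes are not.
    dataset = b"station;departure;arrival\n" * 200
    good = dataset[:2000]
    poor = bytes(range(256)) * 8
    print(round(query_utility(good, dataset), 2), round(query_utility(poor, dataset), 2))

In this toy run the overlapping result scores close to 1 and the unrelated result near 0; small negative values can occur because real compressors carry overhead.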

Within this framework we will analyze, among other things:

  1. the conditions under which one can extract 'true' isolated facts from a data set but no general insights,
  2. the question whether complex systems that are undersampled give rise to power-law distributions (note that such power laws can lack a finite mean; see the first sketch after this list), and
  3. the interplay between model information and complexity in the analysis of various systems (i.e. facticity: noise is complex but has a simple model, fractal structures look complex but are simple, and many structures in nature, in particular products of evolutionary processes, are both complex and have complex models; see the second sketch after this list).
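
On item 2, a minimal sketch of why the lack of a mean matters in practice, assuming a Pareto distribution with tail exponent alpha <= 1 purely for illustration: the empirical mean keeps drifting instead of converging as the sample grows.

    import random

    def pareto_sample(alpha: float, xmin: float = 1.0) -> float:
        # Inverse-CDF sampling from a Pareto(alpha) distribution.
        u = random.random()
        return xmin / (1.0 - u) ** (1.0 / alpha)

    random.seed(42)
    alpha = 0.8                     # alpha <= 1: the theoretical mean is infinite
    total, n = 0.0, 0
    for target in (10**3, 10**4, 10**5, 10**6):
        while n < target:
            total += pareto_sample(alpha)
            n += 1
        print(f"n={n:>8}  empirical mean={total / n:,.1f}")
    # The printed means typically jump by orders of magnitude rather than settling.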
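
On item 3, a minimal sketch using compressed length as a crude stand-in for complexity (an assumption for illustration, not the project's facticity measure): random noise is nearly incompressible even though the model producing it is a single line, while a self-similar, fractal-like string looks elaborate but compresses to almost nothing.

    import os
    import zlib

    noise = os.urandom(4096)              # 'complex' data, yet its model is this single line
    fractal = b"a"
    for _ in range(12):                   # simple self-similar construction
        fractal = fractal + b"b" + fractal
    print("random noise      raw:", len(noise), " compressed:", len(zlib.compress(noise, 9)))
    print("self-similar data raw:", len(fractal), " compressed:", len(zlib.compress(fractal, 9)))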
WP Leader: