Data have to take a long and sometimes tortuous path, from conceptual modelling and data collection processes through actual raw numbers to information and intelligence, in order to be of use to policy makers, analysts and researchers. Rather than seeing data as a number of static data files on the one hand or as part of a complex database on the other, we need systems that allow data to flourish by moving freely through their different manifestations, whilst knowing where they have come from and where they might be able to go.
Data are unlike almost any other resource. Of themselves, they do not diminish, decay or dissipate; with careful handling they grow in value as they are discussed, referenced, enhanced, derived and explained. However, it is rare for data to be managed in such a way that they can be re-used efficiently and effectively. Although many agencies collect and derive data (market research, central government and health authorities, to name just a few), it is rare for more than standard analyses and reports to be produced as an end product. In order to fully exploit the commercial and intellectual value of the data, a more comprehensive view of the data and their context is needed. The data needed to create knowledge products include survey data, spreadsheets, aggregates, classifications and administrative records. These all need to have some metadata structure and semantics in common, for example source, title and date. If we are to develop the web for statistical data, we need tools that allow systems to interrogate each other and derive understanding from remote systems.
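As an illustration of what "metadata structure and semantics in common" could mean in practice, the following minimal sketch gives every kind of data resource the same small core of source, title and date, so that one query works across all of them. The class names, the `find_by_source` helper and any values passed to them are hypothetical and chosen only for illustration; they are not taken from any of the standards discussed here.

```python
import datetime
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CoreMetadata:
    """The shared core named in the text: source, title and date."""
    source: str
    title: str
    date: datetime.date


@dataclass
class DataResource:
    """Any data resource (survey, spreadsheet, aggregate, ...) plus its metadata."""
    kind: str                                   # e.g. "survey", "aggregate"
    core: CoreMetadata                          # fields common to every resource
    extra: dict = field(default_factory=dict)   # kind-specific metadata, free-form


def find_by_source(resources, source):
    """Return every resource from the given source, whatever its kind."""
    return [r for r in resources if r.core.source == source]
```

Because the query relies only on the shared core, a remote system needs no knowledge of the kind-specific `extra` fields to find related surveys, spreadsheets and classifications from the same source.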
So massive has been the growth of electronic content that many large organisations, particularly within government, are developing content management systems to manage information. These systems de-couple the content from the display of the information. Page frameworks and templates are constructed so that the content can be managed efficiently via a database and the display built up from various components. However, the metadata aspects of these systems are not so well developed and tend to concentrate on resource discovery. This is not sufficient for statistical information, and one of the challenges that such systems face is to link in with specialist statistical frameworks. Given that the delivery of knowledge products may well be via content management systems, it is necessary to develop models that combine or link the metadata views of both the standard content systems and the more comprehensive data systems.
Our objective is to link the data and text systems as seamlessly as possible. More sophisticated metadata standards and structures are required in order to create effective knowledge systems incorporating both text and data. Metadata can have many components (e.g. discovery, contextual, quality, structural, conceptual) and many views or combinations of those components. A plethora of metadata models and standards has arisen over the last few years, ranging from the highly conceptual and formalised, such as ISO 11179, to practical data transfer models such as Triple-S. In addition there are the discovery metadata and library standards such as Dublin Core and e-GMS. In between these is the DDI model, designed for survey and aggregate data. It emphasises completeness of description for the data resource, including the provenance, contextual information, a detailed data dictionary and hyperlinks. In addition to these standards, there is a need to handle process metadata, so that the end user can know how a particular table or indicator was derived or modelled.
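The two ideas in this paragraph, views built from combinations of metadata components and process metadata recording how a table or indicator was derived, can be sketched together. The sketch below is a hypothetical illustration under simple assumptions: the `MetadataRecord` structure, the `view` and `lineage` helpers and the identifiers used with them are invented for this example and do not reproduce ISO 11179, Triple-S, Dublin Core, e-GMS or DDI.

```python
from dataclasses import dataclass, field

# The component names listed in the text.
COMPONENTS = ("discovery", "contextual", "quality", "structural", "conceptual")


@dataclass
class MetadataRecord:
    identifier: str
    # Maps a component name to that component's fields.
    components: dict = field(default_factory=dict)
    # Process metadata: (parent identifier, description of the process applied).
    derived_from: list = field(default_factory=list)


def view(record, wanted):
    """Return one 'view': the chosen combination of metadata components."""
    unknown = set(wanted) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return {name: record.components.get(name, {}) for name in wanted}


def lineage(record_id, catalogue):
    """Walk the process metadata back from a record to its original sources.

    Returns (derived item, process, parent) steps, nearest step first.
    """
    steps, stack, seen = [], [record_id], set()
    while stack:
        current = stack.pop()
        if current in seen:          # guard against cycles and repeated parents
            continue
        seen.add(current)
        for parent_id, process in catalogue[current].derived_from:
            steps.append((current, process, parent_id))
            stack.append(parent_id)
    return steps
```

A content management system might request only the discovery view of a record, while a data browsing tool could request the structural and quality views of the same record and call `lineage` to show the end user how an indicator was produced.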
Given the growth of web-based data and content management systems, the final part of the presentation will examine how some users are applying these metadata models to build systems that incorporate both modern content management and advanced data browsing tools. The existing achievements and remaining obstacles will be reviewed, with the aim of deepening our understanding of the core issues from the perspective of both the information user and the information provider.
Building the tools for the richest possible use of data requires advanced yet easy-to-use standardised metadata systems. They must be able to take input from existing systems on the one hand and link to emerging content management systems on the other. As a result, they form the heart of any top-class data system in the era of the semantic web, and so help data to have a long and healthy life at the core of meaningful knowledge products.
Page last updated on 31 August, 2003