The book icons link to extended abstracts of the papers, and the slide icons link to the presentations (some of these files are rather large).
Session 1, organised by the Office for National Statistics
Karen Dunnell, National Statistician, ONS
Keynote: Towards a single continuous population survey for the UK
The paper will discuss ONS plans to redesign its existing continuous household surveys (GHS, EFS, LFS, Omnibus) into a single module-based survey. It will cover the rationale, methodology, efficiency and statistical benefits.
Allyson Seyb, Statistics New Zealand
Statistics New Zealand's Longitudinal Business Frame
This paper introduces Statistics NZ's new longitudinal research database, the Longitudinal Business Frame (LBF), and describes the use of probabilistic matching in the LBF. With its economy-wide coverage and basic data items (attributes) such as employment, location, industrial activity and ownership relationships, the LBF is a rich source of longitudinally linked business data. It holds information on business activity at both the plant ('establishment') level and the enterprise level.
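Probabilistic matching is often formulated in the Fellegi-Sunter framework. In that framework, each compared field contributes a log-likelihood-ratio weight, and the summed weight decides whether a record pair is linked, rejected or sent for clerical review. The sketch below assumes that framework. The field names, the m- and u-probabilities and the thresholds are all hypothetical, and nothing here describes how Statistics NZ actually builds the LBF.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | pair is a true match)
#   u = P(field agrees | pair is not a match)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "postcode": (0.90, 0.05),
    "industry": (0.80, 0.20),
}

def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Fellegi-Sunter style weight: sum of log2 likelihood ratios over fields.
    A field missing from either record contributes nothing."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        va, vb = rec_a.get(field), rec_b.get(field)
        if va is None or vb is None:
            continue
        if va == vb:
            total += math.log2(m / u)                # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))    # disagreement weight (negative)
    return total

def classify(weight, upper=8.0, lower=0.0):
    """Hypothetical thresholds: link above `upper`, reject below `lower`."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"
```

In practice the m- and u-probabilities would be estimated from the data (for example by an EM algorithm) rather than fixed by hand, and the thresholds tuned to acceptable false-link and missed-link rates.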
Philip Cookson & Jason Sobell, Philology Pty Ltd., Australia
The Architectural Design of a Survey Questionnaire and Respondent Data Repository: Practical Considerations
This paper will examine the technical requirements for designing a survey questionnaire and respondent data repository that can efficiently store, retrieve and analyse survey questionnaire and respondent data. It will also explain how the system can be applied to cross-wave and cross-study analysis of market research survey results.
Kevin Wavell, Technical Director, TGI Surveys, BMRB Ltd
The Role of Software as a value-added tool in Survey Research
TGI is a large, continuous, single-source survey with a history going back more than 35 years. It collects data on all aspects of purchasing, behaviour, attitudes and media consumption. The delivery of the survey database has taken full advantage of technical and software advances over that period, particularly in maximising the use of the published database. TGI has had access to a wide range of software and has used it to help users find ways of understanding the data. It continues to seek innovative solutions to some of the problems arising from the successful expansion of the product. This paper aims to cover aspects of the software involved in using and re-using TGI data, and will examine some examples of the techniques used in more detail.
Session 2, organised by the Royal Statistical Society
Nicky Best, Imperial College, London
Keynote: Modelling complexity in health and social sciences: Bayesian graphical models as a tool for combining multiple sources of information
Researchers in substantive fields such as the social, behavioural and health sciences face some common problems when attempting to construct and estimate realistic models for phenomena of interest. The available data tend to be observational rather than collected through carefully controlled experiments. They are typically fraught with missing values, unmeasured confounders, selection biases and so on. These features often make standard analyses misleading. Instead, a comprehensive set of inter-dependent sub-models is needed to capture both the data complexities and the core processes that researchers want to understand. It is also invariably the case that a single dataset fails to provide all the necessary information, so many complex research questions require datasets from multiple sources to be combined. Bayesian graphical models provide a natural framework for combining a series of local sub-models, each informed by different data sources, into a coherent global analysis. This talk will introduce the key ideas behind Bayesian inference and graphical models in this context and show how they can be used to construct models of almost arbitrary complexity with relative ease. The ideas will be illustrated by applications involving the integration of survey data, census data and routinely collected health data. The use of the WinBUGS software for Bayesian modelling will also be demonstrated.
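The mechanism underneath this kind of analysis is Bayesian updating as evidence from separate sources arrives. In its simplest conjugate form, a Beta prior for a proportion is updated with binomial counts from two sources in turn. This is a minimal sketch only. It is not a graphical model and not WinBUGS. The two sources and their counts are hypothetical, and it assumes both sources measure the same proportion without bias.

```python
from fractions import Fraction

def beta_binomial_update(alpha, beta, successes, trials):
    """Conjugate update: a Beta(alpha, beta) prior combined with binomial data
    gives a Beta(alpha + successes, beta + failures) posterior."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return alpha + successes, beta + (trials - successes)

def beta_mean(alpha, beta):
    """Posterior mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)

# Hypothetical example: a prevalence, starting from a flat Beta(1, 1) prior.
a, b = 1, 1
a, b = beta_binomial_update(a, b, successes=30, trials=100)  # e.g. a survey sample
a, b = beta_binomial_update(a, b, successes=45, trials=200)  # e.g. routine records
print(a, b, float(beta_mean(a, b)))
```

Because the update is additive in the counts, processing the two sources in either order gives the same posterior. Graphical models generalise this idea to many linked sub-models, where conjugacy usually no longer holds and MCMC is needed.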
Bill Browne
MCMC Estimation for random effect modelling - The MLwiN experience
Multilevel models, and their extensions to other random effect models that account for the underlying dependence structure of the data, have become very popular in many application areas. I first came across random effect models during my PhD studies, when I compared Bayesian (MCMC) methods with standard likelihood-based methods for fitting multilevel models. As a by-product of my PhD I added some basic MCMC functionality to the multilevel modelling package MLwiN, and my subsequent research has often built on this work. In this talk I will contrast MCMC and likelihood-based methods for complex models, focusing on the ease of model extension that MCMC methods offer. I will discuss the incremental approach used in MLwiN development and concentrate in particular on two extensions to the multilevel model: cross-classified models and multiple membership models. I will end by discussing further work now starting in various research projects associated with the MLwiN development team. This includes multilevel factor modelling, models with responses at various levels, missing data through multiple imputation, and sample size calculations for complex random effect models.
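To make the MCMC side concrete, here is a Gibbs sampler for the simplest two-level variance-components model, y_ij = theta_j + e_ij with theta_j ~ N(beta0, sigma_u2). It is a generic textbook sketch, not MLwiN's implementation. It assumes a flat prior on beta0 and Gamma(0.001, 0.001) priors on the two precisions. It also uses the hierarchically centred parameterisation (sampling theta_j rather than residuals u_j), which tends to mix better when each group's mean is well determined by its own data.

```python
import random
import statistics

def gibbs_variance_components(groups, iterations=2000, burn_in=500,
                              a=0.001, b=0.001, seed=1):
    """Gibbs sampler for y_ij ~ N(theta_j, sigma_e2), theta_j ~ N(beta0, sigma_u2).
    Priors (assumed): flat on beta0, Gamma(a, rate=b) on 1/sigma_u2 and 1/sigma_e2.
    Returns posterior means of (beta0, sigma_u2, sigma_e2)."""
    rng = random.Random(seed)
    J = len(groups)
    n = [len(g) for g in groups]
    N = sum(n)
    ybar = [statistics.fmean(g) for g in groups]

    # Crude starting values taken from the group means.
    theta = list(ybar)
    beta0 = statistics.fmean(theta)
    sigma_u2 = max(statistics.pvariance(theta), 1e-6)
    sigma_e2 = 1.0

    draws = []
    for it in range(iterations):
        # theta_j | rest: precision-weighted compromise between ybar_j and beta0.
        for j in range(J):
            prec = n[j] / sigma_e2 + 1.0 / sigma_u2
            mean = (n[j] * ybar[j] / sigma_e2 + beta0 / sigma_u2) / prec
            theta[j] = rng.gauss(mean, (1.0 / prec) ** 0.5)
        # beta0 | theta: normal centred on the mean of the group effects.
        beta0 = rng.gauss(statistics.fmean(theta), (sigma_u2 / J) ** 0.5)
        # Precisions | rest: Gamma(shape, rate); gammavariate takes shape and *scale*.
        ss_u = sum((t - beta0) ** 2 for t in theta)
        sigma_u2 = 1.0 / rng.gammavariate(a + J / 2, 1.0 / (b + ss_u / 2))
        ss_e = sum((y - theta[j]) ** 2 for j, g in enumerate(groups) for y in g)
        sigma_e2 = 1.0 / rng.gammavariate(a + N / 2, 1.0 / (b + ss_e / 2))
        if it >= burn_in:
            draws.append((beta0, sigma_u2, sigma_e2))

    return tuple(statistics.fmean(d[k] for d in draws) for k in range(3))
```

Extensions such as cross-classifications or multiple membership mostly add further full-conditional steps to this loop, which is the "ease of model extension" argument in concrete form. A real analysis would also check convergence and effective sample sizes rather than reporting posterior means alone.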
Danny Pfeffermann
Small Area Estimation under a Two Part Random Effects Model with Application to Estimation of Literacy in Developing Countries
The UNESCO Institute for Statistics has initiated a programme to collect data on the level of literacy of adults in developing countries. This will involve conducting small-scale surveys in a few countries, in which interviewees aged 15 and over are given a test that measures their literacy score. One of the main objectives of these surveys is to obtain summary measures of literacy levels in small geographical areas. Only very small samples will be available for these areas, so model-based small area estimation methods are required. However, available methods are not suitable for this kind of data because of the mixed distribution of literacy scores in developing countries. This distribution has a large peak at zero, reflecting the large proportion of adults who are illiterate. Alongside this peak is an approximately bell-shaped distribution of the non-zero scores measured for the rest of the sample. In this presentation we will develop a two-part, three-level model that is suitable for this kind of data. We will show how to obtain the small area measures and their variances, or compute confidence intervals, based on this model. The proposed method will be illustrated using simulated data and data obtained from a similar literacy survey conducted in Cambodia.
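The two-part idea can be illustrated with a simplified sketch. One part models the probability that a score is zero and the other models the mean of the non-zero scores, each with its own area random effect. The area mean score then combines them as E[Y] = (1 - P(Y = 0)) × E[Y | Y > 0]. This is a deliberate simplification: it uses two levels, a normal positive part and known parameter values. It is not the three-level model or the variance estimation the paper develops, and all parameter values are hypothetical.

```python
import math
import random

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def area_mean_score(beta_zero, u_area, gamma_pos, v_area):
    """Expected score in one area under a simple two-part model:
    P(Y = 0) = expit(beta_zero + u_area) and E[Y | Y > 0] = gamma_pos + v_area,
    where u_area and v_area are the area's random effects (assumed known here)."""
    p_zero = expit(beta_zero + u_area)
    return (1.0 - p_zero) * (gamma_pos + v_area)

def simulate_area(n, beta_zero, u_area, gamma_pos, v_area, sd_pos, rng):
    """Draw n scores: 0 with probability P(Y = 0), otherwise normal around the
    positive-part mean (a normal approximation to the bell-shaped part)."""
    p_zero = expit(beta_zero + u_area)
    return [0.0 if rng.random() < p_zero else rng.gauss(gamma_pos + v_area, sd_pos)
            for _ in range(n)]
```

A single linear model for Y would misdescribe both the spike at zero and the spread of the positive scores. The two-part form lets each component be modelled, and borrow strength across areas, separately.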
EBLUP-type Estimation of Local Authority Unemployment
As in many other countries, the Labour Force Survey (LFS) serves as the key source of national information about the UK labour market, in particular about numbers of unemployed and associated unemployment rates. However, the small LFS sample size in many local authority districts (LADs) limits the use of LFS estimates of unemployment at LAD level. Standard small area estimation methods based on linear models also fail in this situation, because the response variable of interest (unemployed/not unemployed) is dichotomous. Instead, an EBLUP-type (empirical best linear unbiased predictor) method based on a logistic model for unemployment can be used to estimate unemployment in local areas. This model extends the usual linear logistic model by including an LAD-specific random effect in the linear predictor. The model parameters, including those associated with the random effect, are estimated using maximum likelihood and restricted/residual maximum likelihood methods. In this paper we describe how the Office for National Statistics has implemented this methodology in SAS. We also present results from a realistic simulation study carried out by the ONS that examines the performance of these EBLUP-type estimators and of the associated estimates of their variability.
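One common form of EBLUP-type predictor for a logistic model with an area random effect adds the observed outcomes of sampled people to model-based predictions for the rest of the area's population. The sketch below assumes the fixed effects `beta` and the area effect `u_hat` have already been estimated, for example by ML or REML as the abstract describes. The covariates and values are hypothetical, and this is not the ONS SAS implementation.

```python
import math

def expit(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def eblup_type_rate(y_sampled, x_nonsampled, beta, u_hat):
    """Estimated unemployment rate for one area.

    y_sampled    -- 0/1 outcomes for sampled people in the area
    x_nonsampled -- covariate vectors for non-sampled people (hypothetical)
    beta, u_hat  -- previously estimated fixed effects and area random effect
    """
    # Predicted probability of unemployment for each non-sampled person.
    predicted = sum(
        expit(sum(b * x for b, x in zip(beta, xi)) + u_hat) for xi in x_nonsampled
    )
    n_total = len(y_sampled) + len(x_nonsampled)
    return (sum(y_sampled) + predicted) / n_total
```

When an area's sample is tiny, almost all of the estimate comes from the model term. This is why the area random effect, which lets the model borrow strength across LADs while still allowing each area to differ, matters so much.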
Session 3, organised by the Association for Survey Computing
Andrew Westlake, Survey & Statistical Computing
Keynote: Combining Data and Knowledge in Models: Promises and Problems
We collect data in order to increase our knowledge, but we always have some knowledge before we start. Our existing knowledge raises the questions for which we need more information. It also guides us in deciding what further data to collect and how to collect it. Models allow us to generalise from specific observed data to a wider situation. When we analyse data we (usually) update our knowledge. If we can find a formal representation for our knowledge, then a standard statistical technique provides a way to formalise the process of updating it. This can be the basis for integrating multiple data sets that relate to different aspects of the same system. The approach is of general importance, and it is the only way to develop an integrated understanding of complex systems that are too extensive to observe with a single data set. However, complex methodology is difficult to understand. We must therefore also convince users from the application domain that our models are appropriate and valid, and make the results obtained from the methodology accessible. The talk will address these issues and illustrate them with experiences from the Opus project (www.opus-project.org), which, amongst other things, is looking at the problems of simultaneously modelling all forms of passenger movement in London.
Ken Miller, Ekkehard Mochmann & Jostein Ryssevik, UK Data Archive
European Unification through Initiative
Comparative social science research in Europe is hampered by the fragmentation of the scientific information space. Data, information and knowledge are scattered geographically and divided by language and institutional barriers. As a consequence, too much research is based on data from a single nation, carried out by a single-nation team of researchers and communicated to a single-nation audience. To advance interoperability, databases must be improved through metadata standards and appropriate documentation of measurement instruments. This paper will present recent developments in the field of social research, in particular the Madiera and Metadater projects. These projects are laying the groundwork for the social science GRID and have used the Data Documentation Initiative (DDI) as a building block.
Phil Edwards, School of Law, University of Manchester
Bridging the gap – Metadata in e-social science
One of the problems for social science researchers trying to use multiple datasets is that concepts and classifications differ across these datasets. This is not just an accident that more careful planning could have prevented. It is in the nature of social science concepts, which are often fuzzy and overlapping. Definitions are constructed for a purpose, and are bound up in the social practices and contexts in which they arise. Consider, for example, the social 'facts' represented by records of the incidence of 'drug abuse' or 'anti-social behaviour'. The challenge is to record both these 'facts' and the circumstances of their production. Social science researchers need to take a three-dimensional view of data: its underlying topic area, the claims and interactions that produced it, and the meanings and associations that were effectively written into it.
VS Chalasani & KW Axhausen
Conceptual data model for integrated transport and land-use data
Everyone involved in transport and land-use planning works with data at some stage, whether producing it or analysing it. Each transport survey is conducted for a particular set of objectives. The data obtained from these surveys do not follow any common pattern and are therefore difficult to understand. At the same time, a single research organisation may conduct a wide variety of surveys, ranging from simple roadside interviews to complex travel diaries. These surveys may be either longitudinal or cross-sectional. Differences in methodology, design and protocols often obscure basic differences in the data among surveys. Above all, it is almost impossible to collect complete information about the existing transportation system in a single survey. Most transport surveys collect partial but highly relevant information and depend on other sources for the rest. To resolve the differences in how datasets from different surveys relate to one another, this paper attempts to develop a conceptual data model for integrated transport and land-use data.
Session 4, organised by the Market Research Society
George Terhanian, President, HI Europe
Keynote: The Design and Analysis of Research that Exploits Multiple Interviewing Modes and Multiple Data Sources:
Reginald Baker, Market Strategies, Inc.
Adding Value to Data Through Improved Access: The Case for Portals
The increasingly widespread use of the Internet has brought new opportunities to leverage the value of survey data. Creating broader access and disseminating easy-to-use analytical tools lets us make more data available to a broader set of users. It also lets us minimize the barriers that can exist between the often discrete activities of data collection, data analysis and data reporting. A prime example in the market research world is the recent deployment of Web portals with four aims: to deliver data more quickly, to increase the value of those data by making them easier to analyze, to deliver data and results deeper into the client's organization, and to retain historical data for future analyses in an online archive. In the best of these systems, intuitive interfaces help users build self-documenting composite variables, case filters, weights and report formats, and share the results of their analyses with other users. In some, users can also design questionnaires, draw samples and launch surveys.
Margaret Ward, Nesstar Ltd (presented by Jostein Ryssevik)
Making existing data re-usable - the requirements of a web-enabled tool
To use an analogy from the days when information came mostly from printed material: we would find the information we needed in the local library. The library provided the means for information to be shared, discovered, analysed, researched and perhaps published in a different form. Today the same principles still apply to the re-use of data in a web-enabled world.
Mike Trotman, DataLucid Ltd.
Managing Complex Raw Data Linkage with XML
This paper will discuss a general approach to managing and combining multiple questionnaires with data from different sources. The approach focuses primarily on the underlying task of transforming and combining disparate raw data sets into a common questionnaire and/or data format.
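The transformation step can be sketched minimally under two assumptions: each raw source is a list of records, and a hand-written mapping from each source's field names to a common variable vocabulary is available. The source names, field names and the `<dataset>/<case>/<var>` layout are hypothetical illustrations, not DataLucid's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings from each source's raw field names to common variable names.
MAPPINGS = {
    "source_a": {"resp_id": "id", "q1_age": "age", "q2_sex": "sex"},
    "source_b": {"ID": "id", "AGE_YRS": "age", "GENDER": "sex"},
}

def to_common_xml(records_by_source, mappings=MAPPINGS):
    """Convert raw records (dicts) from several sources into one XML tree that
    uses a shared variable vocabulary, tagging each case with its source."""
    root = ET.Element("dataset")
    for source, records in records_by_source.items():
        mapping = mappings[source]
        for rec in records:
            case = ET.SubElement(root, "case", source=source)
            for raw_name, value in rec.items():
                if raw_name not in mapping:
                    continue  # field has no common equivalent; dropped here
                var = ET.SubElement(case, "var", name=mapping[raw_name])
                var.text = str(value)
    return root
```

Keeping the mappings as data rather than code makes it straightforward to add a new source. The `source` attribute preserves the provenance of each case after the data sets are combined.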
Page last updated on 13 November 2005