One approach to protecting public-use microdata is to deny users direct access to the microdata. Instead, users submit analyses to a remote server, which reports back basic output from the fitted model, such as coefficients and standard errors. To be most useful, the remote server should also give users some way to check the fit of their models without disclosing actual data values. In this presentation, I propose remote server diagnostic methods for several commonly used models, including linear and logistic regressions. Specifically, I propose that remote servers provide synthetic, i.e., computer-generated, values of dependent and independent variables, residuals, and other diagnostic statistics. Users can then treat these synthetic values like ordinary diagnostic quantities, for example by examining scatter plots of the synthetic residuals versus the synthetic independent variables. In a variety of simulations, I show that the synthetic diagnostics can reveal model inadequacies without substantially increasing the risk of disclosure.
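
The general idea of a synthetic residual plot can be illustrated with a minimal Python sketch. It is not the method proposed in this presentation: it assumes the simplest possible scheme, in which the server perturbs each predictor value and each residual with independent Gaussian noise before release. The function names, the `noise_scale` parameter, and the simulated "confidential" data are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def fit_simple_ols(x, y):
    """Return (intercept, slope) of a one-predictor least-squares fit."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def synthetic_diagnostics(x, y, noise_scale=0.5, seed=0):
    """Return noise-perturbed (predictor, residual) pairs for plotting.

    Assumption: additive Gaussian noise stands in for a real synthesis
    model. noise_scale is a hypothetical knob; larger values mask the
    data more but blur the diagnostic pattern.
    """
    rng = random.Random(seed)
    intercept, slope = fit_simple_ols(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sd_x = statistics.stdev(x)
    sd_r = statistics.stdev(residuals)
    return [
        (xi + rng.gauss(0.0, noise_scale * sd_x),
         ri + rng.gauss(0.0, noise_scale * sd_r))
        for xi, ri in zip(x, residuals)
    ]


def simulate_confidential_data(n=200, seed=1):
    """Simulated stand-in for data held on the server (quadratic truth)."""
    rng = random.Random(seed)
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    y = [1.0 + xi ** 2 + rng.gauss(0.0, 0.3) for xi in x]
    return x, y


# The user fits a straight line to data that are actually quadratic.
x, y = simulate_confidential_data()
synth = synthetic_diagnostics(x, y)

# A crude numeric stand-in for eyeballing the scatter plot: under a
# missed quadratic term, residuals are high at the edges and low in
# the middle of the predictor range.
outer = [r for sx, r in synth if abs(sx) > 1.0]
inner = [r for sx, r in synth if abs(sx) <= 1.0]
print("mean synthetic residual, |x| > 1 :", statistics.fmean(outer))
print("mean synthetic residual, |x| <= 1:", statistics.fmean(inner))
```

In this sketch the user never sees an original data value, yet the gap between the two group means still signals that a straight-line model is missing curvature. Simple additive noise is used here only because it is easy to follow; it makes no claim about the disclosure protection that a real remote server would need.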
Page last updated on 31 August, 2003