Section outline

  • How to test/validate a regression model?

    Regression models are widely used, in many different contexts, to predict a dependent variable from a set of predictors. An important question is whether the results of a regression analysis on a sample can be extended to the population from which the sample was drawn. When they can, we say that the model has a good fit, and we refer to the investigation of this question as goodness-of-fit analysis, performance analysis, or model validation (Hosmer and Lemeshow, 2000; D’Agostino et al., 1998; Harrell et al., 1996; Stevens, 1996). Applying modelling techniques without subsequently analysing the performance of the resulting models can yield poorly fitting models that predict outcomes inaccurately for new subjects. We address how to measure the quality of fit of a given model and how to evaluate its performance, in order to avoid poorly fitted models, i.e. models that inadequately describe the relationship between the predictors and the dependent variable in the population.

    First, we state an important preliminary assumption and the aim of our work, and we introduce the concept of goodness-of-fit and the principle of optimism. Then we briefly review the various techniques of model validation. Next, we define a number of properties that a model must have to be considered “good,” together with a number of quantitative performance measures. Lastly, we describe a methodology for assessing the performance of a given model.
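
    As a rough illustration of why performance on the fitting sample alone can mislead, the sketch below fits an ordinary least-squares model to a small simulated sample and compares its R² on that sample with its R² on a large fresh sample from the same population. Everything in it is an assumption made for illustration and is not taken from the text: the simulated data (one informative predictor plus ten pure-noise predictors), the sample sizes, the choice of R² as the performance measure, and the helper names `simulate`, `add_intercept` and `r_squared`. The gap between the two values is one simple way to picture optimism.

```python
import numpy as np


def r_squared(y, y_hat):
    """Proportion of the variance of y explained by the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def simulate(n, n_noise, rng):
    """Hypothetical population: only the first predictor affects y."""
    X = rng.normal(size=(n, 1 + n_noise))
    y = X[:, 0] + rng.normal(size=n)
    return X, y


def add_intercept(X):
    """Prepend a column of ones so that the model includes an intercept."""
    return np.column_stack([np.ones(len(X)), X])


rng = np.random.default_rng(0)

# Small sample used to fit the model, and a large sample of "new subjects".
X_fit, y_fit = simulate(30, 10, rng)
X_new, y_new = simulate(10_000, 10, rng)

# Ordinary least-squares fit on the small sample only.
beta, *_ = np.linalg.lstsq(add_intercept(X_fit), y_fit, rcond=None)

# Apparent performance (same data used for fitting) versus performance on new data.
apparent_r2 = r_squared(y_fit, add_intercept(X_fit) @ beta)
new_r2 = r_squared(y_new, add_intercept(X_new) @ beta)
optimism = apparent_r2 - new_r2

print(f"apparent R^2: {apparent_r2:.3f}")
print(f"new-data R^2: {new_r2:.3f}")
print(f"optimism:     {optimism:.3f}")
```

    In practice a large fresh sample is rarely available, which is why validation techniques that reuse the original sample are of interest; the simulation only serves to make the gap visible.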