
Estimating Population Counts with Capture-Recapture Models in the Context of Erroneous Records in Linked Administrative Data

    Dilek Yildiz, Peter G.M. van der Heijden, Peter W.F. Smith

VID Working Papers, pp. 1-29, 2021/09/15

doi: 10.1553/0x003cced6




In the absence of a traditional census and a comprehensive population register (as is the case in the UK), administrative data sources, e.g. health, school or tax records, can offer an alternative for estimating the size of the resident population. However, such data sources are designed to capture information only from specific populations, which poses a challenge for estimation. A suitable way to overcome this challenge is to link administrative data sources that collect information from different but overlapping populations and use capture-recapture models to estimate the population counts. Various assumptions are required to obtain unbiased estimates with capture-recapture models. In practice, the assumptions of ‘homogeneous inclusion probabilities’ and ‘no over coverage’ in particular are often not met. This paper proposes a two-step procedure for estimating population counts with capture-recapture models that account for heterogeneity of inclusion probabilities and over coverage in the data sources. We apply our methodology to the linked Patient Register and Customer Information System dataset, which violates both of the aforementioned assumptions. The Patient Register includes people who are registered with a National Health Service General Practitioner. In 2011, the Patient Register overestimated the size of the England and Wales population by 4.3% (over coverage) and its sex ratio differed from the 2011 Census estimates (heterogeneous inclusion probabilities). The Customer Information System dataset provides information on all individuals who have ever had a national insurance number and on children whose parents have made a child benefit claim relating to them. In 2011, it overestimated the size of the England and Wales population by 9.5% and its age and sex structure differed from the 2011 Census estimates. Applying our approach, we estimate population counts of the South East region of England by age, sex and local authority, and compare them with census estimates using percentage difference maps.

Keywords: Administrative data, capture-recapture models, combining data, log-linear model with offset, population estimates, dual-system estimation
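
As a point of orientation for the dual-system estimation named in the keywords, the following is a minimal sketch of the classical two-source (Lincoln-Petersen) estimator. It is not the two-step procedure proposed in the paper: it assumes homogeneous inclusion probabilities, independent sources, perfect linkage and no over coverage, which are exactly the assumptions the abstract says are violated in practice. The function name and all counts are hypothetical and chosen only for illustration.

```python
def dual_system_estimate(n_both, n_only_a, n_only_b):
    """Classical dual-system (Lincoln-Petersen) estimate of population size.

    n_both   -- records linked in both sources
    n_only_a -- records found only in source A
    n_only_b -- records found only in source B

    Assumes independent sources, homogeneous inclusion probabilities,
    perfect linkage and no erroneous (over-covered) records.
    """
    if n_both <= 0:
        raise ValueError("need at least one record linked in both sources")
    n_a = n_both + n_only_a  # total records in source A
    n_b = n_both + n_only_b  # total records in source B
    return n_a * n_b / n_both


# Hypothetical counts, not taken from the paper.
estimate = dual_system_estimate(n_both=800, n_only_a=100, n_only_b=200)
print(estimate)  # 1125.0
```

With these illustrative counts, 1,100 distinct people are observed and the estimator adds 100 × 200 / 800 = 25 people missed by both sources. Erroneous records inflate the source-only counts and therefore bias this simple estimate upwards, which is the kind of problem the abstract's approach is meant to address.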