What is already known on this subject
- Studies of ethnic inequalities in hospital inpatient care in England are limited by incomplete data on patient ethnicity collected in the 1990s and 2000s.
Ethnicity is defined as a sensitive personal characteristic under the European Union's General Data Protection Regulation (GDPR, 2016) [1]. It is often considered to be inherently subjective [2] and may not always be collected for reasons of statute [3,4]. This can hamper equality audits, analyses of corporate governance [5] and, most recently, the monitoring of hospital admissions and outcomes during the COVID-19 pandemic [6,7].
Provision has been made for Hospital Episode Statistics (HES) to include patient-reported ethnicity since 1995 by drawing on a central National Health Service (NHS) patient register [5]. Yet until the late 1990s, ethnicity information was absent from more than half of the records of patients who received inpatient care. General practitioners were financially incentivised to record patient ethnicity through the Quality and Outcomes Framework (QOF) between 2006 and 2012, and the completeness of inpatient ethnicity data rose to more than 80 % during this time [5].
The problem of missing ethnicity data in NHS datasets has previously been studied [8,9], although not across the full range of ethnic groups in a national study spanning several years. Ryan et al. (2012) used Onomap and Nam Pehchan to impute the ethnicity of White, South Asian, Black and Other groups in the UK’s West Midlands [8]. They used a multiple imputation strategy based on characteristics of the individual patients, their care, and the ethnic composition of their neighbourhoods, and reported that the sensitivity of the multiple imputation was above 90 % for White and South Asian ethnicities but very low for other groups. Smith et al. (2017) used the Onomap software to assign children and young people with cancers to White, South Asian, or Other groups in a Yorkshire study, concluding that combining different data sources, including names-based ones, increased the representation of ethnic minorities, albeit with some ambiguity [9]. Both studies concluded that there is no perfect substitute for more complete self-reported ethnicity data.
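To make the sensitivity measure concrete, the sketch below computes per-group sensitivity as the share of patients recorded in a group who are also predicted to be in that group. This is only an illustration of the general metric, not the procedure used in the cited studies; the function name `sensitivity_by_group` and the toy records are assumptions made up for this example.

```python
from collections import Counter


def sensitivity_by_group(recorded, predicted):
    """Return {group: sensitivity} for paired recorded/predicted labels.

    Sensitivity for a group is the number of patients recorded in that
    group whose predicted group matches, divided by all patients
    recorded in that group.
    """
    if len(recorded) != len(predicted):
        raise ValueError("recorded and predicted must have the same length")
    totals = Counter(recorded)
    hits = Counter(r for r, p in zip(recorded, predicted) if r == p)
    return {group: hits[group] / n for group, n in totals.items()}


# Toy, made-up labels (not data from any study).
recorded = ["White", "White", "White", "White",
            "South Asian", "South Asian", "Black", "Black"]
predicted = ["White", "White", "White", "White",
             "South Asian", "White", "White", "Black"]

result = sensitivity_by_group(recorded, predicted)
for group, value in result.items():
    print(f"{group}: {value:.1%}")
```

Reporting the measure per group rather than overall matters here, because a large majority group can mask poor performance for smaller groups.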
Personal names are commonly used to impute ethnicity information when self-reported ethnicity data are not collected systematically or available through linkage [10,11]. An early example of names-based ethnicity classification exploiting large-scale data sets is the work by Mateos et al. (2011) [11]. They applied cluster analysis to data on personal names and residential codes from telephone directories and other administrative data from 17 different countries. Kandt & Longley [10] used cluster analysis to define more detailed clusters for the UK, making use of data on names and country of origin in the 2011 Census microdata. In this paper we report on the use of names-based ethnicity classifications to address incomplete ethnicity information in inpatient hospital records. It is a national study covering the whole of England over fifteen years (1999/00–2013/14). The study quantifies the prediction success for the complete range of ethnic groups, nationally and regionally, against self-reported, NHS-recorded ethnicity. A freely available software tool, Ethnicity Estimator (EE), was used [10]. EE was developed by the Consumer Data Research Centre (CDRC: cdrc.ac.uk) in partnership with the Office for National Statistics (ONS) and uses enhanced algorithmic procedures [10,12]. The results of this study can be used to inform decisions around the analysis of ethnicity in HES.
Hospital inpatient admission records were obtained from NHS England HES for the period April 1999–March 2014 (financial years are referred to by the first year only from here onwards). Ethnicity was coded from patient forename and surname separately using an enhanced version of the Ethnicity Estimator (EE) software [10]. Where a patient changed surname, e.g. due to marriage, the ethnicity category of the earliest name was used. To retain full anonymity, the coding was carried out in an
A total of 111,231,653 patient-years were recorded between 1999 and 2013. The completeness of ethnicity records improved from 59.5 % in 1999 to 89.2 % in 2009 and peaked at 90.5 % in 2013 (Fig. 1). The biggest absolute improvement was seen in the White British group, which increased from 55.4 % in 1999 to 73.9 % in 2013. Fig. 2 shows increased representation for other ethnic groups.
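As a rough illustration of how a completeness figure of this kind can be derived, the sketch below counts, for each year, the percentage of patient records with a non-missing ethnicity and then takes the difference between two years in percentage points. The function name `completeness_by_year` and the toy records are assumptions for this example; only the 59.5 % and 90.5 % values in the last step come from the text above.

```python
def completeness_by_year(records):
    """Return {year: percent of records with a recorded ethnicity}.

    `records` is an iterable of (year, ethnicity) pairs, where a missing
    ethnicity is represented by None.
    """
    totals, known = {}, {}
    for year, ethnicity in records:
        totals[year] = totals.get(year, 0) + 1
        if ethnicity is not None:
            known[year] = known.get(year, 0) + 1
    return {y: 100.0 * known.get(y, 0) / totals[y] for y in sorted(totals)}


# Toy, made-up records (not HES data).
toy = [
    (1999, "White British"), (1999, None), (1999, None), (1999, "Indian"),
    (2013, "White British"), (2013, "Pakistani"), (2013, "Chinese"), (2013, None),
]

rates = completeness_by_year(toy)
change = rates[2013] - rates[1999]  # change in percentage points
print(rates, change)

# Applying the same subtraction to the reported completeness figures.
reported_change = round(90.5 - 59.5, 1)
print(f"Reported improvement 1999 to 2013: {reported_change} percentage points")
```

Expressing the change in percentage points, rather than as a relative increase, keeps it directly comparable with the per-year completeness values.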
The sensitivity analysis comparing EE estimates with NHS-recorded ethnic group, in 2013, suggested that the
We found that the completeness of ethnicity data for hospital patients in England improved from 59.5 % in 1999 to 90.5 % in 2013. The biggest improvement was seen in the White British group, which increased from 55.4 % in 1999 to 73.9 % in 2013. The correct prediction of NHS-reported ethnicity varied by ethnic group (2013/14 figures): White British (89.8 %), Pakistani (81.7 %), Indian (74.6 %), Chinese (72.9 %), Bangladeshi (63.4 %), Black African (57.3 %), White Other (50.5 %), White Irish
As a limitation, it should be noted that ethnicity is a complex concept encompassing biological, cultural, and subjective aspects. Which aspect matters most depends on the kind of inequalities that are the object of the study and the related assumptions about disease aetiology. Variation in prediction success of name-based ethnicity classification can therefore arise for different reasons including individuals’ sense of belonging and resulting choice of ethnic group, socio-cultural naming and
Studies of ethnic inequalities in hospital inpatient care in England are limited by incomplete data on patient ethnicity in the 1990s and 2000s. Financial incentives for general practitioners to collect and report ethnicity to the central patient register between 2006 and 2012 greatly improved completeness during this period. Personal names of patients remain an untapped source for closing this gap for the earlier years. As demonstrated in this and other studies, name-based ethnicity
The UK Economic and Social Research Council is acknowledged for its support for the UCL Consumer Data Research Centre (CDRC) enabling this research (Grant ES/L011840/1).
All authors (JP, JK, PAL) made substantial contributions to the conception and design of the study, interpretation of data, revision, and final approval of the submitted version. JK and PAL contributed to the acquisition of data. JP carried out the analysis and drafted the article.
The authors report no declarations of interest.
None.