Names-based ethnicity enhancement of hospital admissions in England, 1999–2013

https://doi.org/10.1016/j.ijmedinf.2021.104437

Highlights

  • Ethnicity data were missing for more than half of patients admitted to English hospitals in the 1990s.

  • Name-based ethnicity classifications have merit for predicting the ethnicity of many ethnic minority groups.

  • Prediction success of a names-based ethnicity classification tool has been quantified.

Abstract

Background

Accurate recording of ethnicity in electronic healthcare records is important for the monitoring of health inequalities. Yet until the late 1990s, ethnicity information was absent from more than half of records of patients who received inpatient care in England. In this study, we report on the usefulness of a names-based ethnicity classification, Ethnicity Estimator (EE), for addressing this gap in the hospital records.

Materials and methods

Data on inpatient hospital admissions were obtained from Hospital Episode Statistics (HES) between April 1999 and March 2014. The data were enhanced with ethnicity coding of participants’ surnames using the EE software. Only data on the first episode for each patient each year were included.
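The selection rule above (one record per patient per financial year, where a financial year runs from April to March) can be sketched as follows. This is a minimal illustration only: the tuple layout, the `patient_id` field and the function names are hypothetical and are not HES column names or the authors' code.

```python
from datetime import date


def financial_year(d: date) -> int:
    """Return the first calendar year of the April-March financial year containing d."""
    return d.year if d.month >= 4 else d.year - 1


def first_episode_per_patient_year(episodes):
    """Keep the earliest admission date for each (patient, financial year) pair.

    `episodes` is an iterable of (patient_id, admission_date) tuples.
    Both field names are assumptions made for this sketch.
    """
    first = {}
    for patient_id, admitted in episodes:
        key = (patient_id, financial_year(admitted))
        if key not in first or admitted < first[key]:
            first[key] = admitted
    return first


# Toy records, invented for illustration.
episodes = [
    ("p1", date(1999, 4, 2)),    # financial year 1999, kept
    ("p1", date(1999, 11, 5)),   # same financial year, dropped
    ("p1", date(2000, 3, 30)),   # still financial year 1999, dropped
    ("p1", date(2000, 4, 1)),    # financial year 2000, kept
    ("p2", date(2014, 3, 31)),   # financial year 2013, kept
]
patient_years = first_episode_per_patient_year(episodes)
```

Each key of `patient_years` corresponds to one patient-year, the unit counted in the Results.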

Results

A total of 111,231,653 patient-years were recorded between April 1999 and March 2014. The completeness of ethnicity records improved from 59.5 % in 1999 to 90.5 % in 2013 (financial year). The biggest improvement was seen in the White British group, which increased from 55.4 % in 1999 to 73.9 % in 2013. The correct prediction of NHS-reported ethnicity varied by ethnic group (2013 figures): White British (89.8 %), Pakistani (81.7 %), Indian (74.6 %), Chinese (72.9 %), Bangladeshi (63.4 %), Black African (57.3 %), White Other (50.5 %), White Irish (45.0 %). For other ethnic groups the prediction success was low to none. Prediction success was above 70 % in most areas outside London but fell below 40 % in parts of London.
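One way to read the per-group figures is as a sensitivity-style measure: among patients with a given NHS-recorded group, the share for whom the names-based tool assigned the same group. The excerpt does not state the exact formula, so the sketch below rests on that assumption. The function name and the toy pairs are invented for illustration and are not study data.

```python
from collections import Counter


def prediction_success(pairs):
    """Percentage of records in each recorded group whose predicted group matches.

    `pairs` is an iterable of (recorded_group, predicted_group) tuples.
    Assumes 'prediction success' means per-group agreement with the
    recorded ethnicity, which is an interpretation, not the paper's stated formula.
    """
    totals = Counter()
    hits = Counter()
    for recorded, predicted in pairs:
        totals[recorded] += 1
        if recorded == predicted:
            hits[recorded] += 1
    return {group: 100.0 * hits[group] / totals[group] for group in totals}


# Toy pairs, invented for illustration only.
toy_pairs = [
    ("White British", "White British"),
    ("White British", "White British"),
    ("White British", "White British"),
    ("White British", "White Irish"),
    ("Pakistani", "Pakistani"),
    ("Pakistani", "Indian"),
]
success = prediction_success(toy_pairs)
```

Because the denominator is the recorded group, a small group can score poorly even when the tool is accurate overall, which is consistent with the wide spread between groups reported above.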

Conclusion

Studies of ethnic inequalities in hospital inpatient care in England are limited by incomplete data on patient ethnicity collected in the 1990s and 2000s. The prediction success of a names-based ethnicity classification tool has been quantified in HES for the first time and the results can be used to inform decisions around the optimal analysis of ethnic groups using this data source.

Introduction

Ethnicity is defined as a sensitive personal characteristic under the European Union’s General Data Protection Regulation (GDPR) of 2016 [1]. It is often considered to be inherently subjective [2] and may not always be collected for reasons of statute [3,4]. This can handicap the conduct of equality audits, analysis of corporate governance [5] and, most recently, monitoring of hospital admissions and outcomes during the COVID-19 pandemic [6,7].

Provision has been made for Hospital Episode Statistics (HES) to include patient-reported ethnicity since 1995 by drawing on a central National Health Service (NHS) patient register [5]. Yet until the late 1990s, ethnicity information was absent from more than half of records of patients who received inpatient care. General practitioners were financially incentivised to record patient ethnicity through the Quality and Outcomes Framework (QOF) between 2006 and 2012, and the completeness of inpatient ethnicity data rose to more than 80 % during this time [5].

The problem of missing ethnicity data in NHS datasets has previously been studied [8,9], although not across the full range of ethnic groups in a national study over several years. Ryan et al. (2012) used Onomap and Nam Pehchan to impute the ethnicity of White, South Asian, Black and Other groups in the UK’s West Midlands [8]. They used a multiple imputation strategy based on characteristics of the individual patients, their care, and the ethnic composition of their neighbourhoods, and reported that the sensitivity of the multiple imputation was above 90 % for White and South Asian ethnicities but very low for other groups. Smith et al. (2017) used the Onomap software to assign children and young people with cancers to White, South Asian, or Other groups in a Yorkshire study, concluding that combining different data sources, including names-based ones, increased the representation of ethnic minorities, albeit with some ambiguity [9]. Both studies concluded that there is no perfect substitute for more complete self-reported ethnicity data.

Personal names are commonly used to impute ethnicity information when self-reported ethnicity data are not collected systematically or available through linkage [10,11]. An early example of names-based ethnicity classification exploiting large-scale data sets is the work by Mateos et al. (2011) [11]. They applied cluster analysis to data on personal names and residential codes from telephone directories and other administrative data from 17 different countries. Kandt & Longley used cluster analysis to define more detailed clusters for the UK, making use of data on names and country of origin in the Census 2011 microdata [10]. In this paper we report on the use of names-based ethnicity classifications to address incomplete ethnicity information in inpatient hospital records. It is a national study covering the whole of England over fifteen years (1999/00–2013/14). The study quantifies the prediction success for the complete range of ethnic groups, nationally and regionally, against self-reported, NHS-recorded ethnicity. A freely available software tool, Ethnicity Estimator (EE), was used [10]. EE was developed by the Consumer Data Research Centre (CDRC: cdrc.ac.uk) in partnership with the Office for National Statistics (ONS), using enhanced algorithmic procedures [10,12]. The results of this study can be used to inform decisions around analysis of ethnicity in HES.

Materials and methods

Hospital inpatient admission records were obtained from NHS England HES for the period April 1999 to March 2014 (financial years are referred to by their first year only from here onwards). Ethnicity was coded from patient forename and surname separately using an enhanced version of the Ethnicity Estimator (EE) software [10]. Where a patient changed surname, e.g. due to marriage, the ethnicity category of the earliest name was used. To retain full anonymity, the coding was carried out in an

Results

A total of 111,231,653 patient-years were recorded between 1999 and 2013. The completeness of ethnicity records improved from 59.5 % in 1999 to 89.2 % in 2009 and peaked at 90.5 % in 2013 (Fig. 1). The biggest absolute improvement was seen in the White British group, which increased from 55.4 % in 1999 to 73.9 % in 2013. Fig. 2 shows increased representation for other ethnic groups.
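The completeness figures above are percentages of patient-years with a usable ethnicity code, and the improvements can be restated as percentage-point differences. The sketch below shows both calculations. The choice of missing markers (`None` and the empty string), the toy codes and the function names are assumptions made for illustration. Only the four percentages passed to `percentage_point_change` come from the text.

```python
def completeness(ethnicity_codes, missing=(None, "")):
    """Percentage of records whose ethnicity code is not a missing marker.

    The default `missing` markers are an assumption for this sketch;
    real extracts may use dedicated 'not stated' or 'not known' codes.
    """
    codes = list(ethnicity_codes)
    if not codes:
        raise ValueError("no records supplied")
    known = sum(1 for code in codes if code not in missing)
    return 100.0 * known / len(codes)


def percentage_point_change(start_pct, end_pct):
    """Difference between two percentages, rounded to one decimal place."""
    return round(end_pct - start_pct, 1)


# Toy codes, invented for illustration: 3 of 5 records carry a code.
toy_codes = ["A", None, "H", "", "A"]
toy_completeness = completeness(toy_codes)

# Figures reported in the text for 1999 and 2013.
overall_gain = percentage_point_change(59.5, 90.5)
white_british_gain = percentage_point_change(55.4, 73.9)
```

On the reported figures this gives a gain of 31.0 percentage points in overall completeness and 18.5 percentage points for the White British group.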

The sensitivity analysis comparing EE estimates with NHS-recorded ethnic group, in 2013, suggested that the

Discussion

We found that the completeness of ethnicity data for hospital patients in England improved from 59.5 % in 1999 to 90.5 % in 2013. The biggest improvement was seen in the White British group, which increased from 55.4 % in 1999 to 73.9 % in 2013. The correct prediction of NHS-reported ethnicity varied by ethnic group (2013/14 figures): White British (89.8 %), Pakistani (81.7 %), Indian (74.6 %), Chinese (72.9 %), Bangladeshi (63.4 %), Black African (57.3 %), White Other (50.5 %), White Irish

Limitations

As a limitation, it should be noted that ethnicity is a complex concept encompassing biological, cultural, and subjective aspects. Which aspect matters most depends on the kind of inequalities that are the object of the study and the related assumptions about disease aetiology. Variation in prediction success of name-based ethnicity classification can therefore arise for different reasons including individuals’ sense of belonging and resulting choice of ethnic group, socio-cultural naming and

Conclusion

Studies of ethnic inequalities in hospital inpatient care in England are limited by incomplete data on patient ethnicity in the 1990s and 2000s. Financial incentives for general practitioners to collect and report ethnicity to the central patient register between 2006 and 2012 greatly improved completeness during this period. Personal names of patients remain an untapped source for closing this gap for the earlier years. As demonstrated in this and other studies, name-based ethnicity

Source of funding

The UK Economic and Social Research Council is acknowledged for its support for the UCL Consumer Data Research Centre (CDRC) enabling this research (Grant ES/L011840/1).

Authors’ contributions

All authors (JP, JK, PAL) made substantial contributions to the conception and design of the study, interpretation of data, revision, and final approval of the submitted version. JK and PAL contributed to the acquisition of data. JP carried out the analysis and drafted the article.

Summary table

What is already known on this subject

  • Studies of ethnic inequalities in hospital inpatient care in England are limited by incomplete data on patient ethnicity collected in the 1990s and 2000s.

Declaration of Competing Interest

The authors report no declarations of interest.

Acknowledgements

None.

References (20)

  • N. Bhala et al. Sharpening the global focus on ethnicity and race in the time of COVID-19. Lancet (2020)
  • F. Lakha et al. Name analysis to classify populations by ethnicity in public health: validation of Onomap in Scotland. Public Health (2011)
  • J. Petersen et al. Names-based classification of accident and emergency department users. Health Place (2011)
  • European Union. Regulation (EU) 2016/679 of the European Parliament and the Council of 27 April 2016 on the Protection of Natural Persons With Regard to the Processing of Personal Data and the Free Movement of Such Data, and Repealing Directive 95/46/EC (General Data Protection Regulation) (2016)
  • B. Byrne et al. Ethnicity, Race and Inequality in the UK - State of the Nation (2020)
  • The Economist. American Ideas About Racism Are Influencing Europe. The Economist (2020)
  • The Economist. An Edgy Inquiry. The Economist (2015)
  • R. Mathur et al. Completeness and usability of ethnicity data in UK-based primary care and hospital databases. J. Public Health Oxf. Engl. (2014)
  • R.W. Aldridge et al. Black, Asian and Minority Ethnic groups in England are at increased risk of death from COVID-19: indirect standardisation of NHS mortality data. Wellcome Open Res. (2020)
  • D.R. Thomas, O. Orife, A. Plimmer, C. Williams, G. Karani, M.R. Evans, P.A. Longley, J. Janiec, R. Saltus, A.G....
There are more references available in the full text version of this article.
