The efficacy of measuring judicial ideal points: The mis-analogy of IRTs
Introduction
Latent concepts and measurements are ubiquitous in political science, law, and economics. Because the core assumptions in any paradigm of thought concern concepts that we fundamentally cannot observe (Lakatos, 1976), such as preferences, ideology, or aptitude, there are extensive literatures devoted to how precisely we can measure their shadow on the real world, such as by observing people's choices. Since the highly influential work of Poole and Rosenthal (1991, 2000), numerous scholars have used spatial voting models to estimate ideological preferences from roll-call votes and other choice data; these measures and models are often vital to significant swaths of the literature in American and comparative politics (Imai et al., 2016) and law.
A widely used method for measuring these latent traits is the item response theory (IRT) model. These models, borrowed from a literature developed in education testing and psychometrics, are a relatively fast way of fitting a latent trait model to diverse sets of choice data arising from different choice environments. IRT methods have been applied to study public opinion formation (Treier and Sunshine Hillygus, 2009; Tausanovitch and Warshaw, 2014), the ideology of actors in political institutions (Clinton et al., 2004; Martin and Quinn, 2002; Bonica, 2014), and the aggregation of expert ratings (Clinton and Lapinski, 2006; Treier and Jackman, 2008; Linzer and Staton, 2012), among many others.
Many of these approaches, however, disregard an important output of the IRT model that is vital to understanding the model's fit. As a result, IRT models are often used in ways that violate their fundamental assumptions and bias the resulting estimates of latent traits. In this paper, we argue that the main limitation of existing IRT models used in service of ideal point estimation in judicial politics is a misspecification of item parameters.1 We will show that the choice of test questions to measure unobserved aptitude, the purpose for which IRT was created, may not be a good analogy to voting and ideology. In particular, we will show that we need to pay closer attention to the 'item' part of IRT models if we are to limit (or eliminate) bias in our measures.
In the sections that follow, we present and discuss the exact formulation of IRT models, how they are used in political science and legal scholarship, and the development of Martin and Quinn's (2002) Dynamic Ideal Point Model, the most widely used ideal point model in the study of judicial politics and law. We then demonstrate the bias in their estimates and correct these estimates for the problems we identify. Next, we replicate two recent studies of judicial behavior using our new judicial ideology scores, showing that the specification and construction of judicial ideology scores do indeed affect empirical models of judicial behavior. We close with our reservations about assuming that judicial ideology is unidimensional, and we suggest improvements that can be made within the political science latent scale modeling literature more broadly.
Section snippets
IRT by the numbers
Item response theory methods are a class of latent trait models developed in education testing to capture test-taker aptitude along with an assessment of a questionnaire (i.e., a test or exam) (van der Linden and Hambleton, 2013). Item response models provide the basis for most modern standardized tests and represent improvements over classical test theory made in the education testing literature over the last seven decades. The broad adoption of IRT models in psychometrics and education testing
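The workhorse of this literature is the two-parameter logistic (2PL) model. As a minimal sketch (the function and parameter values below are illustrative, not taken from the paper), the probability that a test taker with ability θ answers an item correctly is logistic in α(θ − β), where α is the item's discrimination and β its difficulty:

```python
import math

def irt_2pl(theta, alpha, beta):
    """Two-parameter logistic IRT model: probability of a correct
    response given ability theta, item discrimination alpha, and
    item difficulty beta."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

# High-ability test taker on an easy, well-discriminating item:
# probability near 1
p_high = irt_2pl(theta=2.0, alpha=1.5, beta=-0.5)

# Low-ability test taker on the same item: probability near 0
p_low = irt_2pl(theta=-2.0, alpha=1.5, beta=-0.5)

# When ability equals difficulty, the probability is exactly 0.5
p_mid = irt_2pl(theta=0.0, alpha=1.0, beta=0.0)
```

A larger α makes the response curve steeper around β, which is what makes an item informative about abilities near that difficulty level.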
Martin and Quinn (2002)
The primary work we address is one of the most influential IRT-based ideal point models in political science and law: the Dynamic Ideal Point Model of Martin and Quinn (2002). We focus on the Dynamic Ideal Point Model for two reasons: first, it is widely cited and has provided the basis for much of the modern study of judicial behavior; second, it is a credible model that makes many reasonable assumptions. Thus, while we focus our attention on the ideal
Applications to judicial politics research
Suppose the estimates of any particular justice's ideology are incorrect. In that scenario, the results of empirical studies within judicial politics (particularly studies that use these scores as cardinal, unidimensional independent variables in their models) might be misestimated. We first highlight differences between the median justices identified by the original MQ scores and our recoded MQ scores. Next, we use our re-estimated MQ scores to replicate a well-known paper that relies on
Discussion and conclusion
As mentioned above, IRT models are a widely used methodology for measuring latent traits in political science and legal scholarship. These models were borrowed from the education testing and psychometrics literature. We believe that scholars have inappropriately ignored assumptions of IRT models when applying them to judicial politics. Specifically, we focused on the violation of the uniformity of direction in the discrimination parameter (α). We cannot compare across items if
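The direction problem can be illustrated numerically. In the hypothetical sketch below (illustrative values, not the authors' estimation code), flipping the sign of an item's discrimination parameter α mirrors the predicted vote probabilities, so raw responses on items with mixed-sign α confound the actor's position with the item's direction:

```python
import math

def vote_prob(theta, alpha, beta):
    """IRT-style probability of a 'yes' vote for an actor with ideal
    point theta on an item with discrimination alpha and difficulty
    parameter beta."""
    return 1.0 / (1.0 + math.exp(-(alpha * theta - beta)))

theta = 1.5  # a hypothetical justice on the right of the scale

# Item coded so that higher theta predicts a 'yes' vote (alpha > 0)
p_pos = vote_prob(theta, alpha=1.0, beta=0.0)

# The same item with its discrimination sign flipped (alpha < 0):
# the substantive direction of the item reverses
p_neg = vote_prob(theta, alpha=-1.0, beta=0.0)

# The two probabilities are mirror images; comparing raw votes
# across such items conflates ideology with item direction
assert abs(p_pos + p_neg - 1.0) < 1e-12
```

Unless every item's discrimination parameter points in a common direction (or items are recoded so that it does), aggregating responses across items mixes these mirrored scales.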
Author statement
Joshua Lerner: Conceptualization, Methodology, Validation, Formal Analysis, Writing - Original Draft, Writing - Review & Editing.
Mathew McCubbins: Conceptualization, Methodology, Supervision, Writing - Original Draft.
Kristen Renberg: Conceptualization, Validation, Formal Analysis, Writing - Original Draft, Writing - Review & Editing, Visualization.
References (40)
- Polarization and ideology: Partisan sources of low dimensionality in scaled roll call analyses. Political Anal. (2014).
- The Basics of Item Response Theory (2001).
- Item Response Theory: Parameter Estimation Techniques (2004).
- The Basics of Item Response Theory Using R (2017).
- The constraining capacity of legal doctrine on the US Supreme Court. Am. Polit. Sci. Rev. (2009).
- Mapping the ideological marketplace. Am. J. Pol. Sci. (2014).
- Agenda control, the median justice, and the majority opinion on the US Supreme Court. Am. J. Pol. Sci. (2007).
- Who controls the content of Supreme Court opinions? Am. J. Pol. Sci. (2012).
- Locating Supreme Court opinions in doctrine space. Am. J. Pol. Sci. (2010).
- Measuring legislative accomplishment, 1877–1994. Am. J. Pol. Sci. (2006).