The efficacy of measuring judicial ideal points: The mis-analogy of IRTs

https://doi.org/10.1016/j.irle.2021.106020

Highlights

  • IRT models of ideal points in law and political science often fail to use the information contained in the item parameters.

  • We introduce a post-estimation trimming procedure that addresses these modeling issues without dropping IRTs altogether.

  • Better adherence to the principles of IRT modeling will produce more consistent and reliable estimates of ideal points.

  • We demonstrate this by extending Martin and Quinn (2002), showing differences in the spatial locations of Supreme Court justices, including the estimated medians.

Abstract

IRT models are among the most commonly used latent trait models in political science, particularly for estimating the ideal points of political actors in institutions. While widely used, IRT models are often misapplied, and a key output of their estimation, the item parameters, is almost always ignored and discarded. In this paper, we examine the application of IRT models to the estimation of judicial ideology scores by Martin and Quinn (2002). Building on a replication and extension of Martin and Quinn (2002), we demonstrate that the often-ignored item parameters are, in fact, inconsistent with the assumptions of IRT. Using a post-estimation fix designed to ameliorate the problem, we then run the model again and generate new scores. We compare these new ideal points to the existing ideal points and discuss the implications for ideal point modeling generally and for judicial politics specifically. We conclude by replicating a prominent study in judicial politics, demonstrating that inconsistencies in the estimation of IRT models can be consequential, and we raise concerns about what this means for the usefulness of scores estimated via IRT models.

Introduction

Latent concepts and their measurement are ubiquitous in political science, law, and economics. Because the core assumptions of any paradigm of thought concern concepts that we fundamentally cannot observe (Lakatos, 1976), such as preferences, ideology, or aptitude, extensive literatures are devoted to how precisely we can measure their shadow on the real world, for example by observing people's choices. Since the highly influential work of Poole and Rosenthal (1991, 2000), numerous scholars have used spatial voting models to estimate ideological preferences from roll-call votes and other choice data; these measures and models are often vital to significant swaths of the literature in American and comparative politics (Imai et al., 2016) and in law.

A widely used method for measuring these latent traits is the item response theory (IRT) model. These models, borrowed from the literature on education testing and psychometrics, offer a relatively fast way to fit a latent trait model to diverse sets of choice data arising from different choice environments. IRT methods have been applied to study public opinion formation (Treier and Sunshine Hillygus, 2009; Tausanovitch and Warshaw, 2014), the ideology of actors in political institutions (Clinton et al., 2004; Martin and Quinn, 2002; Bonica, 2014), and the aggregation of expert ratings (Clinton and Lapinski, 2006; Treier and Jackman, 2008; Linzer and Staton, 2012), among many other applications.
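For concreteness, the binary two-parameter IRT model that underlies most of these applications can be written as follows (the notation is ours, introduced here only as a reference point for the discussion that follows):

\[
\Pr(y_{ij} = 1 \mid \theta_i) \;=\; \Phi\big(\alpha_j\,(\theta_i - \beta_j)\big),
\]

where \(\theta_i\) is actor \(i\)'s latent trait (aptitude or ideal point), \(\beta_j\) is item \(j\)'s difficulty or cutpoint, \(\alpha_j\) is item \(j\)'s discrimination parameter, and \(\Phi\) is a probit (or, in other variants, logit) link. In the education testing setting, every \(\alpha_j\) is expected to be positive: a more able test taker should never be less likely to answer an item correctly.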

Many of these approaches, however, disregard an important output of the IRT model that is vital to understanding the model's fit. Because of this disregard, IRT models are often used in ways that violate their fundamental assumptions and compromise the resulting estimates of latent traits. In this paper, we argue that the main limitation of existing IRT models used for ideal point estimation in judicial politics is a misspecification of the item parameters.1 We show that the choice of test questions to measure unobserved aptitude, the purpose for which IRT was created, may not be a good analogy to voting and ideology. In particular, we show that we need to pay closer attention to the 'item' part of IRT models if we are to limit (or eliminate) bias in our measures.
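To see why the direction of the item parameters matters, consider a small simulation. This is a minimal sketch with hypothetical values, not the estimation code used in the paper: when two items have discrimination parameters of opposite sign, the same 'yes' response implies opposite ideological directions, so pooling the items as if they cut the space the same way misleads the recovered scores.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical latent ideal points for nine actors on a left-right scale.
theta = np.linspace(-2, 2, 9)

# Two illustrative items with the same cutpoint but opposite discrimination.
# In education testing, alpha would be positive for every item; here item B
# "points the other way," as often happens with votes.
alpha = np.array([1.5, -1.5])   # discrimination parameters
beta = np.array([0.0, 0.0])     # difficulty / cutpoint parameters

# Two-parameter probit IRT: Pr(y = 1) = Phi(alpha * (theta - beta))
p = norm.cdf(alpha[None, :] * (theta[:, None] - beta[None, :]))

for i, th in enumerate(theta):
    print(f"theta={th:+.2f}  Pr(yes|A)={p[i, 0]:.2f}  Pr(yes|B)={p[i, 1]:.2f}")

# An actor at theta = +2 almost surely votes 'yes' on item A but 'no' on
# item B.  Reading a 'yes' on both items as evidence of the same ideological
# direction misreads one of the two votes, which is the comparability
# problem the item parameters are meant to flag.
```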

In the sections that follow, we present and discuss the exact formulation of IRT models, how they are used in political science and legal scholarship, and the development of Martin and Quinn's (2002) Dynamic Ideal Point Model, the most widely used ideal point model in the study of judicial politics and law. We then demonstrate the bias in their estimates and correct those estimates for the problems we identify. Next, we replicate two recent studies of judicial behavior using our new judicial ideology scores and show that the specification and construction of judicial ideology scores do indeed affect empirical models of judicial behavior. We close with our reservations about assuming that judicial ideology is unidimensional and suggest improvements that can be made within the latent scale modeling literature in political science more broadly.

Section snippets

IRT by the numbers

Item response theory methods are a class of latent trait models developed in education testing to capture test-taker aptitude along with an assessment of a questionnaire (i.e., a test or exam) (van der Linden and Hambleton, 2013). Item response models provide the basis for most modern standardized tests and represent improvements over classical test theory made in the education testing literature over the last seven decades. The broad adoption of IRT models in psychometrics and education testing

Martin and Quinn (2002)

The primary work we address is one of the most essential IRT-based ideal point models in political science and law: the Dynamic Ideal Point Model of Martin and Quinn (2002). We focus on the Dynamic Ideal Point Model for two reasons: first, it is widely cited and has provided the basis for much of the modern study of judicial behavior; second, it is a very plausible model that makes many reasonable assumptions. Thus, while we focus our attention on the ideal
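In broad strokes, the Dynamic Ideal Point Model treats each vote as a probit IRT response and lets each justice's ideal point evolve across terms through a random-walk prior. A sketch of the model, in our own notation and suppressing priors and identification details, is:

\[
\Pr(y_{t,k,j} = 1) \;=\; \Phi\big(\alpha_k\,(\theta_{t,j} - \beta_k)\big), \qquad \theta_{t,j} \sim \mathcal{N}\big(\theta_{t-1,j},\, \Delta_{\theta}\big),
\]

where \(y_{t,k,j}\) is justice \(j\)'s vote on case \(k\) in term \(t\), \((\alpha_k, \beta_k)\) are the case-level item parameters, and the random-walk prior on \(\theta_{t,j}\) is what makes the ideal points dynamic. The case-level item parameters are estimated alongside the ideal points but are typically discarded once the \(\theta\)'s are reported.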

Applications to judicial politics research

Suppose the estimates of a particular justice's ideology are incorrect. In that scenario, the results of empirical studies within judicial politics (particularly studies that use these scores as cardinal-level, unidimensional independent variables in their models) may be misestimated. We first highlight differences between the median justices identified by the original MQ scores and by our recoded MQ scores. Next, we use our re-estimated MQ scores to replicate a well-known paper that relies on
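Identifying the median justice from a set of ideal points is simple arithmetic; the sketch below (with hypothetical scores, not our estimates) shows the kind of comparison we perform between two sets of scores for the same term.

```python
# Hypothetical term-level ideal points; values are illustrative only.
original_mq = {"Justice A": -1.8, "Justice B": -0.6, "Justice C": -0.1,
               "Justice D": 0.3, "Justice E": 0.4, "Justice F": 0.9,
               "Justice G": 1.1, "Justice H": 1.6, "Justice I": 2.0}
reestimated_mq = {"Justice A": -1.7, "Justice B": -0.5, "Justice C": 0.5,
                  "Justice D": 0.2, "Justice E": 0.4, "Justice F": 0.8,
                  "Justice G": 1.0, "Justice H": 1.5, "Justice I": 1.9}

def median_justice(scores):
    """Return the justice holding the median ideal point in a nine-member court."""
    ranked = sorted(scores, key=scores.get)
    return ranked[len(ranked) // 2]

print("Original median:    ", median_justice(original_mq))
print("Re-estimated median:", median_justice(reestimated_mq))
# If re-estimation shifts even one justice across the middle of the ordering,
# the identity of the median justice, and any empirical model that keys on
# it, changes as well.
```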

Discussion and conclusion

As discussed above, IRT models are a widely used methodology for measuring latent traits in political science and legal scholarship. These models were borrowed from the education testing and psychometrics literature. We believe that scholars have inappropriately ignored the assumptions of IRT models when applying them to judicial politics. Specifically, we have focused on the violation of uniformity of direction in the discrimination parameter (α). We cannot compare across items if
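One way to think about a post-estimation trimming step of the kind described above is as a filter on the estimated discrimination parameters. The sketch below is only illustrative: the 95% sign-consistency threshold, the simulated posterior draws, and the variable names are our assumptions, not the paper's exact rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Suppose we have posterior draws of the discrimination parameter for each
# case (rows = MCMC draws, columns = cases).  Values here are simulated.
n_draws, n_cases = 2000, 6
true_alpha = np.array([1.2, 0.9, -0.8, 0.05, 1.5, -1.1])
alpha_draws = true_alpha + 0.3 * rng.standard_normal((n_draws, n_cases))

# Share of posterior mass on the positive side for each case.
prob_positive = (alpha_draws > 0).mean(axis=0)

# Keep only cases whose discrimination parameter is consistently signed
# (here: at least 95% of draws agree on the sign).  Cases whose direction
# is ambiguous violate the comparability assumption and are trimmed before
# the ideal points are re-estimated.
consistent = (prob_positive > 0.95) | (prob_positive < 0.05)
print("Posterior Pr(alpha > 0):", np.round(prob_positive, 2))
print("Cases retained:", np.flatnonzero(consistent))
```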

Author statement

Joshua Lerner: Conceptualization, Methodology, Validation, Formal Analysis, Writing - Original Draft, Writing - Review & Editing.

Mathew McCubbins: Conceptualization, Methodology, Supervision, Writing - Original Draft.

Kristen Renberg: Conceptualization, Validation, Formal Analysis, Writing - Original Draft, Writing - Review & Editing, Visualization.

References (40)

  • John H. Aldrich et al., Polarization and ideology: Partisan sources of low dimensionality in scaled roll call analyses, Political Anal. (2014)

  • Frank B. Baker, The Basics of Item Response Theory (2001)

  • Frank Baker et al., Item Response Theory: Parameter Estimation Techniques (2004)

  • Frank B. Baker et al., The Basics of Item Response Theory Using R (2017)

  • Brandon L. Bartels, The constraining capacity of legal doctrine on the US Supreme Court, Am. Polit. Sci. Rev. (2009)

  • Adam Bonica, Mapping the ideological marketplace, Am. J. Pol. Sci. (2014)

  • Chris W. Bonneau et al., Agenda control, the median justice, and the majority opinion on the US Supreme Court, Am. J. Pol. Sci. (2007)

  • Cliff Carrubba et al., Who controls the content of Supreme Court opinions?, Am. J. Pol. Sci. (2012)

  • Tom S. Clark et al., Locating Supreme Court opinions in doctrine space, Am. J. Pol. Sci. (2010)

  • Joshua D. Clinton et al., Measuring legislative accomplishment, 1877–1994, Am. J. Pol. Sci. (2006)

  • Joshua Clinton et al., The statistical analysis of roll call data, Am. Polit. Sci. Rev. (2004)

  • Paul De Boeck et al., A framework for item response models, in: Explanatory Item Response Models (2004)

  • Lee Epstein et al., Measuring issue salience, Am. J. Pol. Sci. (2000)

  • Lee Epstein et al., The judicial common space, J. Law Econ. Organ. (2007)

  • Lee Epstein et al., Ideological drift among Supreme Court justices: Who, when, and how important, Nw. UL Rev. (2007)

  • Joshua B. Fischman et al., The second dimension of the Supreme Court, Wm. & Mary L. Rev. (2015)

  • Jean-Paul Fox, Bayesian Item Response Modeling: Theory and Applications (2010)

  • Matthew E.K. Hall, The semiconstrained court: public opinion, the separation of powers, and the US Supreme Court's fear of nonimplementation, Am. J. Pol. Sci. (2014)

  • Daniel E. Ho et al., How not to lie with judicial votes: misconceptions, measurement, and models, Calif. Law Rev. (2010)

  • Kosuke Imai et al., Fast estimation of ideal points with massive data, Am. Polit. Sci. Rev. (2016)