The American Psychological Association’s (APA) ability to achieve its mission “to promote the advancement, communication, and application of psychological science and knowledge to benefit society and improve lives” (APA, 2022, Mission section) has been called into question by recent critiques born from long-standing issues. The first set of critiques derived from psychology’s “replication crisis” (e.g., Open Science Collaboration, 2015), which has highlighted the field’s history of questionable research practices (Flake & Fried, 2020) via analyses revealing cracks in the evidential foundation of some of the field’s most treasured findings (e.g., the ego-depletion effect; Hagger et al., 2016). The second set of critiques involves psychology’s past and continued contributions to systemic inequalities, racism, racial discrimination, and epistemic oppression that have maintained white supremacy and created “inherent flaws in research” (Buchanan et al., 2021, p. 1098). These methodological and racist malpractices undermine psychologists’ aspirations to address societal challenges such as misinformation in a post-truth world (Barzilai & Chinn, 2020), vaccine acceptance (Murphy et al., 2021), conceptions of climate change (Thacker et al., 2020), and educational equity and excellence (Jagers et al., 2019). Reform movements have included calls for a more open science with greater methodological rigor (Fleming et al., 2021; Gehlbach & Robinson, 2021; Munafò et al., 2017) and a commitment to anti-racist and inclusive scholarship and practices (APA, 2021; Buchanan et al., 2021; Zusho & Kumar, 2018). Recently, these critically important calls have been echoed by a somewhat less prominent, but nonetheless complementary, call for reform in the ways psychologists engage in theory development (Eronen & Bringmann, 2021; Oberauer & Lewandowsky, 2019; Plaut, 2010; Wentzel, 2021).

Theories are descriptions of how, and for some types of theory why, phenomena occur in the world (Borsboom et al., 2021). Theory, and how it is developed, directly relates to psychology’s methodological and anti-racist goals because theory guides who and what psychologists study, the questions psychologists ask, how psychologists interpret findings, and what implications psychologists suggest based on those findings (Eronen & Bringmann, 2021; Szollosi & Donkin, 2021). Yet, theory development has been relatively underscrutinized compared to the field’s focus on methods (Fielder, 2017; Wentzel, 2021). Scholars who have looked deeply at psychology’s use of theory have lamented the paucity of clear descriptions of theory development (Eronen & Bringmann, 2021), the scarcity of clear connections between theory and the empirical work published in psychology journals (McPhetres et al., 2021), the difficulty in evaluating the adequacy of theories (Gervais, 2021), the often-unchecked proliferation of multiple seemingly similar theories about the same phenomena (Eronen & Romeijn, 2020; Mischel, 2008), and the inequitable prominence of WEIRD (i.e., Western, educated, industrialized, rich, and democratic; Henrich et al., 2010) and ableist (Emery & Anderman, 2020) perspectives in psychology scholarship. Quite simply, more just, effective, reliable, and beneficial psychology scholarship and practice require reform focused not just on methods and anti-racist practices, but also on theory development and use.

The need for tripartite reform (i.e., methods, anti-racism, theory development) in psychology extends to its many subdisciplines (Eronen & Bringmann, 2021), with educational psychology being no exception (DeCuir-Gunby & Schutz, 2014; Makel et al., 2021; Wentzel, 2021). There is emerging evidence of questionable research practices in education and educational psychology (Gehlbach & Robinson, 2021) as well as continued evidence the field must reckon with past failures to address epistemic injustice (Kidd et al., 2017) based on race (Lopez, 2022; Usher, 2018) and ability (Emery & Anderman, 2020), among other factors. Calls for a greater focus on theory development in educational psychology have been less prominent but nonetheless present. For example, after publishing a critique of theory development in psychology writ large, Gigerenzer (2010) reported receiving this response from a professor: “No field within psychology suffers more from the theoretical malaise than mine, educational psychology” (p. 734). More recently, Wentzel (2021) argued “an important challenge for research reform is to achieve greater parsimony across theoretical frameworks” (p. 166) and that efforts at methodological reform in educational psychology “also will require changes in how we train the next generation of researchers to incorporate theory into sampling, measurement, design, and data analysis decisions” (p. 169).

Educational psychologists have indeed struggled to achieve parsimony among an ever-increasing number of theories of learning and related phenomena. For example, attempts to integrate the many prevailing theories of learning (e.g., Alexander et al., 2009) have been met with significant skepticism (e.g., Graesser, 2009; Säljö, 2009) and sadly little traction. There have been similar attempts to achieve parsimony among theories of motivation (Murphy & Alexander, 2000; Pintrich, 2003), self-regulated learning (Winne, 1995), and literacy (Anderson et al., 1985). Each attempt has been earnest, yet each has been followed by the publication of yet another attempt at synthesis years later (Frankel et al., 2017; Hattie et al., 2020; Linnenbrink-Garcia et al., 2016; Panadero, 2017), suggesting little progress toward integration or parsimony. Without effective integration, a field can experience unchecked and likely unsustainable growth in the number of theories, amusingly characterized by Mischel (2008): “psychologists treat other peoples’ theories like toothbrushes – no self-respecting person wants to use anyone else’s” (n.d.). Excessive growth in the number of theories can hinder educational psychology’s ability to generate reliable findings with practical implications, jeopardizing its reputation with practitioners, policy-makers, and the public (Gehlbach & Robinson, 2021; Wentzel, 2021; Willingham, 2017).

I argue that educational psychology would benefit from increased focus on the scholarship of theory development (e.g., Borsboom et al., 2021; Eronen & Bringmann, 2021), which would bolster the subdiscipline’s efforts to productively integrate multiple theories in ways that afford clearer and better-supported implications for practitioners, policy-makers, and the public. At the same time, the broader field of psychology’s pursuit of tripartite reform in methods, anti-racist practices, and theory development can be productively informed by educational psychologists’ efforts in these areas. Therefore, in this paper, first I synthesize the current scholarship on theory development with the goal of promoting similar efforts in educational psychology. Then, I detail how the field of psychology’s tripartite reform efforts can be enhanced by incorporating ideas and lessons learned from educational psychology’s embrace of qualitative and mixed research methods (Meyer & Schutz, 2020; Reber, 2016), its inchoate but growing efforts to reckon with racism and epistemic injustice in scholarship (DeCuir-Gunby & Schutz, 2014; Kidd et al., 2017; Lopez, 2022; Usher, 2018), and its ongoing efforts to reconceptualize both the methodology and the consequences of measurement (Cizek, 2020; Flake, 2021). Finally, I outline several productive directions for future research and practice, including ways to promote a more transparent and hospitable scholarly climate for theory development as well as advocacy for more professional training in theory development, which is as important as methodological training (K. Gray, 2017; Wentzel, 2021).

What Is Theory and How Is It Developed, Evaluated, and Integrated?

By describing how, and in some cases why, phenomena occur in the world, theories productively organize empirical literature, differentiating promising from not-so-promising directions for future research that would otherwise be difficult to discern without theory (McPhetres et al., 2021; Szollosi & Donkin, 2021). Theories have two predominant aspects: description and explanation, with individual theories varying in their emphasis on each (McGann & Speelman, 2020). People tend to think first of theory’s explanatory aspects, which depict why phenomena in the world manifest (Borsboom et al., 2021). Explanatory aspects can describe relations among phenomena (e.g., how knowledge and confidence relate to one another; Dunning, 2011), afford predictions about phenomena (e.g., fostering self-efficacy increases academic performance; Chemers et al., 2001), and in some cases, suggest ways to exert influence or control over those phenomena (e.g., providing digital self-regulated learning support to students struggling in active learning courses improves their academic performance; Bernacki et al., 2020).

The descriptive aspects of theory are less frequently discussed than the explanatory ones (McGann & Speelman, 2020). Descriptive aspects of a theory do not depict why phenomena in the world manifest; rather, their purpose is to provide accurate, useful, and comprehensive portrayals of those phenomena. The descriptive aspects of Bronfenbrenner’s (1979) ecological systems theory made a significant contribution to the field by expanding, elucidating, and organizing a wide array of previously under-considered phenomena relevant to child development (e.g., mesosystems, macrosystems). Descriptive aspects of theories are particularly useful when they elucidate less-commonly-known phenomena or aspects of those phenomena. For example, Schraw and colleagues’ (2007) research elaborated prior theories of academic procrastination by describing an adaptive type as well as the conditions posited to manifest both adaptive and maladaptive procrastination. More recently, Skinner and colleagues (2022) expanded conceptualizations of bioecological models by detailing new varieties and consequences of collective mesosystem effects.

Psychologists have tended to enunciate, investigate, and disseminate the explanatory aspects of theories much more than the descriptive ones, which is unfortunate given that each serves a unique role in theory development, and each informs the other (Berkman & Wilson, 2021). In fact, observations of phenomena “in the wild,” followed by systematic organization and descriptions of those phenomena, typically precede the creation of a theory’s explanatory aspects (McGann & Speelman, 2020; Teo, 2020). Piaget’s (1964) cognitive constructivist theory began with his observations (e.g., children’s incorrect answers on tests, his children’s behaviors) from which he inferred phenomena (e.g., phases of cognitive development). Only after producing those descriptions did he develop explanations for how and why those phenomena manifest (e.g., equilibration). Likewise, effective descriptions can organize and thus better define disparate observations and empirical findings, revealing gaps in understanding, surfacing non-intuitive effects or outcomes, and prompting the generation of new ideas or hypotheses about the world and how it works (Borsboom et al., 2021; K. Gray, 2017; Teo, 2020). For example, Bandura’s (2001) observations and eventual descriptions of how people could acquire and enact new behaviors without direct reinforcement were not easily explained using existing behaviorist theories. Behaviorism’s inability to explain these behaviors prompted Bandura to develop a new theory that could, thus leading to social learning theory.

Theories can be developed and refined using any and all research methods. The explanatory aspects of theory are often associated with quantitative research, but qualitative and mixed methods research can also contribute to explanations (e.g., grounded theory; Charmaz, 2014). Likewise, descriptive aspects of theories often arise from qualitative research, but quantitative findings also can elucidate new or unexplored phenomena (e.g., systematic underperformance on achievement tests in the presence of salient negative stereotypes; Spencer et al., 2016). Meta-analyses can reveal unexplained heterogeneity in findings, which in turn can be explored via qualitative research, with this mixed methods approach leading to better and more elaborated descriptive aspects of theory (Linden & Honekopp, 2021; Patall, 2021). In sum, novel insights about the descriptive or explanatory aspects of a theory can arise from any and all types of research designs, and such insights can and should be used to both develop and refine theory.

Theory Development

There is no one way to develop accurate and useful theory but a typical process can be depicted (see Fig. 1). This process involves generation of theory’s descriptive aspects and a related cycle where its explanatory aspects are posited and evaluated. As depicted in the top of Fig. 1, findings related to the descriptive and explanatory aspects of theory inform one another via epistemic iteration, ideally leading to a theory where both aspects mutually support one another. Generating descriptive aspects of theory and evaluating the explanatory aspects of theory both involve nuanced, often long-term cycles, warranting a separate depiction in Fig. 2. Whether scholars are conscious of the process or not, most typically theory development begins with the observation of relevant phenomena, often called “natural history” in the natural sciences (Eronen & Bringmann, 2021). The circle on the left side of Fig. 2 depicts how observations are organized and generalized into phenomena, which in turn lead to the descriptive aspects of theory. For example, Perry (1970) observed that some college students excelled at understanding the relativism taught in higher education whereas other students struggled. These observations prompted systematic research that revealed consistent phenomena (i.e., types of thinking common across students) that, in turn, led Perry to describe a model of intellectual and ethical development. This model inspired many other researchers to investigate similar phenomena and develop their own theories, evolving into the field of epistemic cognition (Greene et al., 2016).

Fig. 1 Theory development process

Fig. 2 Theory development cycles

Once the descriptive aspects of a theory have been developed, scholars use abduction (i.e., inferring what one takes to be the best explanation) to develop explanations that relate phenomena, in causal and/or non-causal ways, and/or guide interventions upon those phenomena (see the circle on the right side of Fig. 2). Evaluations of the explanatory aspects of a theory involve testing and refining posited mechanisms and relations among phenomena, such as research on how knowledge revision occurs via competing activation mechanisms (Butterfuss & Kendeou, 2021; Schroeder & Kucera, 2022). The results of such research can support or refute some or all aspects of the explanation, sometimes leading to changes in either the explanatory or descriptive aspects of the theory, or both. This theory revision process is a kind of epistemic iteration (see Fig. 1; Eronen & Romeijn, 2020; Haig, 2013; Irvine, 2021), such as when empirical findings (Bennett et al., 2008) forced reconsideration of Prensky’s (2001) description of students as “digital natives.” It is also possible for explanatory findings to generate new descriptive aspects of theory, such as when discovery of an unanticipated relationship leads to new thinking about what phenomena are important and relevant (e.g., forms of academic procrastination; Schraw et al., 2007).

Of course, there are often many scholars developing their own theories in parallel processes of theory development. This parallel theory development process is depicted in the bottom of Fig. 1, using multiple pairs of circles, each its own single theory development process as depicted at the top of Fig. 1 and in more detail in Fig. 2. As an example of parallel theory development processes, despite not occurring concurrently, both Vygotsky (1997) and Piaget (1964) developed theories of cognitive development, which in the bottom of Fig. 1 would be depicted as theories #1 and #2, respectively. The existence of numerous competing theories can make it difficult for practitioners, policy-makers, and even scholars to determine which theory or theories are most useful (Alexander et al., 2009; Willingham, 2017). Therefore, at some point, multiple instances of theory development must be checked by theory evaluation and integration.

Theory Evaluation and Integration

Compared to the natural sciences, in the social sciences, there is less press to drive toward finding a single, universally accepted theory, but there is some consensus that unchecked proliferation of more and more theories is suboptimal, particularly for practitioners and policy-makers looking for guidance from the scholarly literature (Alexander et al., 2012; Borsboom et al., 2021; Teo, 2020). At some point, theory evaluation and integration should occur, where the field elevates the most just, effective, reliable, and beneficial theory, or aspects of theories, and sets the others aside. In essence, psychologists can let a thousand theory flowers bloom, but at some point, they need to decide which are the prettiest.

Theory evaluation and integration are two of the least discussed aspects of the theory development process (Scheel et al., 2021). It can be difficult to consider abandoning some aspects or all of a theory (Gervais, 2021). T. S. Kuhn (1962) argued theories are not disproven as often as they simply fade away when their adherents retire from scholarship and the next generation of scholars takes up a different theory. Theory fadeout is not a rigorous path toward a cumulative science (Teo, 2020); more thoughtful and systematic methods of theory evaluation and integration are needed. Such methods include the use of criteria (Fidler et al., 2018) to decide which theories, or which aspects of many theories, are worthy of continued use (i.e., the bottom of Fig. 1), always subject to additional scrutiny via further theory development (Borsboom et al., 2021; Teo, 2020).

Whether consciously aware or not, many scholars have internalized a set of criteria, or characteristics of desirable or useful theories (Fidler et al., 2018), which, of course, vary from scholar to scholar and within and across academic disciplines and subdisciplines. These criteria can be conceptualized as a set of virtues: all are valuable but, depending upon context, certain ones may carry more weight than others. Table 1 is a list of such criteria, spanning numerous perspectives. It is important to note that most of these criteria refer to the phenomena that the theory targets. This further illustrates the importance of the descriptive aspects of theory, including the collection of a wide and diverse array of observations, and then carefully generalizing them into phenomena. It is difficult to effectively evaluate a theory when its descriptive aspects are underspecified, incomplete, or biased. For a theory with robust descriptive and explanatory aspects, the more criteria it satisfies, the more viable a candidate it is for becoming normative in the field. In some cases, scholars can examine multiple theories and integrate aspects of each into a new theory, using the criteria in Table 1 as a guide. Returning to the field of epistemic cognition as an example, in the late twentieth century there were a multitude of competing theories, which would be depicted in Fig. 1 as theories #1 through n. Hofer and Pintrich (1997) wrote a synthetic review that evaluated and integrated these theories, identifying similarities and differences in the phenomena and mechanisms posited, ultimately using a variety of criteria to produce a single “normative” theory that comprised some, but not all, of the aspects of the reviewed theories, depicted at the bottom of Fig. 1.

Table 1 Criteria used to select and integrate theory

The virtues used to evaluate and integrate theory can be grouped into those that focus on criteria internal to the theory and those that address how the theory relates to external theories or other factors (see Table 1). There are natural tensions among criteria used to evaluate and integrate theory that preclude using them as a simple checklist. For example, among the internal criteria, scope may not be perfectly, linearly, negatively related to parsimony, but the greater the scope the more likely it is that the theory’s complexity must increase to maintain accuracy (O’Doherty, 2021). Likewise, at earlier stages of theory development, certain criteria can take on greater importance than later ones, such as in the case of plausibility, which can eventually decrease in importance as evidence of testability accumulates (Scheel et al., 2021). Testability and mechanism are, by definition, not relevant virtues for the descriptive aspects of theories but are critical for the explanatory aspects, particularly those involving causal claims. Again, testability does not necessarily imply quantitative methods, as qualitative metrics such as transferability can provide evidence of testability (Maxwell, 2021). Specificity is an important criterion because it precludes “weak” theories with explanations that can survive multiple, contrary sets of findings (Szollosi & Donkin, 2021).

There are fewer external than internal criteria, but the former are as important as the latter. External consistency and analogy can serve as useful tools when generating theory, and they serve a role in ongoing evaluation of the theory. For example, it is virtuous that work on self-regulated strategy development for writing (Harris & Graham, 1993) coheres with research in special education and teacher professional development (McKeown et al., 2019). Consistency and analogy are also important external criteria, with literature reviews and meta-analyses serving as prominent ways to explore and share evidence relevant to these criteria (e.g., Graham, 2022). On the other hand, practicality has not received its due attention in psychological theory development and evaluation (Berkman & Wilson, 2021; Giner-Sorolla, 2019). Theories should be based on problems and interests that exist in the real world and theories should imply actionable steps toward understanding or addressing them. For example, Barzilai and Chinn (2018) have demonstrated how the AIR model of epistemic cognition can be translated into actionable goals for education, which is evidence in support of the practicality of the model. Often, theory development, including phenomena generation and evaluation of explanatory claims, can be improved by involving practitioners and other applied professionals (Berkman & Wilson, 2021; Jackson, 2021), something educational psychologists do often, but perhaps not often enough (Reber, 2016).

Criteria that serve as virtues in theory evaluation and integration must be complemented by discussions of what theory is not (Gigerenzer, 2010). For example, explanatory theories are not useful when the posited mechanism is merely a restatement of the phenomenon itself (e.g., defining the process of “self-regulation” as regulation). Similarly, creating a new term to serve as a proxy for theory is not useful. Describing critical thinking as “advanced thought” does little beyond replacing one term with another. Gigerenzer (2010) also critiqued theories defined by lists of dichotomies, such as describing some kinds of cognitive processes as “automatic” and others “deliberate” (Evans, 2019). A list of what is and is not a particular type of thinking may capture phenomena but does little to explain how they emerge, why they are or should be differentiated, etc. Graphical representations of theories where every construct connects to every other construct, with no posited directionality or strength, also do little to explicate phenomena. Such representations score poorly on the criteria of parsimony (i.e., every construct connects to every other construct), mechanism (i.e., vague relations are posited, rather than specific directionality or strengths of connections), testability (i.e., many phenomena are trivially correlated with no practical import), and practicality (i.e., the theory provides no insight on where to begin affecting the system). Careful and rigorous theory evaluation and integration, using criteria such as those in Table 1, are more likely to result in normative theories that are just, effective, reliable, and beneficial (see bottom of Fig. 1).

Summary of Theory Development Scholarship in Psychology

In sum, normative theory development, evaluation, and integration processes can be articulated (e.g., Borsboom et al., 2021), which in turn enhance theory’s descriptive and explanatory aspects as well as its practical import (Eronen & Bringmann, 2021; Haig, 2005, 2013; Scheel et al., 2021). However, theory and theory development processes are often not sufficiently taught to early career scholars (Borsboom et al., 2021; K. Gray, 2017), making it relatively unsurprising that the field of psychology suffers from “weak” theory (Kellen et al., 2021; Meehl, 1978) that fails to correct, and in some cases amplifies, questionable research practices (Eronen & Bringmann, 2021; Fielder, 2017), leading to crises of replication (Open Science Collaboration, 2015), oppression (Buchanan et al., 2021), and impact (Oberauer & Lewandowsky, 2019). Educational psychology, as a subdiscipline of psychology, experiences many of these challenges and would benefit from more careful and rigorous theory scholarship (Gehlbach & Robinson, 2021; Wentzel, 2021). At the same time, educational psychologists have done important work that can inform how the field of psychology, as a whole, addresses its tripartite crises, via theory development.

What Can Educational Psychology Scholarship Contribute to Theory Development in Psychology?

The “theory crisis” in psychology extends to every subdiscipline of the field (Oberauer & Lewandowsky, 2019). Yet, educational psychology scholarship and contributions have been less than prominent in discussions of theory in psychology writ large, such as in the special issue of Perspectives on Psychological Science in 2021 or in the journal Theory & Psychology. This is surprising given the subdiscipline of educational psychology has much to contribute to the theory discourse in psychology, including ways to (1) build the descriptive aspects of theory (Scheel et al., 2021; van Rooij & Baggio, 2021) that embrace anti-racism and race-focused scholarship (Buchanan et al., 2021; Teo, 2020) and (2) conceptualize measurement and its consequences (Flake & Fried, 2020) in ways that can guide practical decision-making. These foci have great promise for addressing the tripartite crises facing educational psychology and the entire field of psychology, as well.

Building Descriptive Aspects of Theory

Observations are a crucial aspect of theory generation. Without sufficient, and sufficiently diverse, observations, researchers may underspecify or even fail to identify important phenomena. Imagine how ecological systems theory might have been underspecified if Bronfenbrenner (1979) had never observed how families interact with schools and thus had omitted mesosystem interactions (Skinner et al., 2022). Phenomena also usefully constrain the explanatory aspect of theory (Eronen & Bringmann, 2021; McGann & Speelman, 2020), providing the empirical grounding with which theory must cohere. For these reasons, it is critical that psychology engage in sufficient “field work” to collect a comprehensive, diverse set of observations and then conduct rigorous organization and inference to identify reliable phenomena (Jachimowicz, 2022). In fact, scholars have argued that one reason for the “theory crisis” is that psychologists do not engage in sufficient, and sufficiently diverse, observation, leaving the descriptive aspects of their theories incomplete and the explanatory aspects insufficiently tested (McGann & Speelman, 2020; Scheel et al., 2021). Engaging in sufficient amounts of observation requires renewed focus on the “natural history” of psychology, which Eronen and Bringmann (2021) analogized to the five years Darwin spent collecting and cataloging butterflies before developing his theory of evolution by natural selection. Insufficiently diverse samples, which increase the likelihood of the descriptive aspects of theory being underspecified, were identified as one of psychology’s many problems in the APA’s apology for its history of racism. The subdiscipline of educational psychology can serve as a model of how to renew psychology’s focus on observation and diverse samples, via the subdiscipline’s recent efforts to better embrace qualitative and mixed methods research (Meyer & Schutz, 2020) and race re-imaged research (DeCuir-Gunby & Schutz, 2014).

Qualitative and Mixed Methods Research

Qualitative methodologies informed the early foundations of psychology research, but then fell out of favor during much of the twentieth century before experiencing a renaissance in the late 1900s and early 2000s (Levitt et al., 2018). Yet, surprisingly, scholarship on theory development in psychology writ large has underemphasized qualitative and mixed methods research, despite scholars’ calls to expand observation and phenomenon development to create more and better-formed descriptive aspects of theory (Eronen & Bringmann, 2021; McGann & Speelman, 2020). Indeed, qualitative methods are uniquely suited for observation and the inference of phenomena that often typify the early stages of theory development (Haig, 2005), akin to Darwin’s “field work” in the natural sciences. Educational psychology’s efforts to more fully embrace qualitative and mixed methods arguably began in the early years of the twenty-first century (Meyer & Schutz, 2020). Qualitative methods have been used to abduce theory, such as via inferring a developmental progression of positions from interviews with college students (e.g., Perry, 1970). Qualitative and mixed methodologies can drive theory revision, as well, such as when Greene and Yu (2014) used interviews with middle school students and professors to investigate why survey measures of epistemic cognition had such poor evidence of psychometric validity. Arguably, in some cases in the past, theory development in educational psychology began with qualitative observations that were not named as such. For example, Zimmerman’s observations of children in classrooms watching peers led to a descriptive theory of cognitive modeling, which when combined with scholarship in the areas of metacognition and motivation, led to his model of self-regulated learning (Zimmerman, 2013). Qualitative and mixed methods research have made important contributions to theory generation and development in educational psychology; as such, these methods seem likely to make similar contributions to psychology writ large.

Educational psychology’s embrace of qualitative and mixed methods may have been facilitated by its emphasis on scholarship with practical implications for teaching and learning (Zimmerman & Schunk, 2002). A theory is practical when it has defensible, actionable implications for addressing real-world problems (Berkman & Wilson, 2021). Understanding real-world problems, and providing practical proposals for addressing them, requires working with participants and community members to gather and understand informative observations that comprehensively capture multiple aspects and nuances of the context (Jackson, 2021), practices that are ideally suited for qualitative and mixed methods research. Novel applications of theory are more feasible when scholars and practitioners understand the contexts in which that theory was developed, so that they can engage in informed modifications, as described by another promising methodology in educational psychology and the learning sciences: design-based research (Sandoval & Bell, 2004). In sum, qualitative, mixed methods, and design-based research can and should play a prominent role in the theory development process. In addition, the use of these methodologies can reveal when the exclusion of particular populations from theory generation has led to underspecified or wholly inaccurate scholarship in need of reform (DeCuir-Gunby & Schutz, 2014).

Race Re-imaged Research

Psychological research and theory have been aptly criticized as being much too heavily dependent upon samples, and thus observations, derived from people from Western, educated, industrialized, rich, and democratic (i.e., WEIRD) societies (Henrich et al., 2010). These critiques were included in APA’s (2021) apology and informed several of its resolutions, including those related to research:

APA will prioritize efforts in knowledge production and scholarship, such as those that enhance psychology’s scientific methods based on culturally diverse knowledge production, and those that create mechanisms to count and acknowledge all racial and ethnic groups in APA-sponsored research (Resolutions section)

When researchers aim to generalize to heterogeneous populations (e.g., all students in the USA), homogeneity in sampling can lead to biased or incomplete sets of observations, resulting in narrow or mis-specified phenomena, which in turn can lead to partially or wholly inaccurate theories to describe or explain those phenomena. These concerns about insufficient collection of diverse observations, and its consequences for theory, echo DeCuir-Gunby and Schutz’s (2014) critiques of the limited observations and WEIRD samples that have informed educational psychology research and theory.

These critiques led DeCuir-Gunby and Schutz (2014) to call for more race-focused research, or scholarship focused on constructs such as racial identity, highlighting educational psychology’s failure to gather sufficient observations of relevant phenomena (e.g., racial socialization) that are integral to many students’ experience. Likewise, their call for race-reimaged research, scholarship to reconceptualize constructs developed with limited samples by incorporating more diverse and sociohistorical perspectives, reflects a recognition that limited observations of phenomena can lead to theories that, in some cases, must be rebuilt “from the ground up” to be applicable to a more diverse group of people (e.g., theories of motivation; D. Gray et al., 2018). As an example of the value of race-reimaged empirical research, Fong and colleagues (2019) found that the construct of belongingness (i.e., being liked, respected, and valued; Goodenow, 1993) did not accurately represent the experience of Indigenous students in the USA. Belongingness for these students included phenomena related to cultural identity, family, community, and self-empowerment; these phenomena were absent from theories of motivation derived from WEIRD samples. Fong and colleagues’ research illustrated the importance of engaging in more, and more diverse, “phenomena detection” (Eronen & Bringmann, 2021, p. 6) before theorizing, lest the resulting theory be under- or mis-specified.

To truly address historical, systemic racism in psychology per the APA’s resolutions, race-focused and race re-imaged research must be valued and lifted up across the psychology scholarly literature (e.g., DeCuir-Gunby & Schutz, 2014; Matthews & Lopez, 2020; Zusho & Kumar, 2018). Ignorance of the experience of racial minorities or outright rejection of those experiences are forms of epistemic injustice (i.e., systemic exclusion of voices during the pursuit of knowledge; Fricker, 2007; Kidd et al., 2017) that, given their effects upon these people and the world writ large, require no further argument to eliminate from the field. Racism is a prominent form of injustice in the USA and many, but not all, other countries; in addition, there are other forms of epistemic injustice that result in unwanted and damaging homogeneity in sampling and theorizing, such as psychology’s failure to adequately address humanity’s cultural and national diversity (Gervais, 2021). In sum, epistemic injustice can result in the omission of important observations that can and should inform the inference of phenomena, with subsequent damage to the theory developed to describe and explain those phenomena. Omission of important observations can also affect how phenomena are measured and interpreted.

The Ethics of Measurement

In psychology, there has been insufficient attention to methodological transparency and ethical issues in measurement (Flake & Fried, 2020). Scholars frequently fail to adequately define constructs or sufficiently detail how to measure them (Kellen et al., 2021), which in turn makes it difficult for other scholars to replicate or build upon that work (Eronen & Romeijn, 2020; Grahek et al., 2021; Smaldino, 2019). Likewise, instrument development studies often involve homogeneous, WEIRD samples, calling into question their generalizability outside of those samples. Flake (2021) has articulated the need for more rigorous and transparent measurement research; otherwise, empirical findings that contradict established theory (e.g., failure to replicate) could be explained away by claiming differences in sample or administration procedures, or by appeals to unstated or unclear assumptions (e.g., a posited effect failed to replicate due to incorrect order of instrument administration; Kellen et al., 2021). Reforms in how measures are developed, disseminated, and tested are all important, but another area of reform that has received even less attention concerns the ethics of measurement.

APA’s (2021) apology includes past measurement atrocities committed in the name of psychology and its advancement: “WHEREAS eugenicists focused on the measurement of intelligence, health, and capability, concepts which were adopted by the field of psychology and used systemically to create the ideology of White supremacy and harm communities of color” (About this Resolution section). The apology also rejected “hegemonic science” (APA, 2021, Resolutions section). The apology clearly indicates a need for a thoughtful, ethical, and benevolent approach to measurement, among other scholarly practices, but lacks detail on how to enact such an approach.

One of many necessary ways to reject hegemonic science and its deplorable consequences for communities of color is to reprioritize a neglected aspect of validity in measurement research. Debates about validity, including what comprises it and how to establish it, are prominent in the measurement literature (Messick, 1989). However, those debates have been particularly heated regarding whether “consequential validity” (i.e., the effects upon people or society that result from test use) should be included as a type of validity (e.g., Popham, 1997). Given psychology’s history of support for eugenics, systemic racism, and oppression, it seems important to somehow incorporate consequences into any consideration of measurement validity. Cizek’s (2020) reconceptualization of validity provides a path forward that psychology, writ large, would benefit from considering. He differentiates efforts to gather evidence to support claims that inferences from a score indicate what was intended (i.e., score meaning) from efforts to gather evidence to support claims that the score can justifiably be used in a particular manner (i.e., test use; Cizek, 2016). This differentiation clarifies that there may be acceptable justification for an interpretation of scores, but that nonetheless the use of those scores for a particular purpose may still be unjustified, thus invalidating their use. A familiar example is the SAT, where psychometricians may assert ample construct validity evidence in favor of subscale scores as indicators of underlying verbal or mathematical ability (i.e., score meaning; Cizek, 2016). Such evidence is necessary if those scores are used to inform college admissions decisions, but that evidence is not nearly sufficient to make and justify decisions of such import. The question of whether SAT scores should be used when making college admissions decisions depends upon a host of practical, ethical, and societal factors, far beyond questions of psychometric validity. In essence, separating questions about the quality of measurement (e.g., “Does the SAT adequately measure mathematical ability?”) from questions about the consequences of its use (e.g., “Are SAT scores an appropriate factor in college admissions decisions given the societal disparities in resources available to fund preparation for the SAT?”) affords more informed and specific focus on each, thus allowing psychologists to reject eugenics and hegemonic science due to their consequences alone, rather than having to also debate other aspects of validity theory and evidence. Such differentiation and reprioritization of consequential validity is necessary in an era when psychology seeks to make measurement, and its consequences, more rigorous, transparent, ethical, and benevolent (APA, 2021; Flake & Fried, 2020).

Summary of Educational Psychology’s Contributions to Theory Development Scholarship

Educational psychologists have done admirable work regarding how to conceptualize measurement in ways that clarify and privilege ethics (Cizek, 2020), rigor, and transparency (Flake, 2021), and also regarding how to build better descriptive aspects of theory via qualitative, mixed, and race-focused and race re-imaged scholarship (DeCuir-Gunby & Schutz, 2014; Meyer & Schutz, 2020) that would likely enhance theory reform in psychology writ large. Such theory reform is a necessary companion to the methodological and ethical reform necessary to build a psychology that is truly just, effective, reliable, and beneficial. However, once again, it would be incorrect to state that educational psychology has completely and fully embraced methods for expanding descriptive theory and measurement in rigorous and ethical ways. Nor is it the case that educational psychology has adequately or sufficiently addressed methodological and ethical concerns about its past, present, or future. There is yet much more reform work to do, within psychology writ large and its subdisciplines.

Future Directions for Enhancing Theory Development in Psychology

There is a “theory crisis” in psychology (Oberauer & Lewandowsky, 2019) that, along with the field’s methodological and ethical crises, must be addressed for the field to achieve its goals of benefitting society and improving lives. Formal apologies (APA, 2021), special issues (Eronen & Bringmann, 2021), and ongoing critique (Teo, 2020) are all important steps toward necessary reform in psychology. The subdisciplines of psychology, including educational psychology, must be a part of this reform movement. That requires translating methodological, ethical, and theory development lessons learned across these subdisciplines (e.g., Gehlbach & Robinson, 2021) and communicating effective ideas and practices back to the field writ large. In this paper, I have argued that educational psychologists and the theories they use would benefit from greater attention to the scholarship of theory development (McPhetres et al., 2021). Likewise, educational psychologists have much to contribute to that scholarship, including ways to build better descriptive aspects of theory and conceptualizations of measurement validity that properly clarify and prioritize the consequences of test use. These contributions reveal future directions for reform and innovation as well, particularly around acknowledging and combatting ableism in psychology scholarship, creating a hospitable climate for theory revision, increasing transparency in theory development, and more deeply incorporating theory development into psychology’s preparation programs.

Ableism

The APA’s (2021) apology to people of color included a number of commitments for action as well as resolutions for potential future actions. One of those resolutions included potential actions to address long-standing oppression based on disability identity. Psychology scholarship involving disabilities has a history of being compartmentalized from the rest of the field (Brown et al., 2011), which can lead to the marginalization of people who identify as having one or more disabilities. Such compartmentalization and marginalization are unjust on their face, and in addition, the exclusion of people with disabilities from psychology scholarship is yet another example of homogeneity and exclusion in sampling that can have negative effects upon the quality, accuracy, and utility of theory (Giner-Sorolla, 2019). Emery and Anderman (2020) have made similarly relevant critiques of educational psychology’s failure to collect sufficiently diverse and numerous observations for theory generation, including insufficient attention to the experience of students with disabilities. Clearly, psychologists should pursue disability-focused and disability re-imaged scholarship, as educational psychologists have begun to do around race (Matthews & Lopez, 2020; Zusho & Kumar, 2018). As is the case for research focused on race (DeCuir-Gunby & Schutz, 2014), the results of disability-focused and disability re-imaged research may necessitate that theories in educational psychology be revised “from the ground up” to more fully incorporate views, experiences, and phenomena uncovered from field work with students with disabilities.

Creating a Hospitable Climate for Theory Revision

The process of epistemic iteration, and in particular instances where the descriptive or explanatory aspects of a theory are justifiably set aside, can feel threatening (Gervais, 2021), particularly when calls for dismissal come from others (e.g., Hagger et al., 2016). Nonetheless, the revision and even the dismissal of theory are normal and important aspects of how scholarly disciplines like psychology advance, and as such it is important to create a scholarship climate where epistemic iteration is treated as a normal and even expected part of the field’s ongoing theory development process. Deciding that the weight of evidence warrants setting aside a theory is indeed a scholarly advance, and it should be honored as such, with no shame to those who first asserted the theory. Such climates decrease the likelihood of epistemic injustice (Kidd et al., 2017), particularly for groups historically underrepresented in the literature who rightly question the applicability of narrow theory to their experience.

Educational psychologists would be wise to model a more accepting scholarly climate, which would necessarily include disseminating scholarship that illustrates epistemic iteration. Importantly, such iteration requires well-elaborated descriptive and explanatory aspects of theory, particularly given that the former can usefully circumscribe and vet the latter (Haig, 2005, 2013; Scheel et al., 2021). Scholars should be encouraged to publish evidence that the descriptive or explanatory aspects of their theories require revision, which necessarily requires that scholarship venues (e.g., journals, conferences) accept and disseminate well-justified theoretical critiques (Zengilowski et al., 2021) and statistically non-significant findings that elucidate theory (Greene et al., 2022), avoiding the “file-drawer” problem (Fleming et al., 2021) that can pervert quantitative, qualitative, and mixed methods research. Further, there is a need for salient and explicit conversations about the criteria scholars could and should use to evaluate theory and move toward productive integration (Fidler et al., 2018). A healthy climate for theory revision is likely to lead to more productive and transparent theory development.

Transparency in Theory Development

Just as the replication crisis has prompted calls for greater transparency in educational psychology methods (Gehlbach & Robinson, 2021) and measurement (Flake, 2021), there is a need for greater transparency in theory development. For example, observations are crucial to the development of the descriptive aspects of theory. Without sufficient, and sufficiently diverse, observations, researchers will struggle to infer a comprehensive sense of the relevant phenomena, which in turn may constrain the creation and vetting of the explanatory aspects of theories (Fielder, 2017). For these reasons, it is critical that psychologists engage in sufficient field work to collect these observations and then sufficient reflection to deeply understand and characterize the phenomena they indicate. Yet, psychology does not have a strong history of supporting field work or “natural history” research, such as what is quite typical and valued in the natural sciences (McGann & Speelman, 2020; Scheel et al., 2021). There must be changes in the incentive structure for gathering observations, inferring phenomena, and developing the descriptive aspects of theory (Jachimowicz, 2022). Journals should more explicitly welcome scholarship focused on these aspects of theory generation. Likewise, scholars who evaluate their colleagues’ contributions (e.g., external reviews for promotion or tenure) should not look askance at rigorous, thoughtful, and comprehensive collection and organization of observations. Well-vetted empirical testing of theory is certainly valuable, but the field should also recognize the critical role of the descriptive aspects of theory in the pursuit of knowledge.

In essence, there is a need for greater transparency in the entire “lifespan” of a theory, from details regarding how and why phenomena and the descriptive aspects of the theory were inferred, to the various ways explanatory aspects were developed and tested, through the various epistemic iterations that occur in the process of further refining the descriptive and explanatory aspects of theory, through to the social process of theory integration. Each phase of theory development can and should be implemented rigorously and then reported transparently, over short and long periods of time. Reporting on the lifespan of a theory, or theories, may require a new type of article: a theory review. Literature review articles tend to identify a theory and then synthesize empirical work related to that theory (Boote & Beile, 2005). Review articles certainly can include, organize, and even integrate theories (e.g., Hofer & Pintrich, 1997), but such reviews are rare. Developing a specific type of review article, detailing the generation, iteration, and development of theory, or multiple related theories, would be a useful way to document and disseminate the evolution of an area of scholarship while also identifying decisions made, paths not taken, and ideas worth reconsidering. Further, having clear records of the theory development process would enable scholars-in-training to trace the history of a theory and therefore better understand how to use, critique, and potentially improve that theory (Gigerenzer, 2010).

Incorporating Theory Generation, Evaluation, and Development Into Training Programs

The replication crisis has led to calls for the incorporation of Open Science methodologies and sensibilities into scholarly training programs (van der Zee & Reich, 2018). Such calls must be matched with similar movements to include instruction on theory, as well (K. Gray, 2017). In times of shrinking budgets and students who understandably wish to keep their professional training, which often requires an underpaid apprenticeship, as short as possible, it can be challenging to find room for even more methodological and theory training. Nonetheless, just, effective, reliable, and beneficial psychological contributions to society require rigorous methods guided by well-developed theory that result in actionable and practical implications (Meehl, 1990). A healthy psychological science requires deep knowledge of and skills in theory generation, evaluation, and integration, which clearly connect to and require similar depth in terms of methods. Efficiencies may be realized when theory and method topics, which are often taught separately, are instead presented as they are enacted in practice: in dynamic interaction with one another, allowing for iteration toward better scholarly outcomes.

Conclusion

The American Psychological Association (2022) is not yet adequately living its mission to advance and apply psychological science and knowledge to benefit society, and it is fair to say that the field must enact significant changes to improve the quality, reliability, and morality of scholarship and practice. Concerns about the quality of scholarship in psychology date to well before the replication “crisis” (e.g., Meehl, 1978) but have been heightened since that crisis highlighted the questionable reproducibility of some of the most accepted psychological findings. The field’s foci on methodology (Gehlbach & Robinson, 2021; Open Science Collaboration, 2015) and ethics (APA, 2021) are most welcome but must be buttressed with a focus on improving theory development, as well (Eronen & Bringmann, 2021). The subdiscipline of educational psychology faces the same challenges as the field writ large; thus, there is a clear need for greater focus on how theory is developed, how to disseminate it effectively, and how to properly value such work. In addition, educational psychology’s efforts to adopt a diverse set of research methods to better understand and describe phenomena and better account for the consequences of testing serve as lessons that the rest of psychology would do well to learn. Greater transparency and communication about theory development, within and across the many subdisciplines of psychology, is needed to realize the field’s benefits to society and properly prepare the next generation of scholars and practitioners who wish to do the same, and more.