Introduction

Large language models (LLMs) and their recently increased accessibility via chatbots like ChatGPT (OpenAI, 2023), Bard (Google, 2023), or Bing Chat (Microsoft, 2023) provide both new opportunities and challenges for education. On the one hand, they legitimately promise effective ways to assist with many tasks involved in both teaching and learning (Bernabei et al., 2023; Kohnke et al., 2023), to provide scalable, personalized learning material (Abd-alrazaq et al., 2023; Sallam, 2023), and thus easy and scalable opportunities for exercise (Kasneci et al., 2023). On the other hand, they come with the educational challenge to avoid becoming overly or naïvely reliant on their support (Abd-alrazaq et al., 2023; Bernabei et al., 2023; Kasneci et al., 2023; Kohnke et al., 2023; Shue et al., 2023; Zhu et al., 2023), and thus to prevent inadvertently adopting inherent biases (Abd-alrazaq et al., 2023; Bernabei et al., 2023; Dwivedi et al., 2023; Kasneci et al., 2023; Zhu et al., 2023) or losing out on opportunities for reflection and practice for developing domain expertise and judgment competence (Dwivedi et al., 2023; Krügel et al., 2023). These are, however, especially needed for the responsible use of present LLMs, because, due to their inherent random mechanisms utilized during text generation (Wolfram, 2023), mistakes or fabricated information cannot be entirely ruled out. Hence, at least for the time being, the output generated by LLMs definitely requires domain expertise for critical revision and evaluation. Education thus finds itself currently faced with the challenge to find an appropriate balance between seizing new and welcome opportunities and protecting against inadvertent risks of losing out on the development of required expertise at the same time.

In this perspective piece, we propose that—in a first step—a more exploratory, playful approach towards the use of LLMs may help with finding such an appropriate balance. Such an approach has already been utilized in the form of prompt engineering in various domains (Oppenlaender et al., 2023; Polak & Morgan, 2023; Short & Short, 2023; Shue et al., 2023; Wang et al., 2023; White et al., 2023; Zhu et al., 2023). Beyond those accounts, we further suggest that—in a second step—going the full way to a game-based education could eventually provide a new pedagogy of learning with artificial intelligence (AI) leveraging the full potential within a well-balanced cooperation between human and machine intelligence. We further argue that this second step allows utilization of LLMs for devising appropriate game-based learning environments, such that LLMs may eventually serve to overcome exactly those challenges they pose for education in the first place.

To serve a systematic development of our arguments, the article is organized as follows: first, we briefly illustrate both opportunities and challenges posed by the usage of LLMs in educational contexts. In a second section, we argue how a more playful approach to the usage of LLMs in education may already help to resolve some of the tension between opportunities, challenges, and risks. In a final section, we outline our proposition how game-based learning can extend the limits of said playful approaches, paving the way for a prolific co-operation between human and artificial intelligence in education.

LLMs in Education—Opportunities and Challenges

Generally, LLMs are a recently developed form of AI (i.e., algorithms historically devised to mimic, extend, or replace parts of human cognition or behavior). More specifically, they are a form of generative AI, representing algorithms capable of generating new media like images or text.

Recent LLMs (like those provided via ChatGPT) use large datasets of text in conjunction with artificial neural networks with billions of parameters to process and generate text. Chat-like interfaces allow the user to obtain human-like responses in conversational style upon entering arbitrary prompts. While earlier language models like Wordtune, Paperpal, or Generate (Hutson, 2022) could help writers restructure a sentence, more recent versions like ChatGPT can help with devising entire manuscripts, providing feedback, finding limitations (Zimmerman, 2023), or devising specialized text like computer code (Shue et al., 2023).

The essential core principles have, however, remained similar (Wolfram, 2023): the computation of likely continuations of the user-provided prompt based on identified relations between text elements in the vast amount of training data. An important ingredient in the computation is the fact that not always the most likely, but sometimes a less likely continuation is chosen. While this serves the impression of an especially spontaneous, human-like, fluently emergent text, it also is the reason why the information provided by present LLMs can be misleading or erroneous and thus requires continuous supervision and critical evaluation.

Opportunities

Given their capabilities, LLMs provide a wide range of opportunities for education (Kasneci et al., 2023). LLMs can assist with management tasks (e.g., development of teaching units, curricula, or personalized study plans), with assessment and evaluation, and with program monitoring and review (Abd-alrazaq et al., 2023). They can take the roles of content providers (Abd-alrazaq et al., 2023; Jeon & Lee, 2023; Sarsa et al., 2022), temporary interlocutors, teaching assistants, and evaluators (Jeon & Lee, 2023). They can assist with writing tasks of both teachers and learners (Bernabei et al., 2023), regarding not only content creation, but also basic information retrieval (Zhu et al., 2023) and literature review (Abd-alrazaq et al., 2023).

LLMs can further assist teachers in orchestrating a continuously growing plethora of teaching resources, making the teachers’ resources (bound to developing and revising learning material in earlier times) more available for designing creative, well-organized, and engaging lessons (Jeon & Lee, 2023). They enable personalized learning (Abd-alrazaq et al., 2023; Sallam, 2023) and may benefit learners’ understanding of topics (Bernabei et al., 2023; Sarsa et al., 2022; Zhu et al., 2023). If used carefully, they can enhance critical thinking and problem-based learning (Bernabei et al., 2023; Sallam, 2023; Shue et al., 2023), emphasize the role the role of students as active investigators, and raise ethical awareness regarding the use of AI (Jeon & Lee, 2023).

Challenges

However, careful use of LLMs also presents a challenge to both teachers and learners (Kasneci et al., 2023). This is related to a variety of shortcomings of LLMs that have not yet been entirely resolved. These include the possibility of mistakes or fabricated information; the lack of recent, state-of-the-art, domain knowledge; the lack of originality; inherent (social or gender) biases; various ethical and legal issues like copyright, plagiarism, and false citations; lacks of transparency and accountability; cybersecurity issues; and the risk of infodemics (Sallam, 2023; Zhu et al., 2023).

In contrast to pocket calculators, present LLMs are not designed to yield reliably the same deterministic output upon the same given prompt. A stochastic element in generation of such output is a part of how and why they work so astonishingly well in producing seemingly human-like responses (Wolfram, 2023). This, however, has also the consequence that their output definitely requires critical evaluation and careful revision by domain experts (Ali et al., 2023; Biswas, 2023; Hosseini et al., 2023; Howard et al., 2023; Kasneci et al., 2023; Mogali, 2023; Salvagno et al., 2023; Van Dis et al., 2023; Zhu et al., 2023). Especially when it is about decisions that should guide human action, the support provided by LLMs should be supervised by human expertise (Molenaar, 2021).

Expertise as a Crucial Factor in Human-AI Systems

This resonates well with the general assertion that the quality of decisions by human-AI systems depends crucially on the human expertise within such systems (Ninaus & Sailer, 2022). However, both the development and preservation of expertise require practicing domain-specific problem-solving capabilities (Elvira et al., 2017; Tynjälä, 2008; Tynjälä et al., 2006).

As novices advance from easier to more difficult problems, they continuously engage in three learning processes. First, they transform conceptual knowledge into experiential knowledge when, for instance, applying general concepts to specific problems in particular contexts. Second, they also need to explicate experiential into conceptual knowledge to, for instance, make tacit knowledge (Patterson et al., 2010) accessible to other people as well as to metacognitive processes like reflection. Reflecting on experiential and conceptual knowledge finally allows for improving problem-solving strategies, further supports the transfer of tacit to explicable knowledge, and facilitates the development of learning strategies, metacognitive, and self-regulatory skills (Elvira et al., 2017).

All three processes have in common that continuous practice in integrating conceptual, experiential, and self-reflective knowledge during problem-solving utilizes already existing expertise and contributes to its further development. Although modern theories on expertise acknowledge that many factors besides practice contribute to expertise development (Hambrick et al., 2016), they do not deny the relevance or even necessity of (deliberate) practice (Campitelli & Gobet, 2011; Ericsson et al., 1993; Hambrick et al., 2014).

Interaction Between Use of LLMs and Expertise Development

In formal education, which lays the foundations for the development of expertise, practice sometimes requires that learners engage in effortful or even strenuous tasks. That is, learners need to regulate their attention and efforts toward a task that might be associated with aversive feelings and also to resist engaging in more pleasurable activities (Kurzban et al., 2013; Miller et al., 2012).

However, the convenience and low opportunity cost that LLMs bring for certain tasks, bears the risk of over-reliance (Kasneci et al., 2023) or over-trust (Morris et al., 2023), which has also been recognized as a hindrance for critical thinking (Shue et al., 2023), learning, and reflection (Zhu et al., 2023). In addition to that, learners (and sometimes also teachers) can feel tempted by the authoritative nature of the responses to take them at face value without critically evaluating and processing them further (Kohnke et al., 2023). Lastly, learners can be tempted to outsource the activity. While such outsourcing might be appropriate for tasks that are merely means to an end, it becomes problematic when tasks represent essential learning opportunities for skills that a person should have even without AI support (Salomon et al., 1991). Over-reliance on LLMs in educational contexts is thus associated with some risk of losing out on essential ingredients for the development and preservation of expertise, potentially and inadvertently providing also a risk of deskilling (Morris et al., 2023), and consequentially of automation bias, reduced human autonomy and judgment competence (Dwivedi et al., 2023; Deutscher Ethikrat, 2023).

It is important, however, to note that an eventual shift in what is considered an essential skill is not problematic per se. As with every new useful tool, LLMs also bring about a shift in what is considered essential expertise. While in earlier times, doing a statistical analysis might have involved manually integrating a normal curve to determine a p-value, this would hardly suggest that a social scientist not knowing anymore how to do this has not developed any statistical expertise (we thank the anonymous reviewer for providing this example). The advent of the digital computer has changed the outline of the skill set determining the meaning of statistical expertise.

Ongoing developments of generative AI technology like retrieval augmented generation, improving on both factual reliability and timeliness of responses provided by LLMs (Gao et al., 2023), are likely to push the boundaries of what kind of expertise may be called essential even further. The critical point remains that high-quality decisions of human-AI systems presuppose some human expertise (Ninaus & Sailer, 2022). And it is difficult to judge in advance which skill sets will remain essential in the future. As Dwivedi et al. (2023) note, we as educators must ask ourselves first: which skills are still needed? Once these are identified, a second question remains: how can we devise new, appropriate ways of developing and practicing these skills in a new pedagogy of learning with AI?

Banning LLMs?

One response addressing this challenge are calls for more closely regulating the use of LLMs, ranging from simply requiring disclosure (Stokel-Walker, 2023) over adaption of examination procedures (Dwivedi et al., 2023) to complete bans (Johnson, 2023; Rosenzweig-Ziff, 2023). Yet attempts at external control face at least one very pragmatic issue: It can be difficult, if possible at all, to distinguish between human- and AI-produced material (Ariyaratne et al., 2023; Dunn et al., 2023; Else, 2023). Although tools are developed that allow (at least temporarily) AI-support detection to some extent (Bernabei et al., 2023; Else, 2023), we also think that research and higher education needs to devise ways to use LLMs ethically, transparently, and responsibly (Van Dis et al., 2023). Furthermore, “it makes no sense to ban the technology for students that will live in a world where this technology will play a major role” (Dwivedi et al., 2023, p. 9).

A completely different response to the outlined challenge originates long before the most recent advent of AI in the form of LLMs. It involves a more playful stance towards the new possibilities that come with new technology.

On Playful Approaches to Integrate New Technology in Education

As early as in the 1960s, Papert (1980) developed a pedagogical approach which allowed to utilize computers to facilitate children’s understanding of geometry. However, instead of thinking of ways to use computers just as providers of more sophisticated, digital teaching or learning material, children were enabled to build up their geometrical understanding by providing them with a tool to let computers do something meaningful to them. For this purpose, the programing language Logo was developed (Papert, 1980) which allowed children to control the movement of a virtual turtle which left behind lines as it moved over the screen. By understanding how to draw geometrical shapes by controlling the turtle, and further, how simple geometrical shapes constitute more complex images, a gradually improving understanding of geometry allowed the children to draw more beautiful and complex images. Playful experimentation with the Logo language allowed to build up experiential knowledge by applying basic, conceptual knowledge of how to draw squares, triangles, and so forth. At the same time, purposeful drawing of more complex, composite objects (like a house with a door, windows, and a roof) required translating experiential knowledge into conceptual knowledge by the necessity to provide specific commands. Learning by purposive doing and by engaging in discovery via the natural processes of trial and error would further provide ample opportunity to reflect on both, experiential and conceptual knowledge to further improve drawing capabilities and thus, understanding geometry. Papert’s pedagogical approach (1980), hence, naturally nurtured all three learning processes involved in developing expertise (Elvira et al., 2017; Tynjälä, 2008; Tynjälä et al., 2006). Not only became children able to produce images and experiences of meaning for themselves, but they did so just inasmuch as they improved in their geometrical understanding, programing capabilities, and computational thinking. Furthermore, new technology, i.e., the digital computer, which could have just been programmed to do the same geometrical operations much more efficiently, was instead utilized to promote education (Papert, 1980).

Yet, why did Papert come up with his playful, constructionist approach to learning in the first place? In fact, he was inspired by constructivist theory of how children construct new schemas by interacting with their environment (Piaget, 1962). In Piaget’s theory of cognitive development (1962), play facilitates children’s cognitive development by activating basic units for organizing knowledge and behavior, known as schemas. Play allows both the practice of existing schemas, and thus of existing skills and knowledge, and the development of new ones by combining elements of existing ones in ways that transcend existing knowledge.

Social development theory (Vygotsky, 1967), scrutinizing also the developmental importance of play, adds the notion that the crucial point of play for learning is its capability to provide children with opportunities to explore outcomes beyond their current abilities. Play allows players to experience and simulate potential outcomes without the real-life costs (Homer et al., 2020). It allows to probe their capabilities, and by that, it allows them to grow beyond their current limitations. Although highlighting somewhat different aspects, both theories of play highlight their potential for facilitating learning and development.

More recently, research within self-determination theory (SDT; Ryan & Deci, 2017) has specifically highlighted the importance of intrinsic motivation, the enjoyment of the activity itself, as critical to learning across development (Reeve, 2023). That is, much if not most of human learning (both within and outside formal education) occurs because of our interest and curiosity in activities, from which we acquire knowledge and skills. Research in SDT suggests that sustained playful learning involves experiencing a sense of autonomy and competence, which are often richly afforded within game environments (Rigby & Ryan, 2011).

Carefully applying these concepts to the challenge posed by LLMs for expertise development may turn the outlined risks into promising learning opportunities. The idea is the same as the one exemplified by Papert’s approach (1980) to utilize computers as educational tools. Instead of seeing LLMs as possibilities to outsource task accomplishment, they are understood as tools that can be utilized to engage in a meaningful activity. The interface, which has been the Logo language in Papert’s case (1980), now is, for instance, ChatGPT, allowing to provide prompts that steer the underlying LLM in the desired direction. In this case, the meaningful product, is not necessarily an image, but can be a manuscript, some computer code, or any piece of text. The specific expertise required to be acquired to make LLMs work in such a useful way has become known as prompt engineering.

Prompt Engineering as a Form of Playful Interaction with LLMs

Prompt engineering generally refers to the iterative process in which users fine-tune their textual inputs to achieve a desired output from the LLM (Meskó, 2023). It has been recognized as an essential competence within future digital literacy (Eager & Brunton, 2023; Korzynski et al., 2023), eventually enabling to fully harness LLMs’ potential to provide personalized learning, unlimited practice opportunities, and interactive engagement with immediate feedback (Heston & Khun, 2023). It has been successfully applied in diverse domains including software development (White et al., 2023), entrepreneurship (Short & Short, 2023), art (Oppenlaender et al., 2023), science (Polak & Morgan, 2023), and healthcare (Wang et al., 2023).

Prompt engineering may involve role play or persona modeling (letting the LLM adopt a specific role such as a domain expert in a certain field; Short & Short, 2023), text format, style or tone (Zhu et al., 2023), length and (coding) language restrictions (Shue et al., 2023), question refinement or alternative approaches requests, flipped interaction patterns (e.g., requesting questions rather than elaboration from the LLM; White et al., 2023), chain-of-thought-prompting (generating intermediate outputs; e.g., “Take a deep breath and work on this problem step-by-step”; Yang et al., 2023), or emotional prompting (e.g., “This is very important for my career”; Li et al., 2023) among many more possible techniques. Noteworthy, identified functional prompt patterns have been found to be generalizable over many different domains (White et al., 2023).

Although optimizing prompts has been shown to be capable of vastly improving the accuracy of outputs generated by LLMs (Li et al., 2023; Yang et al., 2023), the fact remains that the critical evaluation of resulting outputs still requires domain expertise. Critically reviewing the resulting output is just as important as optimizing the prompts (Shue et al., 2023).

Prompt engineering itself can actually be regarded as an expert skill requiring not only expertise within the domain (for the selection of appropriate keywords and prompt content) but also of prompt modifiers and the training data and system configuration settings of the specific LLMs (Oppenlaender et al., 2023). Becoming proficient in prompt engineering thus has an analogous meaning for a user of an LLM as becoming proficient in the Logo language for Papert’s (1980) students. It not only allows one to make use of LLMs efficiently, but in order for it to work, i.e., to result in reliable and useful output, it entails practicing exactly that domain expertise which it presupposes. Given the necessary expertise, prompt engineering can thus become a form of playful interaction with LLMs, exploring various aspects of a topic by varying prompt patterns and techniques. Under those circumstances, the domain expert’s intrinsic interest in the reliability and usefulness of results produced in cooperation with LLMs might provide some protection from over-reliance on a single output and associated risks of more narrowly directed LLM employments.

However, such risks might be more severe for learners who are not yet domain experts but are presently on their way to developing such expertise. Their primary goals may be less intrinsically motivated but eventually correspond rather to the mere accomplishment of educational tasks like the submission of seminar papers, homework, or sample calculations. In light of the especially low opportunity costs of LLMs, supporting a playful approach for working with them also under those circumstances may require more than to appeal to individual integrity and virtue. Such support, however, may then be accomplished by providing a learning environment in which playing becomes a natural form of activity (Plass et al., 2020) and a designed pathway to learning. That means, such support may be provided by a pedagogy of learning based on games.

Game-Based Learning as a Way to Harness the Full Potential of Human-AI Interaction in Education

Games, in both non-digital and digital forms, have repeatedly proven valuable for learning, training, and education (Dillon et al., 2017; Pahor et al., 2022; Pasqualotto et al., 2022). They provide space for playful learning experiences, allow room for experimentation, and provide safe spaces for graceful failure, a crucial component for learning with games, allowing the players to learn from mistakes and motivating them to practice until feeling confident (Plass et al., 2015).

Due to their capabilities in capturing and holding people’s attention and in fostering sustained engagement and long-term loyalties, games have further become role models for engaging learners (Rigby, 2014) and citizens to solve complex scientific problems (Cooper et al., 2010; Spiers et al., 2023). Well-designed games can indeed promote both the required persistence in activities for practice and high quality of engagement that can foster deep human learning and problem solving (Barz et al., 2023; Hu et al., 2022; Ryan & Rigby, 2020). The extension of SDT (Ryan & Deci, 2000, 2017) based on research on video games (Ryan et al., 2006), technology design (Calvo & Peters, 2014), or digital learning (Sørebø et al., 2009) has shown in which ways psychological satisfactions for autonomy, competence, and relatedness can be evoked or undermined and thus affect players’ intrinsic motivation and sustained engagement (Ryan & Rigby, 2020). In games, a complex set of skills is challenged in a constrained environment in which those skills can be explored, analyzed, manipulated, extended (Ryan & Rigby, 2020), or in other words: practiced. Thereby, ample opportunities allow experiences of autonomy, competence, and relatedness fuelling intrinsic motivation. “In a well-designed game, the learning becomes its own reward” (Ryan & Rigby, 2020, p. 169).

The problem-based gaming model (Kiili, 2007) further emphasizes the meaning of experiential learning and reflection in educational games. It is argued that the ability to reflect may be the main factor determining who learns effectively from experience (Kiili, 2007). This is especially true for games that require problem-solving (e.g., simulation games). In the model, the level of reflection concerns whether the player considers the consequences of their actions and the changes in the game world to create better playing strategies (double-loop learning) or merely applies the previously formed playing strategy (single-loop learning). Games that trigger double-loop learning are effective because they persuade players to test different kinds of hypotheses and consider the learning content deeply from several perspectives. The challenge of educational game design is to design game mechanics that trigger such meaningful reflection practices.

Games as a Culture Medium for the Development of Expertise

Games naturally serve all three learning processes facilitating the development of expertise. By providing ample space for playful engagement, they support the transformation of experiential into conceptual knowledge. By being—in contrast to free-form play—yet structured by explicit rule sets and specific goals (Deterding et al., 2011), they also require and thus facilitate the transformation of conceptual into experiential knowledge. Finally, as outlined above, they invite diverse forms of reflection serving the further development of problem-solving strategies as well as metacognitive and self-regulatory skills.

The capabilities of games to invite reflection are further emphasized by the fact that successful games have repeatedly been identified as sources of spontaneously emergent culture. Affinity groups (Gee, 2005) may emerge (online or offline) in which players meet to communicate, reflect, and influence game rules, extend new game content, and contribute to game development (Brown, 2016), engage in theorycrafting (Choontanom & Nardi, 2012) and peer-to-peer apprenticeship (Steinkuehler & Oh, 2012). Both the explication of experiential knowledge into conceptual knowledge and reflecting on both knowledge types happen naturally in such spontaneously forming collaborative spaces.

The emergence of those spaces is not induced by top-down mechanisms (e.g., by game developers) but happens horizontally within the game community (Steinkuehler & Tsaasan, 2020). For instance, in the Just Press Play project (Decker & Lawley, 2013), investigating the effect of gamification on undergraduate experience in computer science, students spontaneously requested access to computer labs for tutoring other students for free, on their own time and out of their own desire. In addition, a lively community of educators emerged, constantly creating new learning environments and trying to include the game in the class room against all technical and bureaucratic odds. After the release of Minecraft, communities emerged, modifying the game and creating content way beyond the games’ original intended meaning and functionalities (Nebel et al., 2016). Users—and mostly pupils—used the games’ mechanics to create functioning CPUs, landscapes of their favorite books or sustainable environments, all in their free time. Those are both unforeseen and astonishing results. Not only provide they examples of what the notion of “learning outcomes” in game-based learning can actually encompass: the spontaneous emergence of teachers or experts from a community of students or novices (Steinkuehler & Tsaasan, 2020). They also provide examples of what potential game-based learning might bear for education.

Furthermore, they provide examples of how games can foster spontaneous profound engagement with the learning material far beyond a mere accomplishment of tasks. When within well-designed games, in which the basic needs of autonomy, competence, and relatedness are met, learning becomes its own reward (Ryan & Rigby, 2020), the option to outsource cognitive efforts to LLMs becomes less tempting. Instead, well-designed games might even foster the motivation to utilize LLMs for engaging deeper with the content and finding out more. That means, game environments might provide novices with a flavor of that kind of intrinsic interest that may protect domain experts from over-reliance and associated risks.

Yet Where Are All the Educational Games?

However, if games hold such an educational potential, the question needs to be addressed: Why have they not become much more abundant in schools and universities? One simple reason is that making good games, i.e., games that satisfy basic psychological needs (Ryan & Rigby, 2020), is tough. Even established developers in the entertainment game industry, i.e., in the business of manufacturing fun, repeatedly fail to deliver and are regularly hit with closures and layoffs (Hodent, 2018), whereas some of the most successful games started as low-budget side projects. Educational games face many additional challenges.

On a socio-cultural dimension (Fernández-Manjón et al., 2015), an issue is social rejection of games, which may be reduced by improving society’s understanding of games as another form of cultural good, and informing stakeholders (students, educators, and parents) about the social potential and positive effects of video games (Granic et al., 2014) and their usefulness in education (Bourgonjon et al., 2010). At the same time, violence, sexism, and discrimination are advised to be avoided in the design of educational games (Fernández-Manjón et al., 2015).

Along an educational dimension, limited accessibility to educational games can prevent their further adoption in education (Fernández-Manjón et al., 2015). Whereas creating and maintaining user manuals and best practice guides are ways to facilitate accessibility (Fernández-Manjón et al., 2015), both require further structural support. The latter can be provided by simultaneous support and creation of communities of practice (Wenger, 1998) allowing participation in development processes (Moreno-Ger et al., 2008) and knowledge production and transfer between educators, developers, and researchers (Fernández-Manjón et al., 2015; Hébert et al., 2021).

Along a technological dimension, limited accessibility to technology is an issue (Hébert et al., 2021). Lowering development costs and developing environments that allow educators some game development without requiring substantial programming skills and specific game development expertise are regarded as necessary steps to address this issue (Fernández-Manjón et al., 2015).

LLMs as an Opportunity for Harnessing the Potential of Games Within Education

In this context, LLMs or more generally generative AI tools have the potential to transform game-based learning practices and—again similarly to the use of computers in Papert’s class (1980)—could even become once more part of their own remedy regarding the challenge they pose for education. This, however, warrants a new pedagogy of learning with artificial intelligence.

In particular, we identified two use scenarios in which generative AI tools can boost the use of games in educational settings. First, generative AI tools provide new ways to implement making games for learning approaches (Kafai & Burke, 2015), in which students learn educational content by designing and making games. Second, teachers and educators can utilize AI tools to gamify their learning materials or even create fully-fledged learning games for their students. In the following, we consider how LLMs can be utilized in these scenarios.

Learning by Generating Games

Making games for learning is another prime example of a constructionist learning activity (Kafai & Burke, 2015) similar to Papert’s (1980) early use of computers in the classroom discussed above. Kafai and Burke (2015) argue that we are witnessing a paradigmatic shift toward constructionist gaming, in which students design games for learning instead of just consuming games created by professional developers.

We believe that generative AI tools will further accelerate this shift. LLMs have the potential to make game creation more accessible for novices in a similar way as block-based visual programming environments like Scratch (Resnick et al., 2009) lowered the demands to program interactive stories and animations in educational settings. The pedagogical idea behind learning by generating games relies mostly on the assumption that game-making activities help students reformulate their understanding of the subject matter (educational content) and express their personal understanding and ideas about the subject (Kafai, 2006). In addition, generating games using AI’s technical backup can be open and creative, allowing for experiences of autonomy and competence essential to sustained interest and intrinsic motivation (Ryan & Rigby, 2020). As the technicalities of programming can be largely outsourced to LLMs, students can focus more on the topic and game design aspects.

A recent study indicates that game-designing activities can be even more beneficial, especially for the long-term retention of knowledge, than learning by playing games (Chen & Liu, 2023). Furthermore, Resnick et al., (2009) have emphasized that digital fluency requires more than just interacting with media; it requires an ability to collaboratively design, create, and invent with media. Similar abilities are needed when creating games with the help of LLMs and seem now more important than ever. However, making games with LLMs also imposes unique requirements for students as well as for teachers who are orchestrating the game-making activities.

We coined the term prompting pedagogy to capture fundamental pedagogical practices involved in generating games or other digital outputs with the help of LLMs going beyond prompt engineering as discussed above and constituting one aspect of a new pedagogy of learning with AI. While prompt engineering will be a crucial competence for harnessing the potential of AI in education (Eager & Brunton, 2023), we also want to emphasize that the ability to critically evaluate generated outputs and its facilitation by existing (domain) expertise are equally important (Dwivedi et al., 2023; Krügel et al., 2023). This critical evaluation informs the crafting of prompts leading to meaningful and constructive dialogue with LLMs. Such cumulative and continuous dialog is crucial when using LLMs in complex tasks like game-making. Moreover, using LLMs in such a reflective and critical manner enhances critical thinking and problem-based learning (Bernabei et al., 2023; Sallam, 2023; Shue et al., 2023).

It is evident that effective prompting is challenging, and students need support to develop adequate prompting skills to generate games with LLMs. Prompting pedagogy for game-making also involves the preparation of support materials (e.g., prompting templates for different purposes) and sequencing the prompting activities to specific phases (e.g., idea generation, core design, prototyping, and assessment).

Even though the use of LLMs plays a crucial role in the suggested learning by generating games approach, the design and production activities need to be integrated into a meaningful teaching process. For example, the creative thinking spiral process (imagine, create, play, share, reflect, imagine, and so forth) can be adapted to the learning by generating games approach (Resnick, 2009). According to Resnick (2009, p. 1), in this process, “people imagine what they want to do, create a project based on their ideas, play with their creations, share their ideas and creations with others, and reflect on their experiences—all of which leads them to imagine new ideas and new projects.” This thus provides a way to emphasize playtesting with peers (sharing and testing game prototypes and games) as well as reflective discussion sessions about prompting and game design strategies in LLMs-based game-making projects.

Overall, learning by generating games promotes a creative, experimental, playful, and inclusive learning culture that aims to support the learning of academic content while preparing students for utilizing generative AI tools effectively and creatively in different contexts. As teachers have a significant role in this approach, a starting point for them may be to generate at least one learning game with an LLM before applying the learning approach in their teaching. Such first-hand experience can facilitate perceiving the affordances that LLMs provide, preparing support materials for students, and planning the workflow of activities.

The use of generative AI for developing learning games may also help to decrease the barriers to the creation of low-budget game productions and educational games. One problem with educational games is that we have become more and more accustomed to big-budget releases. Many educational games seem degraded by comparison (e.g., poor graphics and mechanics) and are thus perceived as boring or unappealing. A reasonable utilization of LLMs for game development could eventually help to close this gap. Moreover, since the activities are learner generated, they may well engender a different kind of interest and sense of ownership than studio produced educational outputs.

Gamifying Learning Materials

Generative AI provides many low-threshold possibilities for educators to gamify their teaching or to generate learning games for teaching. That is, LLMs and generative AI might establish themselves as a useful tool for developing (educational) games, for instance, by supporting the generation of artwork, code, or game levels (Nasir & Togelius, 2023; Todd et al., 2023). LLMs can further assist educators in the analysis, design, evaluation, and development phases of game creation projects, allowing, for instance, the adaption of popular board games such as Monopoly for specific learning purposes (Gatti Junior et al., 2023).

It may eventually not matter whether the game makers are students or educators; the generation of games with LLMs requires a playful, experimental, and iterative style of engagement in which game makers continually reassess their goals, explore new solutions, and imagine new possibilities based on the generated outputs and dialogue with LLMs. Resnick and Rosenbaum (Resnick & Rosenbaum, 2013) called such a bottom-up approach “tinkering.” As highlighted above, one of the key skills of the successful generation of games with LLMs is prompting and critical evaluation of generated outputs, which requires expertise such tinkering might enhance.

As game development is usually a highly interdisciplinary process requiring expertise in various areas, LLMs might be used to complement individuals’ skills in a particular area. For instance, it might allow an educator with expertise in the pedagogical approach for a given problem and an idea for the game design to implement a working prototype of an educational game, which would have been significantly more difficult for the educator without using generative AI technologies. Furthermore, as game design is a very complex activity, it is important to break complex prompts into a series of small, specific steps and phases, starting from the idea generation and identification of instructional approaches and core game mechanics. For example, chain-of-thought prompting (generating intermediate outputs) or role prompting (giving the LLM, e.g., the specific role of an instructional designer or target group player) can increase the model’s contextual tenability and enhance the quality of outputs.

Conclusions and General Remarks

On balance, implementing insights from game-based learning in educational contexts is far from a straightforward task. However, game-based learning research has revealed that well-designed games indeed address, challenge, and promote players holistically, incorporating all cognitive (Mayer, 2020), affective (Loderer et al., 2020), motivational (Ryan & Rigby, 2020), and sociocultural (Steinkuehler & Tsaasan, 2020) aspects of the human condition. Applications of game-based learning in science, technology, engineering, and mathematics (Klopfer & Thompson, 2020), or the development of educational games for critical thinking (Butcher et al., 2017) or social problem-solving (Ang et al., 2017) indicate at least the potential games may have for fostering deep engagement with the learning material and continuous practice of expertise. Utilizing LLMs for learning by generating games and purposefully gamifying learning materials may allow educators to fully harness the potential of games toward a new pedagogy of learning with AI.

The potential of playful and game-based learning we see for education is strongly related to games’ motivational and engaging power, that “in play, the aim is play itself” (Flanagan, 2009). Even if the activities associated with the playful engagement encountered in games could be delegated to AI support, who would want this—because in this context, it would be outsourcing the fun and intrinsic satisfactions of play. That would be like delegating joy to a robot. Even if we could, why should we want that?

The notion of (good) practice has since Aristotle (2020) involved the aspect of bearing its meaning in itself, a quality which practice has, according to Rousseau and Schiller (Greipl et al., 2020), in common with play. It seems as if the advent of AI challenges us as educators to remember and revive research and its teaching, as such practice calls for the creation and cultivation of playful spaces within education. While this perspective is certainly not about advocating that we redesign each class into a game promising enjoyment or entertainment, we think that game-based learning could be especially valuable in taking advantage of the educational capabilities of AI, which themselves require capable human partnership.