1 Introduction

Personality—“a person’s nature or disposition; the qualities that give one’s character individuality”Footnote 1—is a key area of research in user modelling and user-adaptive systems. One of the most popular ways to describe and measure personality is trait theory—where a person is assessed against one or more factors (e.g. ‘Conscientiousness’ or ‘Agreeableness’). These measurable differences in how people interact with the world are prime targets for providing users with an appropriately tailored user experience. However, to facilitate these tailored user experiences, researchers first need to discover which aspects of personality are important for adaptation, and how to tailor experience to them.Footnote 2

One approach would be to measure users’ personality and ask them to use the system or evaluate its features. However, as noted in Paramythis et al.’s (2010) discussion on layered evaluation, one issue with using a user-based study for an adaptive system is that adaptation takes time, often more than is available during a study. One solution they advocate is an indirect study, where the user model is given to participants and they perform the task on behalf of a third party. This allows researchers to control the characteristics of the imaginary user, avoiding the time delay needed for populating the user model from actual user interactions with the system. An indirect study also ensures that the input to an adaptation layer is perfect, making it very suitable for layered evaluations. Indirect studies may also be required for other reasons—for example, they are needed when it is difficult to recruit a large enough number of target participants, such as in the work by Smith et al. (2016) for skin cancer patients.

Another way to investigate adaptation strategies and discover pertinent personality traits is the User-as-Wizard approach (Masthoff 2006; Paramythis et al. 2010), which uses human behaviour to inspire the algorithms needed in an adaptive system. In a User-as-Wizard study, participants are given the same information the system would have, and are asked to perform the system’s task. Normally, participants deal with fictional users, which allows us to study multiple participants dealing with the same user, controlling exactly what information participants get.

When using a User-as-Wizard or indirect approach to research adaptation to personality, the simulated user’s personality needs to be conveyed. However, there is a paucity of easy, validated ways to convey or represent the personality of a third party to participants. One option is to use real people, allowing participants to interact with a person with the desired trait. However, this is difficult to control, as it is hard to ensure participants adapt to personality instead of, for example, current affective state. Participants would also have to spend considerable time with the individual to perceive their personality. Another option is to ask participants to “imagine a user who is extravert” or to provide statements such as “John is neurotic”. This approach is unlikely to elicit empathy from participants due to a lack of context about the simulated user, and such statements could easily be overlooked when placed alongside other data, such as test scores.

This is a non-trivial research problem: how to provide enough information about the personality of a simulated user for participants to identify and empathise with them, without making the simulated user seem one-dimensional and implausible. This paper details a methodology for conveying personality using validated personality stories.

In addition to conveying personality, these stories can be used as part of an alternative method of measuring personality.

Reliable and efficient personality measurement is still largely an open challenge. Whilst validated personality tests exist, completing them may create an overhead that is unacceptable to users: personality tests range from the Five Item Personality Inventory (FIPI) (Gosling et al. 2003) to the 300-item International Personality Item Pool (IPIP-NEO) (Goldberg et al. 2006). A problem with questionnaires is response bias, in particular the bias introduced by acquiescence or ‘yea-saying’—the tendency of individuals to consistently agree with survey items regardless of their content (Jackson and Messick 1958). This is an issue with many personality trait questionnaires, and was one reason why a new version of the Big Five Inventory (BFI-2) was recently produced (Soto and John 2017). Questionnaires may also be undesirable for reasons described later. Current approaches to unobtrusively measuring personality include analysis of blogs (e.g. Nowson and Oberlander 2007; Iacobelli et al. 2011), users’ social media content (e.g. Facebook, Twitter) (Gao et al. 2013; Golbeck et al. 2011; Quercia et al. 2011) or social media behaviour (e.g. Amichai-Hamburger and Vinitzky 2010; Ross et al. 2009). These indirect approaches are, however, still far less reliable than direct approaches.

Using the personality stories as a basis, we propose an alternative, light-weight approach to reliably measuring personality, using so-called personality sliders with the stories at the slider ends; this is faster to complete than many personality tests. We describe how identification with the people in personality stories can easily and engagingly be used to measure user personality. Personality sliders provide a broad characterisation of a personality trait, whilst at the same time making it less salient to participants what they are being asked about. Personality sliders take about a minute to complete per trait (assuming an average reading speed), so are fast to administer and may save time particularly:

  • In studies or systems that require a user characteristic for which short questionnaires do not yet exist. Short questionnaires only exist for some personality traits (most notably those of the Five Factor Model), whilst the slider approach can be used for any personality trait as well as other user characteristics. Of course, the personality stories are created from questionnaire items, and using more items increases reading time. However, only one decision/interaction is required per trait (compared to one per item for the questionnaires), reducing cognitive load and decision time.

  • In studies that require both the measurement of the participants’ personality and the portrayal of the personality of fictional people—e.g. looking at the impact of self-similar personality on book recommendations for fictional users. Participants only need to read the stories once, so one minute suffices both to complete the personality test and to portray two fictional users’ personalities.

  • In studies or systems that require obtaining personality measurements for multiple people provided by one person. For example, in Moncur et al. (2014), automated messages about babies in intensive care to their parents’ social network were adapted to individual receivers’ characteristics. This may require a parent to indicate the emotional stability of the people closest to them. Using the personality sliders, participants only have to read the stories once, and then only need to make one decision/interaction per personality trait per person.

Another advantage of using personality sliders is that they reduce response bias. With the personality story sliders, participants judge which person they resemble more, rather than agreeing or disagreeing with individual items, removing bias due to acquiescence. Multi-item surveys also tend to suffer from straight-lining, which occurs when participants give identical (or nearly identical) responses to items in a battery of questions using the same response scale (Zhang and Conrad 2014). Requiring only one interaction per trait (as in the sliders) mitigates this. Finally, personality sliders provide a finer-grained measure of personality: sliders yield continuous rather than interval data, whilst most personality tests are restricted to a small number of scale points. This also means the data is more appropriate for parametric analysis than traditional Likert data.
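To make the granularity point concrete, a single slider anchored by the low and high stories yields a continuous score per trait. The sketch below is our illustration, not part of the published instrument; the function name and the assumed 0–100 slider range are assumptions.

```python
def slider_to_score(position, low_end=0.0, high_end=100.0):
    """Map a raw slider position to a continuous [0, 1] trait score.

    low_end:  the slider end anchored by the low-trait story.
    high_end: the slider end anchored by the high-trait story.
    """
    if not low_end <= position <= high_end:
        raise ValueError("slider position outside the slider range")
    return (position - low_end) / (high_end - low_end)
```

For instance, a marker placed three-quarters of the way towards the high-trait story yields 0.75, a continuous value that can be fed directly into parametric analyses.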

To evidence the practical value of our methodology for conveying and measuring personality, we show how the personality stories and personality sliders have been successfully used in many of our studies (see Sect. 6).

Fig. 1 The methodology used in this paper for personality slider development

1.1 Overview of methodology

Our methodology for conveying and measuring personality traits using personality stories (see Fig. 1) consists of the following stages:

  1. Creating short stories about a person to express distinct personality traits (their target trait): we use Resilience, Generalized Self-Efficacy, and those from the Five Factor Model.

  2. Iteratively validating the generated stories, by asking people to fill out a personality questionnaire for the person in each story (different from the questionnaires used for story creation), to ensure that the stories robustly convey their target trait at both high and low levels. Issues include both the case where the perceived score for a non-target trait (a personality trait other than the target trait) differs significantly between the high and low story, and the case where the scores for non-target traits lie outside a normative range. Pilots were conducted in the lab; later studies used crowd-sourcing for broader generalizability.

  3. Validating the approach of measuring personality through the stories by asking users to pick, using a slider, which individual they are most like. The resulting slider values were correlated with scores from standardized personality tests for the same traits.

  4. Outlining how the slider values can be used to distinguish groups of users with distinct levels of personality traits. Before the sliders could be used in a system, or even applied experimentally to evaluate adaptation, we needed to define how to use the slider values. We summarise the advantages and disadvantages of the respective methods.

  5. Validating the approach in an experiment where personality is likely to affect adaptation (i.e. using the stories in an experiment where an effect of personality is hypothesized). We tested the approach in multiple studies.
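Stage 3 amounts to a convergent-validity check: slider scores should correlate strongly with scores from a standardized test of the same trait. A minimal pure-Python sketch (the paired data and function name below are illustrative assumptions, not results from the paper):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between paired observations, e.g. slider
    scores and standardized questionnaire scores for the same trait."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired measurements for one trait:
slider = [0.20, 0.35, 0.55, 0.70, 0.90]  # normalized slider scores
test = [12, 18, 22, 30, 34]              # standardized test scores
r = pearson_r(slider, test)              # close to 1 if measures converge
```

In practice one would report r (and its significance) per trait; a high positive correlation supports using the slider in place of the longer questionnaire.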

1.2 Crowd-sourcing participants

We rely heavily on rapid questionnaire responses from a participant pool to iteratively validate personality stories. Where the number of unique participants required was small, we used convenience sampling. However, our participant pool was too small for Five Factor Model validation as many iterations were required (explained in Sect. 4.3). To expand our participant pool, we decided to use the crowd-sourcing service, Amazon Mechanical Turk (MT) (2012).

MT is helpful when large numbers of participants are required for studies. However, valid concerns exist that data collected online may be of lower quality and require robust validation methods. Many studies, such as those described by Weinberg et al. (2014), have tried to show the validity of using MT to collect research data. These studies have generally found that the quality of MT data is comparable to that collected in supervised lab experiments, provided studies are carefully set up, explained, and controlled. We follow recommended best practice in our MT experimental design and procedures.

In our work we have obtained some insights into using crowd-sourcing to gather experimental data. We were initially concerned that crowd-sourced participants (workers) would simply complete questionnaires in a random fashion in order to be paid. However, we found no evidence for this. “Gaming the system” by random scoring did not occur: participants correctly identified the personality trait we were portraying.

MT holds statistics on each worker, including acceptance rate. This rate, which is available to all requesters (those setting tasks), represents the percentage of the work submitted by a particular worker that was approved (by all requesters). Thus, if somebody consistently submits poor work, their acceptance rate drops. As requesters can set a high acceptance rate as a qualification for their tasks, workers have an incentive to value their acceptance rate and to complete tasks conscientiously. In addition, an integrated Cloze Test for English fluency (Taylor 1953) was used as an attention check, to ensure participants read the instructions carefully and had sufficient literacy skills to understand the task. We were also able to restrict participation to the United States, which considerably reduces the possibility of spam in the results.

The paper is structured as follows. Section 2 surveys the literature on measuring, conveying and adapting to personality. Section 3 describes the story creation process. Section 4 discusses the process of story validation. In Sect. 5, we test using the stories to measure user personality and outline how these results can be applied to group users by personality trait. Section 6 shows the application of the methodology by summarising many studies that investigated adaptation to personality and used the stories to convey or measure personality. Section 7 concludes the paper, discusses its limitations and provides directions for future work.

2 Related work

In this section, we describe the models of personality used in this paper and the rationale for choosing them, focusing specifically on trait theories and social learning approaches. We summarise the methods for obtaining users’ personality traits and then summarise how personality can be portrayed, building on these methods. Finally, we discuss adaptation to personality in recommender systems, persuasive systems, and intelligent tutoring systems. We focus on adaptation to particular personality traits and the acquisition and portrayal of personality in the studies conducted.

Table 1 The five robust dimensions of personality from Fiske (1949) to present

2.1 Models of personality

2.1.1 Personality trait theories

Traits are defined as “an enduring personal characteristic that reveals itself in a particular pattern of behaviour in different situations” (Carlson et al. 2004, p. 583). Over time, trait theorists have tried to identify and categorise these traits (Carlson et al. 2004). The number of traits identified has varied, with competing theories arising. The best known include Eysenck’s three factors (Eysenck 2013), Cattell’s 16PF (Cattell 1957), and the Five-Factor Model (FFM) (Goldberg 1993). More recently, a general consensus towards five main traits (or dimensions) has emerged (Digman 1990; McCrae and John 1992), shown in Table 1 (reproduced from Digman 1990). Most psychologists consider the FFM robust (Magai and McFadden 1995), and a multi-year study found that individuals’ trait levels remained relatively stable (Soldz and Vaillant 1999). The exact names of the traits are still disputed by psychologists (Goldberg 1993; McCrae and John 1992; Digman 1990); however, we adopt the common nomenclature from John and Srivastava (1999) and refer to them as:

  I. Extraversion: How talkative, assertive and energetic a person is.

  II. Agreeableness: How good natured, cooperative and trustful a person is.

  III. Conscientiousness: How orderly, responsible and dependable a person is.

  IV. Emotional Stability (ES): How calm, non-neurotic and imperturbable a person is.Footnote 3

  V. Openness to Experience: How intellectual, imaginative and independent-minded a person is.

2.1.2 Resilience

The FFM serves as the core model of personality, as its traits are considered stable (i.e. a person’s personality does not change, or changes only very slowly). However, people also have traits that vary more quickly, encapsulate several core traits, or are more environment- or experience-dependent. One example is resilience, an often poorly defined term that encapsulates “the ability to bounce back from stress” (Smith et al. 2010, p. 166). Poor resilience is associated with depression (O’Rourke et al. 2010; Southwick and Charney 2012; Hjemdal et al. 2011) and anxiety (Connor and Davidson 2003; Hjemdal et al. 2011). While not as stable as the FFM traits, resilience is a medium-term trait that may be improved by interventions (Smith et al. 2010).

2.1.3 Social learning approaches

The Social Learning approach to personality “embodies the idea that both the consequences of behaviour and an individual’s beliefs about those consequences determine personality” (Carlson et al. 2004, p. 593). Whereas trait theorists argue that knowing the stable characteristics of individuals can predict behaviour in certain situations, advocates of the Social Learning approach think that the environment surrounding an individual is more important when predicting behaviour (Carlson et al. 2004). Two popular Social Learning models are Locus of Control (LoC) (Rotter 1966) and Generalized Self-Efficacy (GSE) (Bandura 1994).

An individual’s Locus of Control represents the extent to which they believe they can control events that affect them (Rotter 1966). A learner with an internal LoC believes that they can control their own fate, e.g. they feel responsible for the grades they achieve. A learner with an external LoC believes that their fate is determined by external forces, e.g. that their grade is a result of the difficulty of the exam or the quality of their teaching. Self-Efficacy is defined as “the belief in one’s capabilities to organize and execute the courses of action required to manage prospective situations” (Bandura 1995, p. 2) and determines whether individuals will adapt their behaviour to make changes in their environment, based on an evaluation of their own competency (Carlson et al. 2004). It also determines whether an individual will maintain that change in behaviour in the face of adversity; GSE has been shown to be an excellent indicator of motivation (McQuiggan et al. 2008).

2.2 Measuring personality

There are many explicit and implicit approaches to measuring personality. Explicitly, personality traits can be obtained through self-reporting questionnaires, which typically ask users to rate to what extent certain statements apply to them. Multiple versions of such questionnaires exist—for example, the Five-Factor Model (FFM) is often used in research, not only because there is broad agreement between psychologists, but because many validated questionnaires exist which measure it, with varying numbers of items (e.g. the 5-item FIPI (Gosling et al. 2003), 10-item TIPI (Gosling et al. 2003), BFI-10 (Rammstedt and John 2007), 20-item mini-IPIP (Donnellan et al. 2006), 40-item minimarkers (Saucier 1994a), 44-item BFI (John and Srivastava 1999), 50-item IPIP-NEO-50 (Goldberg et al. 2006), 240-item IPIP-PI-R, and 300-item IPIP-NEO (Goldberg et al. 2006)). Questionnaires for other traits also exist (see Table 2 for questionnaires that have been used for other traits). Advantages of measuring personality with self-reporting questionnaires include ease of administration, the existence of validated questionnaires for most traits (so the approach is easily extended to other traits), and transparency to users. Disadvantages are that they are often time-consuming (leading to problems such as straight-lining (Zhang and Conrad 2014)) and may be inaccurate (either because respondents see themselves differently than they really are, or because they want to portray a certain image to other people).

Personality traits can also be measured implicitly using machine learning techniques. Personality can be inferred from user-generated content in social media, e.g. Facebook Likes (Kosinski et al. 2014; Youyou et al. 2015), language used (Park et al. 2015; Oberlander and Nowson 2006), Twitter user types (e.g. number of followers) (Quercia et al. 2011), a combination of linguistic and statistical features (e.g. punctuation, emoticons, retweets) (Celli and Rossi 2012), and structural social network properties (Bachrach et al. 2012; Quercia et al. 2012; Lepri et al. 2016). See Farnadi et al. (2016) for a comparative analysis.

Table 2 Examples of existing work on adapting to personality

Alternatively, other interaction data can be used, such as measuring personality traits from gaming behaviour. For example, Cowley and Charles (2016) use features that describe game player behaviour based on the temperament theory of personality, Yee et al. (2011) measure personality from player behaviour in World of Warcraft, Wohn and Wash (2013) from spatial customisation in a city simulation game, and Koole et al. (2001) from behaviour in a common resources dilemma gaming paradigm. Implicit association tests have also been used, measuring reaction times to visual stimuli associated with contrasting personality descriptors (Grumm and von Collani 2007).

Non-verbal data from speech and video can also be used, such as prosody, intonation, gaze behaviour, and gestures. For example, Polzehl (2014) details how speech features can be used. Biel and Gatica-Perez (2013) use features from video blogs such as speaking time, speaking speed, and how much the person looks at the camera. Staiano et al. (2011) use speech and gaze attention features from videos of meetings. Rojas et al. (2011) use facial features.

Finally, multimodal personality recognition can also be used; for example, Farnadi et al. (2014) used a combination of textual (linguistic and emotional) features extracted from transcripts of video blogs in addition to audio-video features. Similarly, Srivastava (2012) used a combination of non-verbal behaviour and lexical features.

For a more in-depth review of automated personality recognition, including a summary of existing studies and which personality traits were recognised, see Vinciarelli and Mohammadi (2014).

Advantages of measuring personality implicitly are that it can be done unobtrusively (as long as the data used is generated naturally) and that it tends to have good accuracy. Disadvantages are potential privacy implications (it is important that users provide explicit consent), the need for substantial data for the underlying machine learning algorithms (so it takes time to measure the personality of new users), and the poor availability of existing datasets for other applications. Dunn et al. (2009) investigated ease of use, user satisfaction, and accuracy for three interfaces for obtaining personality: one explicit (the NEO PI-R, with 240 questions) and two implicit (a game and an implicit association test). They concluded that an explicit way of measuring personality is better for ease of use and satisfaction.

2.3 Portraying personality

Personality can be portrayed in many ways, often inspired by the ways in which it can be measured. Firstly, participants can be shown content generated by someone with the personality trait we want to portray, such as a blog post, audio recording, or video. This is hard to do well, as it is difficult to avoid conveying information beyond personality. For example, facial expressions (as may be present in video recordings), speech (as present in video and audio recordings), and linguistic content (as present in text and speech) provide superfluous information about affective state (Zeng et al. 2009). Video, audio and text often also implicitly provide information about the person’s ethnicity/region of origin, age, gender, and opinions (Rao and Yarowsky 2010). Additionally, it requires finding people with exactly the personality trait required, and obtaining their permission to use the content they generate for this purpose.

Secondly, participants can be shown such content, but rather than using a person with the desired personality trait, the trait is portrayed by an actor or researcher, or generated automatically based on what we know influences the measurement of certain personality traits. This provides more control, as an actor can be instructed to depict only one trait at the extreme, and to try to remain neutral on other variables, such as affective state. Social Psychology and Medical Education commonly use actors to depict personality traits. For example, Kulik (1983) used actors to portray extraversion (the actor smiled, spoke rapidly and loudly, and discussed drama, reunions with friends, and lively parties) and introversion (the actor spoke more hesitantly, talking about his law major, lack of spare time, and interest in jazz). Barrows (1987) describes simulated/standardized patients as presenting the gestalt of the patient being simulated, including their personality. The problem remains that actors also provide information about gender, age, and ethnicity. Additionally, hiring good actors may be costly.

Portraying personality is also widely investigated in the Affective Computing community, particularly using virtual agents (Calvo et al. 2015). For example, Doce et al. (2010) convey the personality of game characters through the nature and strength of the emotions a character portrays, and their tendency to act in a certain manner. However, this is still difficult to do well, and again it is hard to ensure that only a personality trait is expressed and nothing more.

Thirdly, a person can be described explicitly by mentioning the personality trait (e.g. “John is very conscientious”) or how the person behaves or would behave in certain circumstances (e.g. “John tends to get his work done very rapidly”). For example, Luchins (1958) produced short stories to portray extraversion and introversion. These contained sentences such as “he stopped to chat with a school friend who was just coming out of the store” and “[he] waited quietly till the counterman caught his eye”. Using a single sentence with just the personality trait is easy to do, but it may not provide participants with a strong enough perception of the trait and it can easily be overlooked. Using a story solves this, but the story may not convey the intended trait.

In all of these cases, it is important that the portrayal of a personality trait is validated as accurately creating the intended impression of personality, and not producing additional impressions (of an unintended personality trait, or of an attribute such as intelligence). For example, Luchins (1958) actually found that participants associated many other characteristics (such as friendliness) based on his stories. Kulik (1983) found that prior conceptions about the actors influenced people’s opinions.

2.4 Adapting to personality

There is growing interest in personalization to personality, as seen from the UMUAI 2016 special issue on “Personality in Personalized Systems” (Tkalčič et al. 2016) and the “Emotions and Personality in Personalized Systems” (EMPIRE) workshops. Research on personalization to personality has focused mainly in three domains: Persuasive Technology, Intelligent Tutoring Systems, and Recommender Systems. Table 2 presents a non-exhaustive list of such research.

As shown in Table 2, research on personality in Persuasive Systems has mainly focused on adapting messages (motivational messages, prompts, adverts, reminders) and selecting persuasive strategies. Adaptation tends to use the Five Factor Model, though there has also been work on adapting to susceptibility to persuasion principles and gamer types.Footnote 4 All papers cited use self-reporting questionnaires.

Research on personality in Intelligent Tutoring Systems has mainly focused on adapting feedback/emotional support, navigation (exercise and material selection) and hints/prompts. The Five Factor Model tends to be the basis for personality adaptation, though generalized self-efficacy (GSE) is also used. To assess personality, all papers cited used self-reporting questionnaires, except for Dennis et al. (2016), Okpo et al. (2016b) and Alhathli et al. (2016) who used indirect experiments in which participants made choices for a fictitious learner with a given personality.

Table 3 Self-report questionnaire for Generalized Self Efficacy (Schwarzer and Jerusalem 1995)

Research on personality in Recommender Systems (see also Tkalčič and Chen 2015) has broadly considered the following topics: improving recommendation accuracy (Wu and Chen 2015), bootstrapping preferences for new users (Hu and Pu 2011; Tkalčič et al. 2011; Fernández-Tobías et al. 2016), the impact of personality on users’ preferences on recommendation diversity (Tintarev et al. 2013; Chen et al. 2016; Nguyen et al. 2017), cross-domain recommendation (Cantador et al. 2013), and group recommender systems (Kompan and Bieliková 2014; Quijano-Sanchez et al. 2010; Rawlings and Ciancarelli 1997). Adaptation in recommender systems aimed at individuals tends to use the FFM. However, for group recommender systems other personality traits have been used (see also Masthoff 2015), such as cooperativeness. To assess personality, all papers cited used self-reporting questionnaires, except Appel et al. (2016) who extracted personality from social media usage.

3 Creation of stories to express personality traits

This section describes the creation process of personality stories to express GSE, Resilience and the Five-Factor Model traits.Footnote 5 These stories will be validated and amended in the next section. Male names were used for all stories to keep gender constant. If “gender neutral” names had been used, then participants’ interpretation of the learner’s sex may have caused an unwanted interaction effect on the validation.

3.1 Stories for generalized self-efficacy

The self-report questionnaire for Generalized Self-Efficacy (Schwarzer and Jerusalem 1995) was used as a starting point, shown in Table 3.Footnote 6 Each questionnaire item is positively keyed. The overall GSE score is the sum of the item ratings, with a high score (max 40) indicating high GSE.
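The scoring rule just described can be sketched as follows (our illustration; the published scale has ten positively keyed items, each rated from 1, ‘not at all true’, to 4, ‘exactly true’, so totals range from 10 to 40):

```python
def gse_score(item_ratings):
    """Sum the ten GSE item ratings (each rated 1-4).

    Returns a total between 10 and 40; higher totals indicate
    higher generalized self-efficacy.
    """
    if len(item_ratings) != 10:
        raise ValueError("the GSE scale has exactly ten items")
    if any(not 1 <= r <= 4 for r in item_ratings):
        raise ValueError("each item is rated on a 1-4 scale")
    return sum(item_ratings)
```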

For the high GSE story, a selection of the questionnaire items was used and changed into the third person. For the low GSE story, the valence of the items was inverted. The stories were made more realistic by associating them with a character: a first-year learner called “James” (the most popular male name in English in 2010, and therefore suitably generic). The resulting stories are shown in Table 4.

Table 4 Stories used for Generalized Self-Efficacy, high and low

3.2 Stories for resilience

For Resilience, questions were used from the Connor-Davidson Resilience scale (Connor and Davidson 2003). These encapsulate five factors that contribute to resilience: positive attitudes to change and strong relationships; personal competency and tenacity; spiritual beliefs and superstitions; instincts and tolerance of negative emotions; and control. Using questions from each factor, stories were composed for both high and low resilience (see Table 5) that are roughly symmetrical in order and content. The clauses ‘David is kind and generous’ (in both stories) and ‘He is friendly’ (in the low story) were added to counter the fact that the low resilience story depicted a fairly negative character.

Table 5 High and low resilience personality stories
Table 6 Story construction for low emotional stability using the NEO-IPIP low items

3.3 Stories for the five factor model

Unlike GSE and Resilience, the Five Factor Model does not describe a single trait. As discussed in Sect. 2.1.1, the five factors (traits) are Extraversion, Agreeableness, Conscientiousness, Emotional Stability and Openness to Experience. Thus, the personality of any individual can be described by five scores, one for each factor. This means that stories had to be created for each trait, at both low and high levels (totalling ten stories).

To make the FFM stories, we used the NEO-IPIP 20-item scales (Gow et al. 2005), combining the phrases into sentences to form a short story, with the addition of a name picked from the most common male names. Unlike the GSE scale, these scales provide both positive and negative items, so the high and low stories could be made from the positive and negative items respectively. Table 6 exemplifies how the stories were constructed. Table 7 shows the stories.

Table 7 Preliminary Stories expressing each FFM trait at high and low levels

4 Validation of stories to express personality traits

This section describes the validation process for each story: how each story was checked to ensure that it correctly depicted the trait it was intended to depict (the target trait).

A series of validation studies were performed for the stories constructed to convey Generalised Self-Efficacy, Resilience, and the traits from the FFM (Extraversion, Agreeableness, Conscientiousness, Emotional Stability and Openness to Experience). Each trait had two stories associated with it—one to express the trait at a high level, and one to express the trait at a low level.

For each trait, at least one validation experiment was conducted (the traits from the Five Factor Model required more; this is explained further in Sect. 4.3). Each validation experiment used a between-subjects design: participants were shown either the high story or the low story, and then asked to rate the personality of the person depicted in the story using a validated questionnaire for the trait in question.

As outlined in Sect. 3, the stories were originally constructed using an existing personality measurement questionnaire. For validation purposes, a different measurement questionnaire was used for the same trait: this used different language and terms from the story (preventing participants from simply recognising phrases), made the purpose of the experiment less obvious, and decreased demand characteristics.

For the GSE and FFM stories, we also measured how the stories conveyed other traits (non-target traits), to check how they were conveyed. For GSE, we investigated how the stories conveyed the FFM traits and Locus of Control.Footnote 7 It has been shown previously (Judge et al. 2002; Hartman and Betz 2007) that GSE interacts with both of these measures; however, checking allowed us to correct the story if we found an unexpected interaction. For the FFM stories we checked how the other four non-target FFM traits were conveyed.Footnote 8 For Resilience, which also used crowd-sourcing, a different approach was taken, which is elaborated on in Sect. 4.2.

4.1 Generalized self-efficacy (GSE) validation

This experiment explored whether the stories correctly conveyed different levels of GSE, and what other personality traits were implied, using a different validated trait assessment questionnaire for GSE (Chen et al. 2001). We also explored how the stories depicted the FFM traits (using minimarkers; Saucier 1994a) and Locus of Control (Goolkasian 2009). Fifty participants (42% female, 52% male, 6% preferred not to say; 34% aged 18–25, 48% aged 26–40, 14% aged 41–65, 2% aged over 65, 2% preferred not to say), recruited through convenience sampling, answered these questionnaires in a between-subjects design after reading a GSE personality story: 26 viewed the low GSE story and 24 viewed the high GSE story.

Table 8 Results of t tests for GSE story validation

Table 8 shows the results. t testsFootnote 9 were run for each of the traits to test whether the high and low GSE stories were rated significantly differently. For GSE itself, the difference was significant (\(t(48)=-\,13.514\), \(p<0.001\)). A point-biserial correlation between story condition and GSE rating was also significant (\(r(50)=0.89\), \(p<0.001\), \(R^2=0.79\)), indicating a strong effect size for the GSE stories.
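The point-biserial effect size used throughout this section is simply a Pearson correlation between the dichotomous story condition (0 = low story, 1 = high story) and the continuous trait ratings. A minimal sketch in Python, with hypothetical ratings (the function name and data are ours, not the study's):

```python
from statistics import mean

def point_biserial(groups, scores):
    """Pearson r between a dichotomous variable (0/1) and continuous scores;
    this is exactly the point-biserial correlation used as an effect size."""
    mg, ms = mean(groups), mean(scores)
    cov = sum((g - mg) * (s - ms) for g, s in zip(groups, scores))
    var_g = sum((g - mg) ** 2 for g in groups)
    var_s = sum((s - ms) ** 2 for s in scores)
    return cov / (var_g * var_s) ** 0.5

# Hypothetical ratings: 0 = viewed the low-GSE story, 1 = the high-GSE story
groups = [0, 0, 0, 0, 1, 1, 1, 1]
ratings = [1.8, 2.1, 1.6, 2.4, 4.2, 4.5, 3.9, 4.4]
r = point_biserial(groups, ratings)
print(round(r, 2), round(r ** 2, 2))  # r ≈ 0.97, R² ≈ 0.95
```

Squaring r gives the \(R^2\) (the proportion of rating variance explained by story condition) reported alongside each correlation.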

The stories did however express some other personality traits and models at significantly different levels (Conscientiousness and Locus of control). However, this was to be expected as GSE is not an isolated construct: previous research has discussed possible correlations between GSE and other psychological constructs, including conscientiousness and locus of control (Judge et al. 2002; Hartman and Betz 2007). We therefore judged that these stories were sufficient for further experiments.

4.2 Resilience validation

Similarly to GSE, resilience is expected to correlate with other personality traits. We validated that the high and low stories depicted high and low resilience; no other traits were compared, as an interaction was anticipated (e.g. with low emotional stability) and this is not a problem for this measure. 44 participants were recruited through MT (26 female, 17 male, 1 undisclosed; aged 18–65). They were shown either the high or the low story (between-subjects design) and asked to assess the person in the story on the six-item ‘Brief Resilience Scale’ (Smith et al. 2008). We added six items from another scale to mitigate hypothesis guessing and reduce response bias.

To validate the stories, we performed a between-subjects t test comparing the average resilience rating between the low and high stories. This was significant at \(t(41)=0.29\), \(p<0.001\). The mean resilience rating was 1.75 ± 0.51 SD for the low story and 4.20 ± 0.49 SD for the high story on a 1–5 scale. A point-biserial correlation was also significant (\(r(43)=0.93\), \(p<0.001\), \(R^2=0.85\)), indicating a strong effect size for the Resilience stories.

4.3 Five factor trait validation

This section is an improved version of previous research reported in Dennis et al. (2012b), with clarifications and an additional effect size analysis.

Fig. 2
figure 2

The pilot story validation questionnaire, for Emotional Stability

4.3.1 First iteration FFM: pilot study

The Emotional Stability stories from the FFM were used for a validation pilot study for the FFM traits, and to determine whether non-target trait mitigation would be required.

The same methodology as in Sect. 4.1 was used. Eight participants (4 female; 5 aged 18–25, 3 aged 26–40), recruited through convenience sampling (4 students and 4 staff at the University of Aberdeen), were presented with one of the stories in a between-subjects design and asked to judge the personality of the person depicted. However, as this was a pilot study, instead of the 40-item minimarkers scale we used the shorter 10-item TIPI questionnaire (Gosling et al. 2003), shown in Fig. 2. The results are shown in Table 9.

Table 9 Results of pilot study for ES stories (high and low), as rated using TIPI for the FFM traits

The stories did convey Emotional Stability at polarized levels (i.e. the ratings for each story were at opposite ends of the scale for ES). However, there appeared to be a positive correlation with Agreeableness: more emotionally stable people were judged to be more agreeable (nicer) than neurotic ones. This effect could be spurious, due either to the low number of participants or to our decision to use the ten-item TIPI test rather than a more comprehensive test with more items. For more formal validation, a larger number of unique participants is required for reliable data, particularly if the stories need adjusting. The second iteration therefore used a larger set of participants recruited through crowd-sourcing, to establish whether the correlation with Agreeableness persisted, and also attempted to validate the stories for the other FFM traits.

4.3.2 Second iteration: validation of stories for the five factor model

100 participants (10 per story; 67% female) were recruited using MT. In a between-subjects design, each participant was presented with one story about a learner (see Table 7) which attempted to convey a target trait at either a high or low-level. Participants assessed this student’s personality using the Mini-Markers scale (Saucier 1994a).

Table 10 Normative ranges for each of the five traits, arising from the ratings of a liked peer for the minimarkers scale (Saucier 1994b), plus or minus one standard deviation

The rating for the target trait (i.e. the trait that the story was created to express) should be as polarized as possible—the “low” variant of a story aimed for a score as close to 1 as possible, and the “high” story aimed for a score as close to 9 as possible.

Deciding on an acceptable value for a non-target trait is somewhat arbitrary. However, it is possible to derive normative values for each trait from large population samples. As these samples are similar to our own (e.g. English-speaking, USA-based), we decided it was acceptable to use them to characterise people as being ‘high’, ‘low’ or ‘neutral’ in a trait.

To decide on acceptable values for non-target traits, a “normative range” was derived for each of the five traits, based on the average ratings of a liked peer on the minimarkers scales from 329 students from Illinois (Saucier 1994b),Footnote 10 plus or minus one standard deviation, as shown in Table 10.
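The normative-range test described above reduces to a simple interval check: a perceived non-target trait value is acceptable if it falls within the liked-peer mean plus or minus one standard deviation. A sketch with illustrative numbers (the actual norms are those of Saucier 1994b in Table 10; the helper names are ours):

```python
def normative_range(norm_mean, norm_sd):
    """Normative range: liked-peer mean plus/minus one standard deviation."""
    return (norm_mean - norm_sd, norm_mean + norm_sd)

def classify(score, rng):
    """Classify a perceived trait score relative to the normative range."""
    low, high = rng
    if score < low:
        return "low"
    if score > high:
        return "high"
    return "neutral"

# Illustrative norm only, not the published value
rng = normative_range(6.9, 1.0)   # e.g. a (5.9, 7.9) acceptable band
print(classify(4.2, rng))         # "low": outside the band, needs mitigation
print(classify(6.5, rng))         # "neutral": acceptable for a non-target trait
```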

Results Table 11 shows the results for the original stories. For all five pairs of stories, there was a significant difference in the perceived target trait value between the high story and the low story. For all but one personality trait (Openness), the perceived target trait values were clearly outside the normative range and in the correct direction. The perceived target trait value for low Openness was below the normative range, but the high Openness story was only marginally outside the normative range. Problematically, there were many significant differences between the perceived non-target trait values. Several perceived non-target trait values were also outside the normative range.

Table 11 Results for FFM stories

4.3.3 Mitigation

The following problems occurred between the pairs of stories during validation:

  1. P1:

    Perceived trait values on a non-target trait differ significantly

  2. P2:

    Perceived trait values on a non-target trait are outside the normative range

  3. P3:

    Perceived target trait values are very close to normative range

Problems P1 and P2 often appeared together: one (or both) of the perceived values for a non-target trait was outside the normative range and thus significantly different from the other. For example, in the story for low Extraversion, the student was perceived to be less agreeable, despite the story correctly conveying low Extraversion and the scores for the remaining non-target traits being within the normative range. We hypothesised that the following story modifications could mitigate problems P1 and P2:

  1. S1:

    Add a statement which implies a semi-neutral stance on the problem trait, e.g. “Jack is quite a nice person” to mitigate low agreeableness.

  2. S2:

    Remove a statement which may be causing the interaction—e.g. removing “Jack has little to say to others” may increase agreeableness.

  3. S3:

    Add a statement targeting the problematic non-target trait from its own story—e.g. adding “Jack has a good word for everyone” from the high agreeableness story to increase agreeableness in other stories.

S1 was used because S2 (removing statements from the stories) was undesirable: removal might affect the story’s expression of the target trait. We did not attempt S3, as it might over-alter the non-target trait score, and introducing a statement from another trait’s story may bring that trait’s undesirable interactions into the story. For example, the low Conscientiousness story also conveys low Agreeableness (see Table 16). If we added a statement from the high Agreeableness story, this could in turn raise the ES score, as the high Agreeableness story also conveyed high ES (further confounding the problem).

Table 12 Mitigating Statements for each non-target FFM trait
Table 13 Two stories for high Openness to Experience

4.3.4 Third iteration: validation with mitigated sentences

As the undesired non-target trait scores occurred most frequently in the low stories, these were targeted first. We constructed slightly positive statements (see Table 12) and added them where necessary. For the ‘high’ stories, only two non-target traits required modification: Extraversion in the Openness High story, and Emotional Stability in the Extraversion High and Agreeableness High stories. For the Extraversion High story, the score for Emotional Stability was 6.10, and the normative range ends at 6.08. Because this margin was so small, and there was no significant difference between the high and low variants’ ES scores, modification was not attempted to avoid more adverse effects. In the case of the high Agreeableness story, the value for ES was 7.28. S1 was employed by adding a mildly negative statement: “He is occasionally a bit anxious”. The Openness High story did not convey its target trait convincingly, and thus already required modification. Approach S2 was used in this case, removing statements such as “[he can] express himself beautifully” (see Table 13).

Design The design was the same as in Sect. 4.3.2. Seventy participants (10 per adjusted story) were recruited from MT. Each participant saw one story in a between-subjects design.

Results Tables 14 and 15 show the results for the modified stories. S1 succeeded in mitigating P1 and P2 in most cases. An exception was the Agreeableness stories, where the undesired non-target trait scores remained, with the low story expressing low ES and the high story expressing high ES (P1 and P2). For Conscientiousness, P1 occurred for Openness, despite both values being in the normative range. For low Emotional Stability, S1 was not effective in bringing the perceived Extraversion value into the normative range, with P1 and P2 still extant. S2 was successful in solving P2 for high Openness, bringing the Agreeableness value into the normative range. However, we were not successful in solving P3 for high Openness: the score for the target trait moved further into the normative range.

Effect Size for Modified Stories To explore how strongly the high and low stories differed for each trait, a Point-Biserial correlation was computed between the high and low stories for each trait. There was a strong positive correlation between the story trait level (low or high) and trait score for each trait, showing that the stories depict the traits strongly at the intended levels (see Table 14).

Table 14 Point-Biserial correlations between the high and low story for each trait
Table 15 Results for corrected FFM stories
Table 16 Validated stories for each FFM trait, high and low

4.3.5 Discussion

The adjusted FFM stories are shown in Table 16. A story expressing a single polarized trait was always going to be difficult to achieve, as the traits within the FFM are intercorrelated (Chamorro-Premuzic 2011). The interaction between Agreeableness and Emotional Stability was too strong to remove entirely; adding a stronger statement to bring Emotional Stability into the normative range might cause more interactions with the other three non-target traits. In the Conscientiousness and Extraversion stories, the scores for certain non-target traits (O and A, respectively) still differed significantly. However, as these were all within the normative range, we do not see this as a problem. Problem P3 was not solved in the case of high Openness. Openness is a difficult trait to conceptualise, incorporating culture and art as well as political beliefs (Chamorro-Premuzic 2011). The perceived score was high, so it is likely that the story expressed Openness strongly, just not outside the range we devised.

4.4 Conclusion and limitations

A set of stories for the FFM, GSE and Resilience has been constructed and validated. Not all FFM stories are perfect: modifying them seemed to “dilute” the effect of the target trait, implying a balancing act. Further strategies could be used to remove the remaining interactions; however, it may be that expressing one trait inevitably implies another. We judge that the stories are good enough at expressing the traits for the purpose of investigating adaptation to personality in intelligent systems.

5 Using stories to determine personality

In this section we investigate how to use the stories to measure personality. Participants were given a standardised personality test and asked to rate how close they were to each of a pair of diametrically opposed personality stories using a sliding scale. A correlational analysis for each trait showed that the sliding scale measured the trait with a strong correlation coefficient. We then conducted a reliability check, in which a new sample of participants completed the sliders twice, 1 week apart. The scores at week 0 and week 1 were strongly correlated; thus the sliders can be used to measure personality (though they should not replace a standardised test when high granularity is required).

5.1 Methods

5.1.1 Materials

The validated stories were taken from Tables 4, 5 and 16. Different common Western names were used for each story, gender-matched to the participant. The stories were formatted so that opposing stories for the same trait were placed at either end of a sliding scale (see Fig. 3). The scale was coloured using a gradient from blue to green (left to right), with markers every 12.5%. Participants could indicate their position on the scale using a drag-and-drop slider. The slider position gave a value between 18 and 162, emulating a conventional 1–9 scale with greater acuity.
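The paper does not give the exact formula mapping raw slider positions onto the conventional scale, but since 18/18 = 1 and 162/18 = 9, a simple linear rescaling is the obvious candidate; the sketch below assumes that mapping:

```python
def slider_to_scale(raw, raw_min=18, raw_max=162, lo=1.0, hi=9.0):
    """Linearly rescale a raw slider position (18-162) onto a 1-9 scale,
    preserving the finer granularity of the raw value."""
    return lo + (raw - raw_min) * (hi - lo) / (raw_max - raw_min)

print(slider_to_scale(18))   # 1.0, the extreme of one story
print(slider_to_scale(90))   # 5.0, the midpoint starting position
print(slider_to_scale(162))  # 9.0, the extreme of the opposing story
```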

Validated personality questionnaires were used. For the Five Factor Model, the minimarker test (Saucier 1994a) was used. For resilience, the Brief Resilience Scale was used (Smith et al. 2008). For self-efficacy, the general self-efficacy scale was used (Schwarzer and Jerusalem 1995).

5.1.2 Procedure

Participants completed a personality questionnaire and then were presented with the slider test for each trait of the personality questionnaire they had completed, one at a time (five pairs of sliders for the Big Five Minimarker questionnaire and one pair of sliders for each other questionnaire).Footnote 11 Participants were asked to move the slider towards the person they thought they were most like. The slider was initially set at the 50% marker on the scale and participants had to manipulate the slider before they were allowed to continue, even if they chose to select 50%. Participants were then thanked for their time and invited to view the results of the slider test in the form of a bar graph. Participants were recruited from MT and were paid $0.80 (demographics shown in Table 17).

Fig. 3
figure 3

Screenshot of the slider between opposing trait stories

Table 17 Participant demographics for FFM, Self Efficacy and Resilience for slider validation studies

5.1.3 Design

Participants completed both the personality questionnaire and the slider test in a within-subjects design. Their score on the personality questionnaire was the independent variable and the value of the slider position (which represents how close to the two trait stories the participant thought they were) was the dependent variable.

Our hypothesis (H1) was: For each trait, there will be a positive correlation between personality score and slider value.

5.2 Results

5.2.1 Five factor model

For each trait, a correlation analysis of Trait Score \(\times \) Slider Value was run. This was significant for each trait (see Table 18). Correlation graphs were plotted for each trait (Fig. 4) and a regression analysis run; the regression formula for each trait is shown in Table 18. Participants’ mean scores on the minimarkers scale (see Table 19) were compared with the minimarkers normative range (see Table 10) to see if the MT participants varied from a normal population. All traits were within the normal range, except Emotional Stability, which was slightly higher. To investigate the effect of other traits on the correlation for each trait, a partial correlation analysis was run to control for the effect of non-target traits. These correlations remained strong (see Table 20).
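For a single control variable, a partial correlation can be computed from pairwise Pearson correlations as \(r_{xy.z} = (r_{xy} - r_{xz} r_{yz}) / \sqrt{(1-r_{xz}^2)(1-r_{yz}^2)}\); controlling for several non-target traits at once, as in the study, generalises this recursively (or via regression residuals). A sketch for the single-control case (our code, not the study’s analysis scripts):

```python
from statistics import mean

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def partial_corr(xs, ys, zs):
    """Correlation of x (trait score) and y (slider value),
    controlling for a single covariate z (e.g. a non-target trait)."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / ((1 - rxz**2) * (1 - ryz**2)) ** 0.5
```

If z is uncorrelated with both x and y, the partial correlation reduces to the plain Pearson correlation; strong partial correlations therefore indicate that a slider tracks its target trait rather than shared variance with other traits.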

Table 18 Pearson’s r for correlation of Trait Score \(\times \) Slider Value for each personality trait, effect size \(R^2\), regression formula and standardized error of the estimate SEE
Fig. 4
figure 4

Correlation of Trait Score \(\times \) Slider Values for the FFM personality traits

5.2.2 Resilience and generalised self efficacy

For each personality test, correlation graphs were plotted (Fig. 5) and a correlation analysis was run of Test Score \(\times \) Slider Value. This was significant for Resilience (\(r(60)=0.58\), \( p< 0.01\)) and GSE (\(r(62)=0.62\), \(p < 0.01\)). The regression formula for each trait is shown in Table 18.

5.3 Reliability check

To test the reliability of the sliders, a reliability check experiment was conducted using all 7 sliders (FFM, GSE and Resilience). Participants recruited through opportunistic sampling completed the sliders and the FFM TIPI test (Gosling et al. 2003) as the first part of a persuasion experiment (reported in Ciocarlan et al. 2019). After 1 week they completed the sliders and TIPI test again (as well as the second part of the persuasion experiment).

Fifty-one participants completed the study (27 female, 23 male, 1 undisclosed; 21 aged 18–25, 23 aged 26–40, 7 aged 40–65). A correlation analysis was run between Slider Values for Week 0 \(\times \) Week 1 for all traits. The results are shown in Table 21. There was a strong correlation for each of the sliders between Week 0 and Week 1 (\(r=0.70\)–0.86, mean \(=0.81\)). There were several other significant weaker correlations—expected correlations between FFM traits and GSE and Resilience (as these traits are known to correlate with FFM traits; see Section 4), and some correlation within FFM traits.

To explore the inter-trait correlations within the FFM traits, a correlational analysis was run for the TIPI test for each FFM trait between Week 0 and Week 1. The results are shown in Table 22. We found a similar pattern of correlation between non-target traits as we found in the sliders, with the TIPI test showing more correlations between non-target traits than the slider test. We can therefore see that the inter-trait correlations are captured by a validated personality test within our sample, and that the sliders show good test-retest reliability for target traits at Week 1.

Additionally, we used the data from Week 0 to repeat our validation experiment for the FFM sliders. A correlational analysis of FFM slider values \(\times \) TIPI test scores showed a significant correlation between each trait’s score on the slider test and TIPI test (E: \(r=0.78\), A: \(r=0.62\), C: \(r=0.62\), ES: \(r=0.83\), O: \(r=0.33\); \(p<0.01\) for E, A, C and ES, \(p<0.05\) for O). These are similar to correlations reported in Table 18; O has a weaker correlation and ES has a stronger correlation in this reliability check.

Table 19 Means of study participants for the minimarkers scale
Table 20 Partial correlations of each FFM trait on Minimarkers compared with the slider score, controlling for each other trait score on the non-target sliders
Fig. 5
figure 5

Correlation of Trait Score \(\times \) Slider Value for GSE and Resilience

Table 21 Pearson’s r Correlation of the slider value of each pair of stories: FFM (E, A, C, ES, O), GSE and Resilience, repeated after 1 week
Table 22 Pearson’s r Correlation of the FFM TIPI test score (E, A, C, ES, O) at Week 0 and Week 1

5.4 Interpreting slider values

There are several possible strategies for interpreting the slider values for use in personality experiments. The slider values form a continuous variable, which can be used directly in the analysis of further studies (e.g. in a regression analysis). Splitting data into distinct groups is often considered undesirable, as it reduces statistical power (Irwin and McClelland 2003). However, for some studies it may be useful to use the slider values to divide participants into high and low groups (for example, when offering different content to people with different traits).

When choosing to divide participants into groups, it is important to consider the statistical features of the data (e.g. whether the data is normally distributed), as well as the purpose of the study and the limitations of data collection. Non-normal data can be split using the median, tertiles or quartiles. For normal data, groups can be formed using the mean or standard deviation. A further option is to take the highest and lowest scoring participants to form groups of a defined size (e.g. the top 50 and bottom 50), or to use a hybrid method (e.g. the top and bottom 20 participants at least 1 standard deviation from the mean). It is also possible to compute the equivalent score on a standardised test (e.g. the TIPI test) by using the regression formula generated at validation (e.g. in Table 18), and to group by population normative data for that test, when available (e.g. Table 10). The choice should be guided by how much data can be discarded, the importance of groups being distinct from each other, and how many groups are required (i.e. whether a ‘neutral’ group is needed). This is summarised in Table 23.
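The grouping strategies above can be sketched as follows (helper names are ours; thresholds as described in the text):

```python
from statistics import mean, median, stdev

def median_split(values):
    """Low/high groups split at the median (suits non-normal data)."""
    m = median(values)
    return [v for v in values if v < m], [v for v in values if v >= m]

def sd_split(values):
    """Low/neutral/high groups cut at mean -/+ 1 SD (suits normal data);
    the neutral middle group may be analysed separately or discarded."""
    mu, sd = mean(values), stdev(values)
    low = [v for v in values if v <= mu - sd]
    high = [v for v in values if v >= mu + sd]
    neutral = [v for v in values if mu - sd < v < mu + sd]
    return low, neutral, high

def top_bottom(values, n):
    """Fixed-size groups: the n lowest and n highest scorers."""
    s = sorted(values)
    return s[:n], s[-n:]

def slider_to_test_score(slider, intercept, slope):
    """Equivalent standardised-test score via the regression formula from
    validation (cf. Table 18); intercept and slope are placeholders here."""
    return intercept + slope * slider
```

Each strategy trades off differently between how much data is discarded, how distinct the groups are, and whether a neutral group exists, as summarised in Table 23.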

5.5 Discussion

This section has demonstrated how to use trait stories to measure personality. For each trait, there is a strong correlation between participants’ scores on standardised personality tests and their scores on the slider scale (see Table 18). The effect sizes of the correlations imply that more polar trait stories (i.e. pairs of stories that are rated as very high and very low in the trait) result in a sliding scale that better reflects the personality test. This can be seen in the comparatively low correlation for the Openness to Experience slider in Table 20. This highlights the importance of the story validation stage of development.

It should be noted that, while the sliders may be preferable to questionnaires in some settings, they have lower accuracy than many standardised questionnaires. As with any decision about which measure to use in a study, the benefits of the slider measure should be weighed against this lower accuracy; it is most suitable, for example, where high attrition needs to be mitigated by simplifying the questionnaires, or where the intended analysis groups users by trait.

Table 23 Summary of ways to divide Personality Slider data into groups

6 Applying stories and sliders in personality research and beyond

This section provides examples of how the personality stories and sliders, and the method used to produce them, have been used in adaptation research, for adaptation to personality and beyond, demonstrating evidence of the method’s usefulness.

6.1 Portraying personality

Personality stories provide an easy way of portraying certain personalities, as needed for indirect and User-as-Wizard studies. Based on our research (i.e. Sect. 4), using personality stories also ensures (as far as possible) that participants’ impression of the person’s personality accords with what the story is intended to express. Personality stories have been used for investigations into adaptation in persuasive technology, intelligent tutoring systems, and recommender systems (see Table 24). In Dennis et al. (2015) an indirect study was run with 68 participants investigating the impact of a skin cancer patient’s personality on the perceived suitability of reminder messages (of varied types based on Cialdini’s principles; Cialdini 2001) to self-check their skin. Participants were provided with a personality story about a fictional skin cancer patient. They rated the suitability of reminder messages for this patient and selected the best message to use. Results showed a significant difference between participants based on levels of Conscientiousness: those high in Conscientiousness preferred authority messages as the second reminder, whilst those low in Conscientiousness preferred scarcity messages.

Table 24 Studies using personality stories and sliders to obtain or portray personality

In Dennis et al. (2016), five user-as-wizard studies were run with 1203 participants in total, each investigating the impact of one of the FFM personality traits (as well as performance) on feedback (emotional support and slant) given to a learner. Participants were provided with a personality story about a learner and their performance, and provided feedback. Based on this data, an algorithm was developed that adapted feedback to Conscientiousness and Emotional Stability.

In Dennis et al. (2011), a User-as-Wizard study was run with 19 teachers, investigating the impact of GSE on feedback (slant). Participants were provided with a GSE personality story about a learner and their performance, and produced feedback. There was some evidence of teachers putting a positive spin on feedback for learners with a low GSE.

In Okpo et al. (2017), a User-as-Wizard study was run with 201 participants, investigating the impact of the Self-Esteem personality trait (as well as effort and performance) on exercise selection (difficulty level). Personality stories were constructed for Self-Esteem using the methodology presented in this paper. Participants were provided with either a low or high self-esteem story, the effort put in by the learner and their performance on a previous exercise. Participants selected the difficulty level of the next exercise for the learner to do. Self-esteem had an impact on difficulty level selection.

In Tintarev et al. (2013), a User-as-Wizard study was run with 120 participants, investigating the impact of Openness to Experience on recommendation diversity. Participants were provided with a personality story about a fictional friend as well as some indication of that friend’s book preferences, and provided three book recommendations to this friend. There was some evidence that participants took Openness to Experience into account when producing the recommendations.

In Smith et al. (2015) and Smith (2016), two User-as-Wizard studies were run with 61 and 45 participants respectively, investigating whether emotional support messages should be adapted to the recipient’s Emotional Stability and Resilience respectively. Participants were provided with a personality story about a carer experiencing a stressful situation, and provided emotional support messages for this carer. Results showed that neurotic carers were provided with a wider range of emotional support. No effect of Resilience on message selection was found.

6.2 Obtaining personality

Some studies require participants’ personalities in order to analyse the impact of that personality on dependent variables (e.g. participants’ preferences, participants’ learning, etc). Most of the studies presented in Table 2 are of this type. The personality sliders have been used to obtain participants’ personality to investigate adaptation in persuasive systems and intelligent tutoring systems. See Table 24 for example studies.

In Smith and Masthoff (2018), a study was run with 138 participants investigating the impact of personality on their appreciation of emotional support messages for stressful situations. Participants were told about a carer experiencing a stressful situation and rated an emotional support message provided by the carer’s friend on how helpful, effective and sensitive they felt it was. Participants’ FFM personality traits were obtained using personality sliders. Results showed that personality only had a small impact, with agreeableness and emotional stability warranting further investigation.

In Smith et al. (2016), an indirect study was run with 51 participants investigating the impact of personality on the perceived persuasiveness of reminder messages (differing in type based on Cialdini’s principles; Cialdini 2001) for skin cancer patients to self-check their skin. Participants’ FFM traits were obtained using the personality sliders. They were told about a skin cancer patient who had the same personality as themselves and rated the suitability of reminder messages for this person. Results showed that personality is important when deciding on the type of persuasion to use in reminder messages.

In Thomas et al. (2017) and Josekutty Thomas et al. (2017), an indirect study was run with 152 participants investigating the impact of personality on the perceived persuasiveness of healthy eating messages differing in type and framing (positive or negative). Using the FFM personality sliders, the participants’ personalities were obtained. They rated the perceived persuasiveness of messages for someone with a similar personality as themselves. There was some evidence of conscientiousness impacting persuasiveness.

In Alhathli et al. (2016), an indirect study was run with 50 participants exploring the impact of a learner’s extraversion on the selection of learning materials (active vs passive, and social vs individual). Participants’ personalities were obtained using the FFM personality sliders, and they were told the learner had the same personality as them. They rated learning materials on the extent to which they felt the learner would enjoy them and on the extent to which they would increase the learner’s skills and confidence. Extraversion was found to impact perceived enjoyment of social learning materials. In Alhathli et al. (2017), a similar study was run with 163 participants in which the learning materials reflected learning styles, and participants’ learning styles were measured in addition to their personality. No impact of either personality or learning style was found.

These studies showed that slider scores can be used both for correlation analyses and for dividing participants into high/low groups on the different traits.
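As an illustration only (not taken from any of the studies above), both analyses can be sketched in a few lines. The trait, rating scale, and data below are entirely hypothetical; a median split is used as one common way of forming high/low groups:

```python
# Illustrative sketch of the two analyses: (1) correlating slider scores
# with a dependent variable, (2) a median split into high/low trait groups.
# All data and column names are hypothetical.
from statistics import median

# Hypothetical slider scores for one trait and one rating per participant.
participants = [
    {"extraversion": 8, "rating": 4.2},
    {"extraversion": 3, "rating": 3.1},
    {"extraversion": 6, "rating": 4.0},
    {"extraversion": 2, "rating": 2.8},
    {"extraversion": 7, "rating": 3.9},
    {"extraversion": 4, "rating": 3.3},
]

def pearson(xs, ys):
    """Pearson correlation for the continuous (correlational) analysis."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

scores = [p["extraversion"] for p in participants]
ratings = [p["rating"] for p in participants]

# (1) Correlational analysis: treat slider scores as continuous.
r = pearson(scores, ratings)

# (2) Median split: divide participants into high/low groups on the trait.
cut = median(scores)
high = [p["rating"] for p in participants if p["extraversion"] > cut]
low = [p["rating"] for p in participants if p["extraversion"] <= cut]
```

The group ratings (`high`, `low`) could then be compared with, for example, a t-test or Mann–Whitney U test, depending on the data's distribution.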

6.3 Applying the method beyond personality research

Finally, the method described in this paper for developing validated stories can also be applied to non-personality user or context characteristics. We have successfully applied this in multiple studies—for example, Smith et al. (2014) and Kindness (2014) developed stories that depicted different types of stressors experienced respectively by carers and community first responders. Forbes et al. (2014) developed stories that depicted different attitudes towards usage of transport means. In all of these cases, the stories were used to bootstrap adaptation research.

7 Conclusion

Increasingly, as illustrated in Sect. 2.4, research on adaptive systems is investigating personality as a user characteristic for adaptation. However, to do this effectively, reliable and lightweight ways are needed to express personality (for use in indirect and user-as-wizard studies) and to obtain user-personality. The paper makes two major contributions to this.

Firstly, the paper contributes a methodology for creating and validating stories that reliably express a personality trait. To illustrate the methodology, the paper presented the creation and validation of stories expressing the Five Factor model traits (extraversion, agreeableness, conscientiousness, emotional stability, openness to experience), generalized self-efficacy, and resilience. The usefulness of the personality stories for adaptation research has been shown by the many examples provided of their use for indirect and user-as-wizard studies (see Sect. 6).

Secondly, the paper contributes a lightweight methodology for obtaining user-personality, using the personality stories as part of a self-assessment scale. These personality story scales can be used in studies investigating the impact of a trait, and may also be used by a system to allow it to adapt to this trait. The paper contributes guidelines on how to use such scales. The usefulness of the personality story scales for obtaining study participants’ personality has been shown by their usage in adaptation studies (see Sect. 6).

While this paper looks at a small number of personality traits, the methodology can be extended to any user factor for which a validated questionnaire exists. Indeed, as indicated in Sect. 6, the methodology has not only been successfully used to produce additional stories for the personality trait self-esteem, but also to express user attitudes and stressors experienced. The generalized methodology is the same as that used for personality (see Fig. 1), with stories now expressing any user characteristic.

There are several limitations and opportunities for future work. Firstly, the personality stories developed in this paper each portray only a single trait. Although this enables investigations of the impact of such a trait, e.g. on feedback to a learner, it does not facilitate investigations into interaction effects between multiple traits. To investigate these, stories expressing two or more traits at the same time need to be developed.

Secondly, the stories developed in this paper only portrayed personality traits. We discussed above how the same method for constructing and validating stories has been used by us to portray other user and context characteristics such as stressors and user attitudes. We would like to extend this work by developing validated stories for portraying affective state, based on existing self-reporting affect scales. Similarly, we are interested in developing stories that reliably express other aspects such as learner performance and learner effort (a starting point towards the latter has been made in Okpo et al. (2017)). When constructing such stories, care needs to be taken to avoid unintentionally evoking personality. For example, a learner who always performs well could be perceived as being highly conscientious, even when this was not the case. Another interesting area for validated story development may be to portray cultural differences, in line with Hofstede’s work on cultural dimensions (Hofstede 1983).

In summary, whilst there has been substantial research effort on obtaining user-personality, there has been only very limited work on reliably expressing it. This paper has provided a methodology for doing so through validated personality stories, and has shown that these stories can also be used as an additional lightweight method for obtaining user-personality.