Background

Research into non-verbal communication, and particularly facial expression, has often required specific experimental material related to the body or the face. This material can be created by different methods depending on the experimental goals. It can be a databank of pictures or videos selected for emotional induction (Bänziger et al., 2009). For facial expressions, it can be produced by actors directed by the researcher (Aneja et al., 2017; Happy et al., 2017; Langner et al., 2010; Lucey et al., 2010; Mavadati et al., 2013; Mollahosseini et al., 2017; Sneddon et al., 2012; Valstar & Pantic, 2010) or by recording the spontaneous reactions of subjects exposed to stimuli selected to induce targeted facial expressions (Tcherkassof et al., 2013). This material consists mainly of pictures and videos showing postures of the body and face.

A software program that models facial expressions and fine movements of the face would allow researchers to create experimental material based on the characteristics of the Facial Action Coding System (FACS) (Ekman et al., 2002). The resulting pictures can be interpreted as reflecting a possible social interaction or a basic emotion, depending on the researcher's theoretical position (Crivelli et al., 2016; Crivelli & Fridlund, 2018).

More recently, several software programs have been produced and released to researchers (Amini et al., 2015; Villagrasa & Susín Sánchez, 2009). These validated programs, however, are neither completely free of rights nor usable on every operating system. They depend on commercial 3D engines: Haptek 3D-characters for the HapFACS software program (Amini et al., 2015) and FaceGen Modeller for FACSGen (Krumhuber et al., 2012). Nevertheless, both HapFACS and FACSGen are free of charge and available upon request from their respective research laboratories.

The present paper

This paper presents a software program that allows researchers to produce the facial expression material (pictures and videos) they need, based on the FACS developed by Ekman et al. (2002). FACSHuman was developed under an open-source license and builds on the characteristics of the two previously cited software programs. It extends their possibilities by allowing users to modify and redistribute it. It is based exclusively on open-source software programs (MakeHuman, Blender, GIMP) and the Python programming language, which is widely used in academic research. Four studies were conducted to evaluate the relevance and accuracy of this software.

FACSHuman software architecture

FACSHuman is based on the MakeHuman software program and the FACS. It consists of three additional plugins for MakeHuman. The main goal of FACSHuman is to produce experimental material such as images and animations that can meet fairly high criteria of realism, esthetics, and morphological precision. With this software program, the muscles of the face, eyes, and tongue can now be manipulated, and facial features, the color of the skin and eyes, and the skeletal structure can be customized via MakeHuman.

MakeHuman software program

MakeHuman was selected as the framework for the development of the software program presented in this article. It is a free software program and can be extended with external plugins. Its 3D rendering engine is based on OpenGL technology. It is used primarily for the creation of three-dimensional avatars in the video game industry and in recreational 3D modeling. This software program allows the entire human body to be modeled, including the morphology of the face and its main features, as well as gender, race, and age from babies to older adults. The color of the skin and the eyes can be changed, modified, or imported. The possible combinations of age, gender, and race are virtually unlimited. These parameters can be mixed together in MakeHuman, allowing users to create as many character identities as they need. These characters can be saved, shared, and reused for other experiments, or exported into third-party 3D software programs such as Blender or Unity 3D. The previously cited software programs are limited to the face, or to the head and bust; MakeHuman, which we used as a framework for our plugins, allows the creation of whole-body shapes and postures.

The Facial Action Coding System

The FACS was developed by Paul Ekman (Ekman et al., 2002). This system is used mainly in research on nonverbal human activities and behaviors, but also in research on the creation of avatars and artificial agents. It allows the fine movements of the face to be coded and facilitates the dissemination of facial configurations observed or used in research. This system breaks the face down into Action Units (AUs), which correspond to the muscular movements of the face observable in humans (as shown in Fig. 1). During observation, the coding of each AU is characterized by two pieces of information: the AU number and its intensity. For instance, 1B + 2C + 5D is the code of an observed facial configuration. Table 1 contains some examples of AUs taken from the learning manual of this coding method (Ekman et al., 2002).

Fig. 1 Sample of coded AUs with FACS

Table 1 Sample of FACS AUs

This coding system requires lengthy training, estimated at more than 150 h (Ekman et al., 2002). A certification is available that validates the acquired skills. This system of codification of facial expressions was selected for the development of this software program.

Creation process of textures in the 3D model used in the additional plugins

The 3D animation of both the human body and the face can be created by various technical means, such as the mixing of structures, animation by bones, manipulation of textures (Magnenat-Thalmann & Thalmann, 2004), and muscular geometric modeling (Kähler et al., 2001). The animation technique selected for the development of this software program was blend shapes, or morphing of target expressions (mixture of structures). All targets used in the FACSHuman plugins were created with the Blender software and imported into MakeHuman. These different targets are available and can be easily manipulated within the FACSHuman plugin.
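As an illustration of the blend-shape principle, the following is a minimal sketch in Python, assuming meshes stored as NumPy vertex arrays; the array shapes and AU targets are illustrative and do not reflect MakeHuman's internal representation.

```python
import numpy as np

# Illustrative only: vertex arrays standing in for 3D face meshes.
neutral = np.zeros((5000, 3))                    # neutral face, N vertices x (x, y, z)
au_targets = {                                   # per-AU displacements (target minus neutral)
    "AU1": np.random.rand(5000, 3) * 1e-3,
    "AU2": np.random.rand(5000, 3) * 1e-3,
}

def blend(neutral, au_targets, intensities):
    """Add each AU's displacement to the neutral mesh, weighted by its intensity in [0, 1]."""
    mesh = neutral.copy()
    for au, weight in intensities.items():
        mesh += weight * au_targets[au]
    return mesh

# A low-intensity AU1 combined with a medium-intensity AU2.
expression = blend(neutral, au_targets, {"AU1": 0.2, "AU2": 0.6})
```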

Software development presentation

A set of three additional plugins to the MakeHuman software has been developed for facial expression creation (plugin 1), facial animation (plugin 2), and scene editing (plugin 3). These allow the user to produce images at a chosen resolution, to create sets of still images of progressive intensity, to mix expressions, and to generate videos.

There are several coding systems for facial movements, but the main ones are MPEG-4 facial animation (part of the Face and Body Animation proposed by Pandzic & Forchheimer, 2002) and the FACS. The work presented here is based on the FACS, which was created and revised by Ekman (Ekman et al., 2002). The FACS is used in research into nonverbal communication and emotional facial expressions as well as in artificial facial animation creation projects (Bennett & Šabanović, 2014; Dalibard et al., 2012; David et al., 2014).

FACSHuman facial expressions creation tool (plugin 1)

The first FACSHuman plugin, presented here, allows complex facial expressions to be created and the elements of muscular movements of the face, skin, and eyes to be defined, as well as those of the jaw and the head (see Fig. 2). On the left of the screen, users can move sliders to blend predefined emotions or create their own facial expressions by increasing AU intensities one by one. On the right side of the screen, users can adjust the camera angle, zoom, and position, and load and save facial expressions defined by their FACS code. Users can also set the parameters of the batch picture processing, including the number of images they wish to create. This allows a gradual variation in the intensity of an expression to be generated, for use, for example, in experiments on recognition thresholds (see Fig. 3).

Fig. 2 FACSHuman user interface

Fig. 3 Intensity progression of facial movement
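As an illustration of this batch logic, here is a minimal sketch assuming a hypothetical render() helper that produces one image from a dictionary of AU intensities; only the graded-intensity loop is the point, not the rendering call.

```python
# Hypothetical helper: renders one image for a dict of {AU: intensity in [0, 1]}.
def render(facs_intensities, filename):
    print(f"rendering {facs_intensities} -> {filename}")

target = {"AU1": 1.0, "AU2": 1.0, "AU5": 0.8}    # target expression at 100 %
n_images = 10                                    # number of stills requested by the user

for i in range(1, n_images + 1):
    fraction = i / n_images                      # linear progression: 10 %, 20 %, ..., 100 %
    frame = {au: value * fraction for au, value in target.items()}
    render(frame, f"expression_{i:03d}.png")
```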

The different AUs implemented in the software can be combined and animated along a timeline for the creation of macro- and micro-expressions and complex animated expressions. The animations and images created have a transparent background. Researchers are thus free to present the modeled faces on a colored background or an image of their choice.

An emotional mixer is available in addition to the individually manipulable Action Units. It allows users to create blended emotions. It uses the nine emotions described in the EmFACS (Ekman et al., 2002).

This plugin allows users to manipulate the modeled AUs listed in Appendix A1, A2, and A3. The organization of the different AU categories in the interface is based on the one used in the Score Sheet of the FACS manual.

FACSAnimation Tool (plugin 2)

The second plugin, the FACSAnimation tool (FANT), is used to create animations of facial expressions. The user can compose, create, and record animations by direct creation or by mixing different expressions created in the facial expression creation tool. The batching of images is done either by modeling or by using an already recorded expressive configuration (saved in .facs files).

This image generation mode advances the intensity of each AU in a homogeneous and joint manner, following a linear progression up to the maximum intensity defined by the user for each AU. The total duration of the video is determined by the frame rate set by the user and the number of images chosen during batch creation. The intensity progression can therefore be adjusted through these parameters: at a given frame rate, the more images used to create the animation, the slower the resulting video.

When this option is used, the progression follows the user-defined time characteristics in the FANT animation plugin. A parameterizable timeline in the FANT plugin allows the creation of complex expressions, such as moving from one expression to another with an apex area, or using the data available for the transition between several predefined expressions. AU movements are divided into three parts: initial intensity, apex, and final intensity. The apex areas hold the AUs at their respective maximum intensities, chosen during the facial expression creation process, from the start to the stop position defined for them on the timeline. For more flexibility, the apex start and stop intensities can be defined individually. In this way, users can create more complex movements (Fig. 4).

Fig. 4 Characteristics of one Action Unit defined on the timeline

This also offers free creative possibilities directly in the plugin and allows the design of non-linear complex expressions that are closer to what is observable on a human face.
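As an illustration of the timeline characteristics shown in Fig. 4, here is a minimal sketch of how one AU's intensity could be evaluated at a given time, assuming linear onset and offset ramps around an apex plateau; parameter names are illustrative and not the plugin's API.

```python
def au_intensity(t, start, apex_start, apex_stop, stop,
                 start_value=0.0, apex_value=1.0, stop_value=0.0):
    """Piecewise-linear intensity of one AU at normalized time t in [0, 1]."""
    if t <= start:
        return start_value
    if t < apex_start:                            # onset ramp
        return start_value + (apex_value - start_value) * (t - start) / (apex_start - start)
    if t <= apex_stop:                            # apex plateau
        return apex_value
    if t < stop:                                  # offset ramp
        return apex_value + (stop_value - apex_value) * (t - apex_stop) / (stop - apex_stop)
    return stop_value

# One AU rising to an intensity of 0.8, held, then released.
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(t, au_intensity(t, 0.1, 0.3, 0.6, 0.9, 0.0, 0.8, 0.0))
```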

The timeline allows an unlimited number of expressions to be mixed, each AU having the characteristics (duration, start, end, intensity) of an AU as described in the Investigator's Guide (Ekman et al., 2002). The FANT plugin also allows the same AU to be used any number of times within an animation, for example Expression 1 to Expression 2 to Expression 3 ... (Fig. 5). Users can choose as many AUs as needed and define the timeline characteristics for each of them in the plugin user interface.

Fig. 5 Sequence produced using the FANT module: example of progression from an expression of anger to an expression of surprise

The target expressions are characterized by values between 0 and 1, expressed as fractions of the number of images defined during the creation of the image set. When users save their work, a section is recorded in a JSON file for each of the AUs used in the expressions.

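The exact schema of this file is not documented here; the sketch below only illustrates the kind of per-AU section described above, written with Python's standard json module and hypothetical field names.

```python
import json

# Hypothetical structure: the field names are assumptions, not FACSHuman's actual schema.
animation = {
    "AU12": {
        "start": 0.1,            # timeline positions as fractions of the image count (0..1)
        "apex_start": 0.3,
        "apex_stop": 0.6,
        "stop": 0.9,
        "start_intensity": 0.0,
        "apex_intensity": 0.8,
        "stop_intensity": 0.0,
    },
}

with open("expression.facs.json", "w") as f:     # illustrative file name
    json.dump(animation, f, indent=2)
```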

The generation of synthetic animated facial expressions must be designed in a realistic way, so that the stimulus presented to the subject does not provoke an effect such as the Uncanny Valley (Burleigh et al., 2013; Ferrey et al., 2015). This effect occurs when a stimulus that closely resembles a human is presented to a subject. It often causes an emotional reaction of greater or lesser intensity, which would partly result from difficulty in categorizing the object presented (Yamada et al., 2013). To minimize this effect, and to produce non-linear animations (Cosker et al., 2010), the FANT plugin allows users to define the temporal presentation and intensity characteristics of each AU implemented in the target expression. For each AU, the start and stop positions as well as the evolution of its individual intensity can be defined. This results in observable asynchronous movements (Fig. 6). It thus allows users to transpose observed facial expressions or those taken from databases such as DynEmo (Tcherkassof et al., 2013).

Fig. 6 Breakdown of a timeline and composition of AU intensities

FACSSceneEditor (plugin 3)

The third and last plugin, FACSSceneEditor (FSCE), defines the lighting of the scene. As in a photographic studio, the user can place lights around the face to create different types of staging and change their characteristics. For each added light, users can define its position on the x, y, and z axes, choose its color with a color picker, and specify the light level and specular reflection, as well as other parameters available with OpenGL technology (see Fig. 7). This feature of the plugin results in constant lighting of the scene, producing a stable environment for the generation of images, whatever the model selected.

Fig. 7 Scene editor and lighting possibilities
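As an illustration of the kind of light description involved, the following is a minimal sketch using a plain Python data class; the class and its fields are assumptions for illustration only, not the plugin's actual data structure.

```python
from dataclasses import dataclass

@dataclass
class SceneLight:
    """OpenGL-style point light: position on the x, y, z axes plus color and reflection levels."""
    position: tuple          # (x, y, z)
    color: tuple             # RGB, each component in [0, 1]
    diffuse: float = 1.0     # overall light level
    specular: float = 0.5    # specular reflection on the skin

# A simple three-point setup around the face.
lights = [
    SceneLight(position=(2.0, 1.5, 3.0), color=(1.0, 1.0, 1.0), diffuse=1.0, specular=0.4),
    SceneLight(position=(-2.0, 1.0, 2.0), color=(0.9, 0.9, 1.0), diffuse=0.6, specular=0.2),
    SceneLight(position=(0.0, 2.0, -2.5), color=(1.0, 0.95, 0.9), diffuse=0.4, specular=0.1),
]
```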

Video production

The creation of still-image batches allows the generation of videos for which the number of frames per second can be defined, as well as a pause range. This pause (a set number of frames held on one image) simulates the expressive apex when a simple sequence (a single target expression) is generated. Neutral-to-neutral image generation can be performed from a configuration file for a particular facial expression, but also by using the possibilities offered by the creation of complex animations on the timeline. Users can also navigate within an animation in the software, which allows them to select a specific moment, adjust its intensity, and generate a batch of images from that moment.
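As an illustration, here is a minimal sketch of assembling a batch of stills into a video with a held apex frame, using OpenCV (opencv-python) as one possible backend; the file names, the backend choice, and the apex-pause logic are assumptions, not the method implemented in the software.

```python
import glob
import cv2  # pip install opencv-python

frames = sorted(glob.glob("expression_*.png"))   # stills produced by the batch process
fps = 25                                         # user-chosen frame rate
apex_pause = 12                                  # number of repeated apex frames (~0.5 s at 25 fps)

first = cv2.imread(frames[0])
height, width = first.shape[:2]
writer = cv2.VideoWriter("expression.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))

for path in frames:
    writer.write(cv2.imread(path))
for _ in range(apex_pause):                      # hold the last (apex) frame to simulate the apex
    writer.write(cv2.imread(frames[-1]))
writer.release()

# Total duration = (len(frames) + apex_pause) / fps seconds:
# at a fixed frame rate, more stills mean a longer, slower-looking animation.
```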

Software portability and modifications

The add-on plugins were programmed in Python, the same language as MakeHuman. This interpreted language is widely used for academic purposes. This gives great flexibility of use and allows users to freely modify the source code of the plugins or add functionality if needed. Users can also automate the creation of large volumes of images or videos without external intervention. All software and plugins presented here are released under an open-source license. They can be freely used, modified, and redistributed in accordance with the conditions of this type of license. The software and its plugins can be used with the main operating systems.

Experimental evaluation presentation

Four studies were conducted to evaluate the images and videos produced by the FACSHuman software program. In Experiment 1, we asked non-FACS coders to categorize emotional facial expressions created with FACSHuman and coded from the pictures found in the Pictures of Facial Affect (POFA) (Ekman, 1976). In Experiment 2, single AUs were evaluated by non-FACS coders in comparison with those described in the FACS manual. In Experiments 3 and 4, the accuracy of the different facial AUs and expressions described in the FACS (Ekman et al., 2002) was evaluated by two certified FACS coders.

Experiment 1

This evaluation study of the FACSHuman software program focused on the categorization of facial configurations produced with the software and considered representative of an emotional experience (Cigna et al., 2015; Darwin et al., 1998; Dodich et al., 2014; Ekman, 1971; Ekman, 1976; Ekman, 1992; Ekman & Oster, 1979).

Method

Participants

Forty-three participants, 30 women (Mage = 42.1, SDage = 13.44) and 13 men (Mage = 43.54, SDage = 9.90), were recruited for this experiment via the LinkedIn professional network, using its internal messaging system. One participant did not complete the age and gender part of the questionnaire. Before the experiment, participants gave free, informed, and express consent by validating a form via a checkbox. The experiment could not begin without validation of the form. It took place in an internet browser on the participant's computer and respected their anonymity.

Materials

The experimental material for the experimental part consisted of 42 black-and-white images of faces in front view. Six different avatars were used (three men, three women), with seven images each (one neutral and six emotional facial expressions), for a total of 42 images. Different avatars were used to avoid reinforcement due to repeated exposure to the same emotion on the same face. Seven pictures with the same characteristics were produced for the training part. The use of black and white was justified as an element of comparison with existing databases, in particular the POFA (Ekman, 1976), as well as with images from the FACS manual (Ekman et al., 2002). Previous studies have shown no significant difference in recognition or categorization tasks between color and black-and-white images (Amini et al., 2015; Krumhuber et al., 2012). These faces were produced with the GIMP and Blender software programs for the textures and the FACSHuman plugin for the modeling of facial expressions.

The construction of gender and the criteria used for the creation of the images were based on characteristics described in previous studies, such as facial features of the eye region, eyebrow shape, skin texture, and chin shape (Baudouin & Humphreys, 2006; Bruce et al., 1993; Bruyer et al., 1993). We used six facial expressions of basic emotions (anger, disgust, fear, happiness, sadness, surprise) and a neutral expression for each modeled face. This material was produced according to the facial expression configurations of the best-rated pictures in the evaluation table provided with the POFA database. All these facial expressions were coded by a certified FACS coder. Table 2 presents the detailed FACS codes.

Table 2 FACS codes used to create facial expressions

The experiment was built with jsPsych (de Leeuw, 2015), which allows experiments that run in an internet browser to be created. This makes it possible to run experimental tests on the participant's computer with response times equivalent to those observed in the laboratory (de Leeuw, 2015; de Leeuw & Motz, 2016; Pinet et al., 2017; Reimers & Stewart, 2015). The experiments can also be used in the laboratory. Data were collected anonymously and stored in a MySQL database.

Procedure

Participants were asked to categorize FACSHuman generated pictures that were presented according to the six basic emotions (anger, disgust, fear, happiness, sadness, surprise) and neutral.

The experiment began with a training phase on the use of the interface. Buttons at the bottom of the screen were used to navigate from one screen to another. Participants interacted with the interface using the pointing device available on their computer. During the training and experimental phases, participants were instructed to answer the question 'What emotion is this person expressing?'. They responded by clicking one of several buttons labeled happiness, disgust, sadness, fear, anger, surprise, or neutral, located below the photos.

For the training part, participants had to perform a categorization task on six images of emotional facial expressions and one neutral image. The emotional images were presented at an expressive intensity of 100%.

The experimental part had the same characteristics as the training part. Pictures from the training section were not reused. Each of the 42 images described in the Materials section was presented at the 100% intensity level as configured in the software, in accordance with the FACS definition (Ekman et al., 2002). Images were presented in random order. The images of the modeled faces were displayed until the participant selected a category (happiness, disgust, sadness, fear, anger, surprise, or neutral). The choice of a category led to the continuation of the experiment.

For each participant, a mean percent-correct score was calculated. There was no feedback during the training or experimental parts, to avoid learning reinforcement. At the end of the experiment, demographic information was collected, concerning gender, age, socio-professional category, and professional activity.

Results

The total categorization score across all participants was 85.5%. As some papers suggest gender differences in the judgment of emotional facial expressions (Birditt & Fingerman, 2003; Hall & Matsumoto, 2004; Ryan & Gauthier, 2016), a Welch two-sample t test was used to compare the total scores of men and women. There was no significant difference (p = .73, d = 0.11, power level = .99) between men (M = 84.85, SD = 8.47) and women (M = 85.83, SD = 8.85).

A three-way analysis of variance was conducted with participant gender as a between-participants factor, and stimulus face gender (man, woman) and expression (happiness, disgust, sadness, fear, anger, surprise, neutral) as repeated factors (see Fig. 8). There was no effect involving participants' gender. A Tukey HSD test was used to examine the significant effects of the gender and emotional category of the stimuli. There was a main effect of face gender, F(1, 6) = 10.76, MSq = 9.32, p < .001, η2 = .02: women's faces (M = 88.53%, SE = 1.32) were better categorized than men's faces (M = 82.47%, SE = 1.64). There was also a main effect of emotional configuration, F(1, 6) = 9.49, MSq = 8.22, p < .001, η2 = .08. Anger (M = 96.21%, SE = 1.17) and neutral (M = 93.94%, SE = 1.48) facial configurations were categorized more accurately than the others. Fear was the least accurately categorized expression (M = 73.86%, SE = 2.71; all ps < .01). Happiness (M = 79.92%, SE = 2.46), disgust (M = 86.74%, SE = 2.09), sadness (M = 85.61%, SE = 2.15), and surprise (M = 82.2%, SE = 2.34) were well categorized.

Fig. 8 Models used for the categorization task (e.g., anger, disgust, fear, happiness, sadness, surprise)

The types of errors made by participants were examined (Table 3).

Table 3 Confusion matrix of EFE (in %) for FACSHuman pictures

Anger (96.21%) was the best recognized emotional configuration, with less than 2% confusion with fear and surprise. Happiness (79.92%) was most often confused with neutral (18.18%), disgust (86.74%) with anger (10.61%), and sadness (85.61%) with neutral (9.85%). Surprise (82.20%) and fear (73.86%) were the most confused with each other (16.67%), and neutral (93.94%) was confused with sadness (3.41%).

To compare our results (Table 3) with the POFA database (Table 4), we used the confusion matrix provided in Cigna et al. (2015) for the POFA pictures of emotional facial expressions. These tables describe the categorization performance and the error rates of participants for each emotional category. The diagonal entries correspond to the percentage of correct responses to the facial expressions presented. For each emotional facial expression, the rows indicate the percentage of confusion with the other emotional facial expressions. As results for the neutral expression were not present in this table, the neutral FACSHuman expression scores were not used in the statistical computation. A Welch unpaired two-sample t test was used to compare the overall recognition scores between POFA and FACSHuman for the six basic facial expressions. The results indicated no significant difference (p = .44, d = 0.47, power = .65) between the POFA (M = 78.93, SE = 5.51) and FACSHuman (M = 84.08, SE = 3.07) total scores.

Table 4 Confusion matrix of EFE (in %) for POFA pictures from (Cigna et al., 2015)

Our data differed significantly from a normal distribution for each category. One-sample Wilcoxon tests were therefore used to compare the results obtained for the FACSHuman expressions with those from the POFA, in a one-by-one categorical comparison. The results showed that the FACSHuman expression scores were significantly higher, except for the happiness and surprise expressions, for which the POFA expressions were better rated (see Fig. 9 and Table 5). Effect sizes indicated large differences.

Fig. 9 Percentage of categorization by stimulus gender for FACSHuman stimuli. Error bars represent standard errors

Table 5 Categorical scores comparison between POFA and FACSHuman

Comparison with previous works. As noted in the introduction, the earlier software programs most similar to FACSHuman are HapFACS and FACSGen. We computed means and standard deviations from the data reported in the HapFACS studies for static emotional expressions at their respective maximum intensity. The overall recognition rate for the six basic emotional facial expressions created with FACSHuman was 84.09% (SD = 7.51). This result falls between the results of the two previously cited software programs for the same emotions (Table 6).

Table 6 Comparison of the recognition rate by software

Discussion

The aim of this study was to evaluate whether images generated by FACSHuman reproduce facial expressions of emotion as defined by theoretical models (Ekman, 1971, 1992; Ekman & Oster, 1979). The results obtained were comparable to those of various picture databases such as the POFA (Cigna et al., 2015; Ekman, 1976; Palermo & Coltheart, 2004). Analyses of the categorization data showed that the creation of modeled faces for experimental use is possible with the FACSHuman plugins. Overall, the categorization scores of FACSHuman emotional facial expressions rendered with women's morphological criteria were better than those for men's faces. Anger and neutral facial configurations were the most accurately categorized (Ekman et al., 2002). Fear was the least accurately categorized expression and was confused with surprise (Becker, 2017). This result is probably due to the AUs shared by these two expressions (Du & Martinez, 2015).

There were no significant differences between the results of the POFA evaluation (Cigna et al., 2015) and the FACSHuman-generated pictures. The overall categorization rate was high and comparable to those found in other database validation studies (Goeleven et al., 2008; Langner et al., 2010). The computer modeling of experimental research material with FACSHuman thus offers an alternative to creating photographic databases of emotional facial expressions posed by directed actors.

Experiment 2

The FACS describes facial expressions through the codification of facial Action Units (AUs). The FACS manual is a tool to train people to recognize facial movements that involve one or more AUs. In this experiment, the accuracy of single facial movements identified as AUs was evaluated by non-FACS coders in order to compare them with those described in the FACS manual.

The material was produced with the FACSHuman software program and compared with that found and described in the FACS manual (Ekman et al., 2002).

Method

Participants

Twenty-two women (Mage = 39.09, SDage = 11.71) and 28 men (Mage = 47.37, SDage = 11.37) were recruited for this experiment via the LinkedIn professional network, using its internal messaging system. Before the experiment, participants gave free, informed, and express consent by validating a form via a checkbox. The experiment could not begin without validation of the form. It took place in an internet browser on the participant's computer and respected participants' anonymity.

Materials

The experimental material consisted of 26 photos from the FACS reference manual (Ekman et al., 2002) and 26 pictures generated with the FACSHuman software program. These photos represented the same Caucasian man from the FACS manual and the same modeled Caucasian man for the FACSHuman software program.

For both models, each of the 25 pictures represented a single AU, plus a photograph of the face with a neutral expression. These neutral pictures, one from the FACS manual and the other generated with the FACSHuman software program, were added to test faces with no AU in action and to increase the validity of participants' responses (Russell, 1993). A total of 52 trials were presented to the participants: 26 were congruent, where both models expressed the same AU, and 26 were non-congruent, where the two faces expressed different AUs. Appendix A4 lists the 26 single AUs for the whole set of pictures presented.

The experiment was built with the jsPsych framework (de Leeuw, 2015).

Procedure

First, the theme of the experiment and the researcher were briefly presented, followed by a presentation of the experiment and its average duration. Participants were then instructed to sign the informed consent form. All participants were volunteers and were free to withdraw at any stage of the study. No remuneration was given. Written informed consent was signed by the participants.

The training session and the main experiment consisted of a comparison task between two facial expression images displayed side by side, one from the FACS manual and one created with the FACSHuman software. Participants had to complete the sentence "The expressions of these two people are" by choosing the word "Different" or "Identical". The images showed only one Action Unit at a time and were presented at an expressive intensity of 100% according to the FACS manual references. The paired images were presented in random order and displayed until the participant selected one of the words. The choice of a word led to the continuation of the experiment. There was no time limit for completing the experiment. The training session consisted of one congruent and one incongruent trial. For the experimental part, 26 congruent and 26 non-congruent trials were displayed in random order. Each pair was seen only once.

During the training and experimental parts, the transition from one instruction screen to another was made by pressing a "next" button displayed on the screen. At the end of the experiment, demographic information (gender and age) was collected. The data were analyzed using a signal discrimination procedure, following the performance measurement procedures for same-different experimental protocols described by Macmillan and Creelman (2005).

Results

Across all stimuli, participants had a discrimination rate of 82.44% (SD = 0.09) (Appendix Table 14). As in the previous experiment, we tested for gender differences. There was no difference between men (M = 82.41, SD = 9.02) and women (M = 82.49, SD = 8.83) in the results obtained for the task (Welch two-sample t test, p = .974, d = 0.01, power level = .99).

The least recognized AUs were Lip Presser (AU 24, 38%) and Nasolabial Furrow Deepener (AU 11, 46%). The most recognized were Eye Closure (AU 43, 98%) and Upper Lid Raiser (AU 5, 96%) (Table 7).

Table 7 Percentage of correct responses by Action Unit for congruent and non-congruent trials

The results showed good discrimination of the stimuli presented to the participants (A' = 0.90) as well as good sensitivity (d' = 1.97). However, participants showed a slight conservative response bias (C = 0.30): they were more inclined to respond "different" to the question asked.
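For reference, here is a minimal sketch of standard signal detection indices computed from hit and false-alarm rates; the full same-different analysis of Macmillan and Creelman (2005) involves additional corrections, so this is only the basic yes-no formulation, and the rates used in the example are illustrative.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf   # inverse of the standard normal CDF

def sdt_indices(hit_rate, fa_rate):
    """Standard yes/no signal detection indices from hit and false-alarm rates."""
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    # Non-parametric A' (Grier, 1971), assuming hit_rate >= fa_rate.
    a_prime = 0.5 + ((hit_rate - fa_rate) * (1 + hit_rate - fa_rate)) / (
        4 * hit_rate * (1 - fa_rate))
    return d_prime, a_prime, criterion

# Illustrative rates, not the study data.
print(sdt_indices(hit_rate=0.80, fa_rate=0.15))
```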

Discussion

In this experiment, the accuracy of the most commonly used AUs was evaluated by non-FACS coders. They were engaged in a same-different task to examine the accuracy of FACSHuman-generated AUs, expressed one at a time, compared with those found in the FACS manual.

Results showed a good discrimination rate compared to previous studies (Sayette et al., 2001). An accuracy of 82.44% is consistent with previous findings that reported an average recall of 81% (Amini et al., 2015) and 90% interrater reliability (Krumhuber et al., 2012) for single AUs expressed at their highest intensity levels and coded by FACS coders. The low discrimination rates for AU 24 (Lip Presser) and AU 11 (Nasolabial Furrow Deepener) could be explained by the nature of the stimuli, which were pictures rather than videos, but also, as pointed out in the FACS manual, by the small facial movements these two AUs involve.

These two AUs involve very small movements of the face, in contrast to AU 43 (Eye Closure) and AU 5 (Upper Lid Raiser), which are movements of large amplitude. The results obtained in the first and second experiments by non-professional FACS coders were good. They provide a positive evaluation of the accuracy of the experimental material produced with the FACSHuman software program in comparison with the FACS.

In the next two experiments, two certified FACS coders were recruited to evaluate the accuracy of single AUs and AU combinations. The two previous studies evaluated, with non-coders, the plausibility of the combinations and single AUs available in the software. These two additional experiments were conducted to provide a technical evaluation of the accuracy of the AU movements produced by the software in comparison with the FACS.

Experiment 3

In the previous experiments, the evaluation of emotional facial expressions (Experiment 1) and single AUs (Experiment 2) was conducted by non-FACS coders. To be used as experimental material, images and videos produced by FACSHuman need to be as compliant as possible with the FACS. This compliance facilitates the communication and description of the facial configurations used as research stimuli.

In this experiment, we used a protocol similar to the one described in Amini et al. (2015): materials produced with the FACSHuman software program were submitted to certified FACS coders. However, we did not extend the protocol to different intensities; we only used the 100% intensity described in the FACS manual. The task consisted of the encoding, by FACS coders, of a series of pictures and videos generated with FACSHuman showing only one AU at a time. AU combinations were evaluated in the next experiment.

Method

Participants

Two men participated in this study. They were recruited via an advertisement on the LinkedIn professional network.

The first coder was 39 years old and had 4 years' experience in FACS coding. The second coder was 30 years old and had 5 years' experience in FACS coding, with regular facial coding practice using the FACS methodology as part of his doctoral research in nonverbal communication. Participants validated a consent statement prior to the start of the experiment.

Materials

The experimental material consisted of 47 images and 47 videos (see Appendix A5) of action units described in the FACS manual (Ekman et al., 2002), expressed by a Caucasian man. The tested AUs were the 26 AUs used in the previous experiment, with the addition of 21 units corresponding to the miscellaneous actions and supplementary codes. These are extended AUs or movement descriptors described in the FACS manual (Ekman et al., 2002) and commonly involved in the FACS coding procedure.

Procedure

For each participant, images of facial expressions representing the AUs at an expressive intensity of 100% were displayed on the screen one at a time. Participants could freely select the different images to encode using tabs at the bottom of the screen, which were presented in random order.

For each image, an image of the face with a neutral expression was presented side by side, on the right of the screen, with the image that the coder had to analyze. A video of the action unit was also available to the coder, to the right of the neutral image. Videos could be played as many times as needed. Participants were asked to code the different images and videos presented on the screen according to the notation system defined in the FACS (Ekman et al., 2002). They entered their FACS codes in a text box below the images. There was no time limit, and participants could perform the coding task freely, in multiple sessions, as described in the FACS coding procedure.

Results

The results of the encoding procedure for the AUs were analyzed using Krippendorff's α agreement coefficient (Krippendorff, 2004, 2011). The agreement coefficient obtained was moderate (α = 0.59). The first coder obtained a recognition score of 63.83% and the second coder a score of 85.11% (Appendix Table 15).
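For reference, the following is a minimal sketch of Krippendorff's α for the special case of two coders assigning nominal codes with no missing data; the codes in the toy example are illustrative, not the study data.

```python
from collections import Counter

def krippendorff_alpha_nominal(coder1, coder2):
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    assert len(coder1) == len(coder2)
    counts = Counter()          # how often each code appears across both coders
    disagreements = 0           # ordered pairs of differing codes in the coincidence matrix
    for a, b in zip(coder1, coder2):
        counts[a] += 1
        counts[b] += 1
        if a != b:
            disagreements += 2
    n = 2 * len(coder1)
    observed = disagreements / n
    expected = sum(c * (n - c) for c in counts.values()) / (n * (n - 1))
    return 1 - observed / expected

# Toy example: codes assigned by two coders to the same six stimuli.
print(krippendorff_alpha_nominal(["AU1", "AU2", "AU5", "AU6", "AU12", "AU25"],
                                 ["AU1", "AU2", "AU5", "AU7", "AU12", "AU26"]))
```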

Discussion

This experiment was conducted to evaluate the compliance with the FACS of AUs displayed by a model created with FACSHuman. This evaluation of images and videos was performed through an encoding task conducted by two certified FACS coders. The weaker result produced by the first coder, compared with the second, can be explained by his lower level of regular practice. AU 6 (Cheek Raiser), AU 13 (Sharp Lip Puller), and AU 25 (Lips Part) were not coded correctly by either coder. For AU 6, this point was discussed with the coders: it was due to the lack of precision of the wrinkles at the corners of the eyes, which is a criterion for coding this AU. This detail was corrected in the software after the data analysis of Experiments 3 and 4. AU 13 was coded as AU 12, which is the most similar movement. AU 25 is involved in different facial configurations and can be coded differently according to its intensity. The FACS certification requires natural and spontaneous facial expressions to be coded into movements. In the next experiment, the two FACS coders performed a codification task on complex facial expressions combining multiple AUs, generated with the FACSHuman software program.

Experiment 4

Complex facial expressions are composed of the activity of multiple AUs at the same time. In this experiment, certified FACS coders were asked to code 54 facial configurations described in the FACS manual. The aim was to evaluate the accuracy of complex facial expression images produced with the FACSHuman software program.

Method

Participants

Participants were the same as in the previous experiment.

Materials

The experimental material consisted of 54 combinations of AUs (Appendix Table 13). The interface was the same as the one described in the previous experiment. The computer screen showed the face of a Caucasian man. Participants had at their disposal a neutral image of the face as well as an image of the combination and the corresponding video. The interface displayed, in succession, the coding instructions and the different combinations of AUs to be coded, in a randomized and anonymous sequence. Coders always had a reminder sheet containing all the codes of the FACS repository (Ekman et al., 2002), as described in the FACS encoding procedure.

Procedure

In this experiment, the interface and protocol were the same as those described in the previous experiment. Participants were asked to code each expression presented on the screen at an expressive intensity of 100%. The codes were entered below the images using the FACS codification system.

Statistical analyses

The scoring procedure used to calculate the agreement index was the one described and used in Ekman et al. (1972), following Wexler (1972), to analyze the results obtained in the coding sessions.

This index is computed with the following formula:

$$ \text{Index of agreement} = \frac{(\text{number of AUs on which coders 1 and 2 agreed}) \times 2}{\text{total number of AUs scored by the two coders}} $$
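A minimal sketch of this index computed from the sets of AUs scored by each coder for one expression; the AU codes in the example are illustrative.

```python
def agreement_index(coder1_aus, coder2_aus):
    """Wexler (1972) index: twice the number of AUs scored by both coders,
    divided by the total number of AUs scored by the two coders."""
    agreed = len(set(coder1_aus) & set(coder2_aus))
    total = len(coder1_aus) + len(coder2_aus)
    return 2 * agreed / total

# Example: coder 1 scores 1+2+5+26, coder 2 scores 1+2+5+25.
print(agreement_index({1, 2, 5, 26}, {1, 2, 5, 25}))  # 0.75
```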

Results and discussion

In this experiment, we tested the accuracy of AU combinations with two certified FACS coders. The result of .68 constitutes satisfactory intercoder agreement (Ekman et al., 1972). The recognition averages for the AUs were 68.08% for the first coder and 81.56% for the second coder (Appendix Table 16). As in the previous experiment, there was a difference in recognition between the two coders.

General discussion

This article describes the FACSHuman software program and presents its evaluation. To carry out this evaluation, we conducted four experiments to assess the accuracy and compliance of facial expressions and AUs with the FACS. In the first and second experiments, lay persons evaluated emotional expressions as well as single AUs produced with FACSHuman. The first experiment was a categorization task on six basic emotional facial expressions. The second was a same-different comparison task between AUs expressed one at a time, as found in the FACS manual and as produced by FACSHuman. In the third and fourth experiments, we investigated the recognition of single AUs and AU combinations by certified FACS coders. FACS coders are trained to identify combinations and single AUs as well as some EmFACS emotional configurations, whereas lay people do not have such skills. We therefore presented some lay participants with AUs alone and others with emotional facial expressions. These two groups of participants allowed us to broaden the evaluation panel and increase ecological validity.

The first experiment reported an 85.5% recognition rate by participants for emotional expressions. The second experiment showed good accuracy for the single AUs presented to lay participants relative to those found in the FACS manual. In the third and fourth experiments, 47 AUs and 54 AU combinations were evaluated by two certified FACS coders. The results showed moderate-to-satisfactory intercoder agreement.

Nevertheless, our results show that this software does not free researchers from evaluating the sets of images or videos produced. Indeed, as with the image banks previously produced by different research laboratories, the perception of emotional facial expressions presents a minimal variance that cannot be predicted from the esthetic and morphological criteria chosen when creating the avatars. However, FACSHuman frees the researcher from the selection and direction of actors. Evaluation can now be carried out quickly via research application environments made available to participants on the Internet. With this mode of evaluation, the experimenter is free to choose the groups and segments (by age, gender, social group, and so on) to which they wish to submit the evaluation. This ensures better calibration of the image set produced for the population targeted by the experiment. The results obtained in a categorization task showed that images from the available databanks offer a recognition rate lower than or comparable to that of digitally created images.

Limits of the present work

In the first experiment, we did not use animation, unlike some other validation protocols. Further validations, such as levels of believability, realism, or intensity, may be required to evaluate facial animations. Another topic could be the production and evaluation of micro-expressions, for which users could control the speed and intensity of the action units used in the created facial expression. A comparative study conducted by Krumhuber et al. (2017) described the key dimensions and properties of the dynamic facial expression datasets available to researchers. The authors highlighted the need for an evaluative study of the material used, such as categorization or judgment tasks and the accuracy of the conveyed emotions. They also pointed out that the FACS is a tool "of considerable value", as it can be used to compare observed facial expressions with reported emotions or physiological responses. Indeed, the different morphological configurations, skin tints, and spatial configurations of the elements characterizing a human face are likely to induce emotional interpretations of facial expressions (Palermo & Coltheart, 2004). Reciprocally, emotions can affect gender categorization (Roesch et al., 2010) as well as social appraisal (Mumenthaler & Sander, 2012). This effect is found in the results obtained in the first two studies: the different avatars were not evaluated in the same way by the participants, although they met the same characteristics of neutrality. These elements must therefore be considered as potentially confounding variables when creating an experiment. The material must be calibrated before being used in an experiment, and the results obtained here can serve as comparators (expected results).