1 Introduction

Viewing 3D contents on a head-mounted display (HMD) is known to cause out-of-view problems (Gruenefeld et al. 2017a), which impair users’ spatial cognition of surrounding objects. With 2D contents on a flat display, users can easily view the necessary information because almost all events are presented within the field of view. With 3D contents, however, visual events (e.g., computer graphics and real-life movies recorded by a 360-degree camera) are presented omnidirectionally (e.g., to the left, right, and back of users). In this case, users may have trouble recognizing the necessary information. Given that the effective visual angle of an HMD is smaller than the natural viewing angle (e.g., 110 degrees in the HTC VIVE Pro vs. 180 degrees in the natural field of view), the out-of-view problem is a potential issue for developing accessible virtual reality (VR) contents.

To resolve the out-of-view problem, previous works have developed various guidance designs that visualize the location or direction of items presented in the users’ surroundings. One guidance design is an extension of the visual field in which target items are transformed by projection into augmented items in the field of view. There have been various visual extension designs such as 3D radar (Gruenefeld et al. 2019), EyeSee360 (Gruenefeld et al. 2017b), fisheye lens (Orlosky et al. 2014), mirror ball (Bork et al. 2018), panorama vision (Ardouin et al. 2012), and spider vision (Fan et al. 2014). For instance, the 3D radar transforms target items surrounding users into a small map and superimposes it on the field of view. The other visual guidance technique is the attentional guidance, in which users’ attention is guided to target items by augmented items such as FlyingARrow (Gruenefeld et al. 2018b), 3D Halo/Wedge (Gruenefeld et al. 2018a), and SWAVE (Renner and Pfeiffer 2017). For instance, FlyingARrow appears in the field of view and moves to the location of the target items. The attentional guidance is based on the cognitive mechanism of automatic capture of attention (Jonides and Yantis 1988; Pratt et al. 2010).

These guidance designs generally enhance the spatial cognition of users’ surroundings. The guidance effect has been evaluated by the time required to detect target items in a search task (search times: Bork et al. 2018; Gruenefeld et al. 2018a). On this point, the 3D radar may enhance the accuracy of target localization because it requires users to switch their visual perspectives between the ego-centric and bird’s-eye views to locate the items transformed to a 360-degree environment. This switch has been reported to enhance the spatial cognition of users’ surroundings (Gorisse et al. 2017). Another important criterion for evaluating the guidance effect is the cognitive load, that is, the effort required by the processes of recognizing a guidance design (e.g., reading images and locating target items). One measure of cognitive load is the time required to recognize the details of the guidance design (recognition time: Chen et al. 2018) because cognitive load delays human responses (Levy and Pashler 2001). Given this, the 3D radar may delay locating target items in a 360-degree environment because the perspective switch increases the cognitive load on users (Friedman et al. 2008). Users are frequently required to respond to targets anywhere in a 360-degree environment (e.g., detecting an enemy in a VR game), and in such situations, guidance designs that produce a small cognitive load are preferred.

However, two issues remain unresolved: (a) how do the guidance effects differ across guided directions, and (b) how much cognitive load does the guidance require? Regarding the first issue, no study has evaluated the guidance effect for omnidirectional surroundings. This is important because human attention is biased among spatial directions. For example, humans tend to attend more to the left side of space (pseudo-neglect phenomenon: Jewell and McCourt 2000; Zago et al. 2017). Moreover, it is more difficult to pay attention to the surroundings slightly above the head and behind users (Harada and Ohyama 2019). Regarding the second issue, few studies have quantitatively measured cognitive loads for guidance in a 360-degree environment. In particular, it has remained unclear how guidance designs delay the localization of target items. The two issues are related to the design of accessible VR contents. For example, VR game contents require users to detect important items quickly (e.g., bullets and enemies in a battle scene) and to localize information accurately (e.g., a destination and facilities in a moving scene). In the former case, users would require guidance that produces a small cognitive load, whereas in the latter case, they would need guidance that is accurate for each direction.

This study aimed to evaluate the effect of guidance designs on spatial cognition in a 360-degree environment with respect to the search time for each direction and the recognition time. To examine these issues, we conducted a visual search task, which is frequently used in cognitive science for investigating attentional mechanisms from behavioral (Finlayson and Grove 2015; Huang and Pashler 2005) and brain perspectives (Bichot et al. 2005; Leonards et al. 2000). In this task, the participants were instructed to utilize a guidance design to search for a target presented on an HMD. The guidance design and direction of a target were manipulated to evaluate the guidance effects for each direction. We mainly examined two points. One was how the search times for each target direction differed between guidance designs. The search times were also used to create criteria maps in which guidance effects for directions were visualized as heat maps. The other point was how the times required to recognize guidance designs (i.e., cognitive loads) differed between guidance designs.

2 Experimental methods

2.1 Participants

Thirty students from the University of Tsukuba (15 men and 15 women) aged 19–24 (M = 21.70, SD = 1.51) participated in the experiment. All participants reported normal or corrected-to-normal visual acuity and were naïve as to the purpose of the experiment. A post hoc power analysis with G*Power showed that the power was 0.754, which is comparable to the criterion of .8 (Cohen 1992).

2.2 Materials

Our systems were mostly based on Harada and Ohyama (2019, 2020) (Fig. 1). Virtual images were presented on an HMD equipped with an eye-tracking system (Tobii Pro VR Integration on HTC VIVE) by a laptop personal computer (PC) (DELL Alienware R4). The GPU was a GeForce GTX1080 (NVIDIA). Participants’ responses were received by two controllers (HTC VIVE). Unity (2018.4.13f1) and SteamVR (version 1.10.32) were employed to control the presentation of the visual images and record data. The HMD and controllers were tracked by two sensors (HTC Base station 1.0). The cable connecting the PC and HMD was hung from the ceiling.

Fig. 1

Experimental settings. This setting is based on Harada and Ohyama (2019, 2020), excluding Unity

The target was a white or black “ + ” (2.86° × 2.86° of visual angle), with the direction defined as a combination of latitude and longitude. The spatial interval between the directions of the target was 45°. The distractor was a white or black “T” (2.86° × 1.43° of visual angle) that was rotated 0°, 90°, 180°, or 270°. The spatial interval between distractors was 22.5°, except for latitudes 67.5° and -67.5° (the longitudinal interval was 45°). The distractor was not presented at the direction of the target or guidance design. The directions of the target and distractors are shown in Fig. 2.
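As a minimal sketch of the geometry described above, the following Python snippet converts a visual angle into a physical size and a latitude/longitude direction into a unit vector. The viewing distance, the axis convention (x = right, y = up, z = forward), and all function names are assumptions made for illustration; the paper does not report them.

```python
import math


def size_for_visual_angle(angle_deg: float, distance: float) -> float:
    """Extent (same unit as distance) that subtends angle_deg at that distance."""
    return 2.0 * distance * math.tan(math.radians(angle_deg) / 2.0)


def direction_to_unit_vector(lat_deg: float, lon_deg: float):
    """Latitude (up positive) and longitude (right positive) to (x, y, z).

    Assumed axes: x = right, y = up, z = straight ahead of the viewer.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def angular_separation_deg(a, b) -> float:
    """Angle in degrees between two unit vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))


if __name__ == "__main__":
    # Hypothetical viewing distance of 1 m: a 2.86 deg stimulus is about 5 cm wide.
    print(round(size_for_visual_angle(2.86, 1.0), 4))
    front = direction_to_unit_vector(0.0, 0.0)
    right45 = direction_to_unit_vector(0.0, 45.0)
    print(round(angular_separation_deg(front, right45), 1))
```

Under these assumptions, neighbouring target directions on the same latitude of 0° are separated by the 45° interval stated in the text.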

Fig. 2

Directions of targets and distractor stimuli. The target direction was randomly selected in one of the 25 directions. The distractor was not presented in the directions of the target or guidance (UD0 × LR0). Distractors were randomly selected from eight types in each trial

Before the evaluations, we classified existing designs of visual guidance into three categories. The first category was the moving design, in which a salient item appears in the user’s visual field and moves toward the direction of the target (e.g., FlyingARrow). This moving design relies on the attentional capture mechanism (Jonides and Yantis 1988; Pratt et al. 2010), in which attention is automatically captured by exogenous stimuli. The second category was the orientation design, in which the location of the target is shown by a decentering item (e.g., halo, wedge, and SWAVE) or a pointing item (e.g., pointing arrow). This orientation design relies on attention-context coordination, in which attention is allocated based on contextual cues, such as the direction of an arrow (Ristic and Kingstone 2012) and another person’s gaze (Friesen et al. 2004). The third category was the visual extension, discussed in the introduction.

Five guidance designs were evaluated (Fig. 3). From the moving design category, the moving window was selected. This was a red circle that moved from the central position (UD0 × LR0) to the target direction. The 3D arrow, radiation, and spherical gradation were selected from the orientation design category. The 3D arrow was presented at the central position and pointed in a straight line toward the target direction. The radiation comprised eight lines that converged on the target direction. The spherical gradation was a sphere that surrounded the participant. The color of the gradation ranged from black to white, and the whitest area showed the target direction. From the visual extension category, the 3D radar was selected. This comprised a transparent sphere, a small black dot, and a small red dot. The sphere was a schematic field covering 360°; the black dot represented the participant position, and the red dot represented the target direction.

Fig. 3

Five guidance designs. These designs show the direction of a target by different functions

2.3 Procedure

For each participant, the experiment (250 trials) was conducted in a quiet room and took approximately 90 min, including rest times. After providing informed consent, participants received instructions for the experiment and were asked to wear the HMD and hold a controller in each hand.

The sequence of the experimental trial was as follows (Fig. 4). After the start button was pressed, a fixation cross was presented for 500 ms. Subsequently, the guidance design, the target, and several distractors were simultaneously presented. The presentation time of the guidance design was restricted (500 ms). The target and distractors lasted until the response button was pressed or 25 s had passed, and the target direction was randomly selected from one of the 25 directions. Each participant was to search for the target and report the color of the target as accurately and quickly as possible by pressing the button.

Fig. 4

The trial sequence. Visual guidance designs were manipulated between experimental blocks while target directions were manipulated within the block

We divided the experimental session into five blocks, and the total number of trials was 250 [visual guidance designs (5) × target directions (25) × repetitions (2)]. The guidance design was manipulated between different blocks, and the target direction was manipulated within one block. Additionally, the order of guidance design was counterbalanced between participants.
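The block structure above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say how the design order was counterbalanced, so a cyclic rotation across participants is assumed here, and it is assumed that each of the 25 directions occurs twice per block in shuffled order. The names `design_order` and `build_session` are hypothetical.

```python
import random

DESIGNS = ["moving window", "3D arrow", "radiation",
           "spherical gradation", "3D radar"]
N_DIRECTIONS = 25   # target directions, indexed 0..24 for illustration
REPETITIONS = 2     # repetitions of each direction within a block


def design_order(participant_index: int):
    """Assumed counterbalancing: rotate the design list by participant index."""
    shift = participant_index % len(DESIGNS)
    return DESIGNS[shift:] + DESIGNS[:shift]


def build_session(participant_index: int, seed=None):
    """Return the trial list for one participant (5 blocks of 50 trials)."""
    rng = random.Random(seed)
    session = []
    for block_number, design in enumerate(design_order(participant_index), start=1):
        # Each direction appears REPETITIONS times, in random order (assumption).
        directions = list(range(N_DIRECTIONS)) * REPETITIONS
        rng.shuffle(directions)
        for direction_id in directions:
            session.append({"block": block_number,
                            "design": design,
                            "direction_id": direction_id})
    return session


if __name__ == "__main__":
    trials = build_session(participant_index=3, seed=0)
    print(len(trials))            # 5 designs x 25 directions x 2 repetitions
    print(trials[0]["design"])    # first block design for this participant
```

With 30 participants and five rotations, each rotation would be used by six participants under this assumed scheme.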

2.4 Data analysis

Search times and recognition times were measured as guidance effects. The search time was calculated as the duration from the onset of a guidance design to the target detection (i.e., the button press). The recognition time was calculated as the duration from the onset of a guidance design to the start of a search (i.e., when the gaze moved out of the 20° diameter circle around the fixation cross). The search time covers all eye movements during the trial, whereas the recognition time covers only those made before the target search began. Therefore, the search time comprises the recognition time and the remaining time. The search times were converted into heat maps to visualize the guidance effect for each direction. This was performed by the Kriging method (Yang et al. 2004) with Surfer (Golden Software).
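The recognition-time definition can be illustrated with a short sketch. It assumes that gaze samples are available as (time in ms from guidance onset, yaw in degrees, pitch in degrees) relative to the fixation cross; this data layout and the function names are hypothetical and are not taken from the authors' software.

```python
import math


def angular_offset_deg(yaw_deg: float, pitch_deg: float) -> float:
    """Angle between the gaze direction and the fixation cross (straight ahead)."""
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    cos_angle = math.cos(pitch) * math.cos(yaw)
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


def recognition_time_ms(samples, radius_deg: float = 10.0):
    """Time of the first sample outside the circle (20 deg diameter by default).

    samples: iterable of (time_ms, yaw_deg, pitch_deg), time 0 = guidance onset.
    Returns None if the gaze never leaves the circle.
    """
    for time_ms, yaw_deg, pitch_deg in samples:
        if angular_offset_deg(yaw_deg, pitch_deg) > radius_deg:
            return time_ms
    return None


def remaining_time_ms(search_time_ms: float, recognition_ms: float) -> float:
    """Part of the search time spent after the gaze left the central circle."""
    return search_time_ms - recognition_ms


if __name__ == "__main__":
    gaze = [(0, 0.0, 0.0), (100, 3.0, 1.0), (200, 8.0, 5.0), (300, 12.0, 0.0)]
    rt = recognition_time_ms(gaze)
    print(rt, remaining_time_ms(1500, rt))
```

In this toy trace the gaze first leaves the 10° radius at the 300 ms sample, so a 1500 ms search time would split into 300 ms of recognition time and 1200 ms of remaining time.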

As another measure of guidance effects, eye movements were obtained from data that combined the left and right eyes. The eye movements were analyzed in terms of fixations and saccades. Because eye-tracking in immersive conditions has several problems (Clay et al. 2019), such as the focus-accommodation conflict (Hoffman et al. 2008) and calibration drift (Tripathi and Guenter 2016), it may be difficult to generalize results obtained in such conditions to daily situations. Although further studies of eye-tracking methods in immersive VR are needed, we believe that the impact of these problems on our research is limited. This is because our purpose was to evaluate guidance designs used in VR environments, and all designs were evaluated under the same conditions, so any eye-tracking problem would affect all guidance conditions equally. A fixation was defined as a gaze that dwelled for a minimum duration of 150 ms (as in Sitzmann et al. 2018) on a circular area spanning 2° of visual angle in diameter. The saccade threshold was defined as the median velocity of each trial, and saccades were detected with the Microsaccade Toolbox (Engbert et al. 2015). The algorithm of the toolbox can be applied to detect not only microsaccades but also saccades (Mitsudo and Nakamizo 2010).
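The fixation criterion (at least 150 ms within a circular area 2° in diameter) could be implemented in several ways. The following is a generic dispersion-style sketch, not the authors' implementation: it treats gaze positions as flat angular coordinates in degrees (a small-angle approximation) and tests each window against its own centroid. The function names and sample layout are hypothetical.

```python
import math


def _fits_in_circle(window, radius_deg: float) -> bool:
    """True if every sample lies within radius_deg of the window centroid."""
    cx = sum(s[1] for s in window) / len(window)
    cy = sum(s[2] for s in window) / len(window)
    return all(math.hypot(s[1] - cx, s[2] - cy) <= radius_deg for s in window)


def detect_fixations(samples, min_duration_ms: float = 150.0,
                     diameter_deg: float = 2.0):
    """Return (start_ms, end_ms) pairs for detected fixations.

    samples: list of (time_ms, x_deg, y_deg), sorted by time.
    """
    radius = diameter_deg / 2.0
    fixations = []
    start = 0
    n = len(samples)
    while start < n:
        end = start
        # Grow the window while the next sample still keeps it inside the circle.
        while end + 1 < n and _fits_in_circle(samples[start:end + 2], radius):
            end += 1
        duration = samples[end][0] - samples[start][0]
        if duration >= min_duration_ms:
            fixations.append((samples[start][0], samples[end][0]))
            start = end + 1      # continue after the fixation
        else:
            start += 1           # slide the window start by one sample
    return fixations


if __name__ == "__main__":
    trace = [(0, 0.0, 0.0), (50, 0.1, 0.0), (100, 0.0, 0.1), (150, 0.1, 0.1),
             (200, 0.0, 0.0), (250, 10.0, 0.0), (300, 10.1, 0.0)]
    print(detect_fixations(trace))
```

The number of fixations and the mean fixation duration per trial can then be derived from the returned list.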

For statistical significance tests, RStudio was used to conduct analyses of variance (ANOVAs: see Turner and Thayer 2001 for basic information). Although ANOVAs assume normality of the data, time data (e.g., search times and recognition times) tend to follow an ex-Gaussian distribution rather than a Gaussian distribution (Dawson 1988). However, such non-normality of errors has been reported to have little influence on the results of ANOVAs (David and Johnson 1951; Kanji 1977; Schmider et al. 2010). Indeed, ANOVAs have been performed on time data in a wide range of fields (Ganel and Goodale 2003; Greene et al. 2001; Hicks et al. 2004; Joy et al. 2021). Therefore, the use of ANOVAs for time data is common in the relevant literature. Moreover, to reduce the effect of errors, ANOVAs were performed with a multilevel model (the “lmer” function in the “lmerTest” package was used). This model can control for variability among participants, target colors, and the order of guidance designs by entering these variables as random effects, and it can examine the effects of the independent variables by entering them as fixed effects (see Baayen et al. 2008; Bate et al. 2015 for more details). In these analyses, we started with a model that included the relevant factors and their interactions as fixed effects. A random-intercept-only model was used, in which participants, target color, and the order of guidance designs were entered as random effects. Multiple comparisons were performed using the “lsmeans” function in the “lsmeans” package (p values were corrected using Tukey’s method).
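As a hedged illustration, the random-intercept-only model for the search times might be written as below. The notation is an assumption introduced here and does not appear in the original analysis: y_i is the search time on observation i, d(i) and t(i) are its guidance design and target direction, and p(i), c(i), and o(i) are its participant, target color, and design order.

```latex
\begin{equation}
y_{i} = \beta_{0}
      + \beta^{\mathrm{design}}_{d(i)}
      + \beta^{\mathrm{direction}}_{t(i)}
      + \beta^{\mathrm{design}\times\mathrm{direction}}_{d(i)\,t(i)}
      + u_{p(i)} + v_{c(i)} + w_{o(i)} + \varepsilon_{i},
\end{equation}
\begin{equation}
u_{p} \sim \mathcal{N}(0, \sigma_{u}^{2}), \quad
v_{c} \sim \mathcal{N}(0, \sigma_{v}^{2}), \quad
w_{o} \sim \mathcal{N}(0, \sigma_{w}^{2}), \quad
\varepsilon_{i} \sim \mathcal{N}(0, \sigma^{2}).
\end{equation}
```

In lme4-style formula syntax this would correspond to something like `search_time ~ design * direction + (1 | participant) + (1 | target_color) + (1 | design_order)`, where the column names are hypothetical. For the one-way analyses, the direction and interaction terms would be dropped.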

Because targets could not be presented at U45 × LR0 for 20 participants due to program errors, the data obtained from this location for these participants were excluded from the analysis. Moreover, data from incorrect responses, hasty responses (less than 100 ms), and outlying search times (exceeding the mean + 2 SD) were excluded based on previous studies (Franconeri and Simons 2003; Henderson and Macquistan 1993).
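The exclusion rules can be sketched as a simple filter. The paper does not state whether the mean and SD were computed over all trials or separately per participant or condition, so this sketch pools whatever list of trials it is given; the dictionary keys and the function name are hypothetical.

```python
from statistics import mean, stdev


def exclude_trials(trials, min_rt_ms: float = 100.0, sd_multiplier: float = 2.0):
    """Drop incorrect, hasty, and outlying trials.

    trials: list of dicts with 'search_time_ms' (float) and 'correct' (bool).
    The cutoff is mean + sd_multiplier * SD of the trials that pass the first
    two rules (an assumption; the pooling level is not given in the text).
    """
    valid = [t for t in trials
             if t["correct"] and t["search_time_ms"] >= min_rt_ms]
    if len(valid) < 2:
        return valid  # SD is undefined for fewer than two trials
    times = [t["search_time_ms"] for t in valid]
    cutoff = mean(times) + sd_multiplier * stdev(times)
    return [t for t in valid if t["search_time_ms"] <= cutoff]


if __name__ == "__main__":
    raw = [{"search_time_ms": rt, "correct": True}
           for rt in (1000, 1100, 900, 1050, 950, 1000, 1020, 980, 9000, 50)]
    raw.append({"search_time_ms": 1000, "correct": False})
    kept = exclude_trials(raw)
    print(len(kept), max(t["search_time_ms"] for t in kept))
```

In practice the function would be applied to each grouping level that the analysis treats as the reference distribution.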

3 Results

The search times, eye movement data, and recognition times were analyzed with ANOVAs. These results are shown in Table 1, and their details are reported in the following sections.

Table 1 The results of the ANOVA on each dependent variable. Degrees of freedom (df), F values, and p values obtained from the ANOVAs are reported. For the search times, a two-way ANOVA was performed with the fixed effects of guidance design, target direction, and their interaction. For the other five dependent variables, one-way ANOVAs were performed with the fixed effect of guidance design

3.1 Search time

Figure 5a illustrates the search times averaged across 30 participants, which indicate the extent to which each design effectively guides participants toward the direction of target items. To test for the effects of guidance design and target direction, a two-way within-participants ANOVA was performed on the search times with fixed effects of guidance design (5) and target direction (25). The test revealed significant main effects of guidance design and target direction, and a significant two-way interaction (Table 1). A multiple comparison test revealed significant differences between guidance designs. In Fig. 5b, the colors show p values; redder indicates faster search times in the left guidance design than in the top design. These results indicate that the moving window and radiation improve the search for targets presented in the frontal area (i.e., from L45 to R45) and that the 3D arrow and 3D radar improve the search for targets presented behind users (e.g., LR180). This suggests that the guidance effects change with the guided direction: the moving window and radiation are effective for guidance in the frontal direction, and the 3D arrow and 3D radar are effective for guidance in the backward direction.

Fig. 5

Effect of each guidance design on spatial directions. a Means of search times. The search times were averaged across 30 participants for each design and target direction. Error bars represent 95% confidence intervals. b Results of multiple comparison tests. The colors show p values; redder indicates faster search times in the left guidance design than in the top design. Asterisks indicate significant differences between the designs (p < .05)

In an attempt to visualize the guidance effects for spatial directions, the search times were converted to criteria maps. Figure 6 illustrates the criteria maps in which the color becomes red, yellow, green, and blue as the search time increases. The interval between the grids was 45°, and the cross points of the latitude and longitude grids show the directions of target appearance.

Fig. 6

Criteria maps of visual guidance effect for spatial directions. These maps were created by interpolating the search times of targets. Redder areas indicate higher effectiveness, and bluer areas indicate lower effectiveness

3.2 Eye movement

Figure 7 shows the number of fixations, the fixation duration, the number of saccades, and the saccade length. Effective guidance would decrease the number of fixations, the fixation duration, and the number of saccades and would extend the saccade length because these measures indicate the extent to which attention was captured by distractor items. To test for the effect of guidance design, one-way within-participants ANOVAs were performed on the four dependent variables with a fixed effect of guidance design (5). These tests consistently revealed significant main effects of guidance design (Table 1). In particular, the moving window produced relatively few fixations and saccades and shorter fixation durations (see Fig. 7 for the details of significant differences and Supplementary Materials for their statistical values). This indicates that the moving window made users ignore the distractor items more strongly than the other guidance designs did, suggesting effective guidance.

Fig. 7

Eye movement data. Bars represent means; error bars represent 95% confidence intervals, and asterisks represent significant differences (p < .05)

3.3 Recognition time

Figure 8 illustrates the recognition times for the guidance designs averaged across 30 participants, which indicate the amount of cognitive load caused by each guidance design. To test for the effect of guidance design, a one-way within-participants ANOVA was performed on the recognition times with the fixed effect of guidance design (5). The test showed a significant main effect (Table 1). A multiple comparison test revealed several significant differences (see Fig. 8 for the details and Supplementary Materials for their statistical values): in particular, the participants required short times to recognize the details of the moving window and radiation but long times for the 3D arrow, spherical gradation, and 3D radar. This suggests small cognitive loads for the former two designs and large cognitive loads for the latter three designs.

Fig. 8

Mean recognition times of guidance design. Error bars represent 95% confidence intervals, and asterisks show significant differences (p < .05)

4 Discussion

To evaluate the guidance effect for each direction and the cognitive load, the present study conducted a visual search task in a 360-degree environment. Notably, our results suggest that guidance effects involve a trade-off among directions. Thus, this study provides empirical evidence for effectively designing visual guidance in a 360-degree context.

4.1 Evaluation for each guidance design

The moving window and the radiation were similar with respect to the guidance effects. The search times showed that the two designs most precisely guided attention toward the front of users. Moreover, the recognition times of the guidance designs showed that the two designs required a small cognitive load to utilize the guidance. This is consistent with previous findings, showing that the animated item automatically captured attention (Franconeri and Simons, 2003) and decentering designs promoted the localization of targets (Baudisch and Rosenholtz 2003). However, these two designs were less effective when targets were located behind participants (i.e., LR180), suggesting that the moving window and radiation are susceptible to the field of view in HMD. From this perspective, the moving window and radiation would precisely guide users to information located in front of them.

Compared to the moving window and radiation, the 3D radar was ineffective for guiding attention toward the front of participants but effective toward their back. The present findings are consistent with the idea that users switch viewpoints between ego-centric and bird’s-eye views to utilize the 3D radar. This would promote spatial cognition of target items located out of view. This idea is further supported by the recognition times, which show that a larger cognitive load was required to utilize the 3D radar. Given that the switch between different perspectives is associated with cognitive abilities such as executive function (Friedman et al. 2008), the 3D radar requires large amounts of cognitive resources. As another potential reason, the cognitive load may be due to the amount of information contained in the 3D radar. This design shows not only the target direction but also additional information such as the location of users and surrounding areas. The large amount of detail would delay the recognition of the target direction because the set size of information in a visual field increases the cognitive load (Palmer 1994; Wolfe 2010). Our results suggest that the 3D radar accurately guides users to necessary information irrespective of direction but delays the recognition of guidance.

Interestingly, the 3D arrow fell between the moving window/radiation and the 3D radar. The search times suggest that the 3D arrow was (a) more effective than the 3D radar but less effective than the moving window/radiation for guidance toward the front of participants and (b) more effective than the moving window/radiation but less effective than the 3D radar for guidance toward their back. Moreover, the recognition times for the 3D arrow were longer than those for the moving window/radiation but shorter than those for the 3D radar. A plausible explanation is that the 3D arrow may promote not only attentional guidance (Ristic and Kingstone 2012) but also perspective taking. If the 3D arrow points at a target located out of view, it would be more difficult to recognize the target direction visually. In this case, users may take an object-based or allocentric viewpoint (Maringelli et al. 2001) to recognize the target direction. In other words, the 3D arrow may guide attention when targets are in the field of view and may have users take an allocentric viewpoint when targets are located out of view.

Unlike the other four designs, the spherical gradation was entirely ineffective for 360-degree guidance. The numbers of fixations and saccades suggest that the spherical gradation requires users to search a large field of the 360-degree environment.

4.2 Practical application

The criteria maps and recognition time data suggest that the appropriate guidance design depends on the requirements and the context. In particular, the moving window and radiation are useful when quick responses are needed, and the 3D radar is useful when accurate guidance is needed. For example, users of social VR contents (e.g., VRChat, Rec Room, and cluster) can freely search for other users, communicate with them, and create virtual environments. When communicating among multiple people, users are required to switch their attention quickly and dynamically toward the user who takes the turn in the conversation. In this way, users of VR contents frequently need to respond dynamically. This switch of attentional direction can be assisted by the moving window and radiation. In contrast, users may have difficulty searching for other users and necessary items due to the out-of-view problem. This difficulty would be reduced by the visual guidance of the 3D radar. This assistance could also be applied to VR contents in other fields such as VR games, training, and remote operation. For instance, users of action games would be required to detect the location of enemies, weapons, and escape routes as quickly as possible. The detection of necessary items would be quickly assisted by the moving window and radiation with small cognitive loads. Such a mixed use of different guidance designs could be applied to a wide range of fields that need visual assistance.

Additionally, our methodology could be applied to the evaluation of guidance designs in actual VR contents. Recent HMDs have an eye-tracking system (e.g., HTC VIVE Pro Eye and NeU-VR 1.0), which can record users’ eye movements during the use of VR contents. These devices provide two types of eye data: (a) the time spent viewing a visual guidance design and (b) the time spent searching for the guided target. These data show the extent to which the guidance is accurate and quick in a given content, which would contribute to the development of a guidance design for each VR content.

4.3 Limitations and next step

The significance of this study is that it clarifies the general effects of guidance design on the accuracy for each direction and on cognitive loads. To control for the effect of content types, we excluded the factor of context. In other words, one of the limitations of the present study is that it does not investigate the contextual effects on the guidance effects and cognitive loads. Actual VR contents have different contexts such as backgrounds, target items, and tasks. Given that users’ attention changes with contexts (Harada and Ohyama 2020), the interaction between the guidance and contexts would be important for predicting visual cognition. Therefore, this interaction needs to be investigated in future studies.

Another limitation is that the study does not investigate the effects of design parameters. The study selected five designs, but each guidance design can be illustrated by different physical features (e.g., color, size, and viewing distance). For example, the criteria maps suggest that spherical gradation was less accurate for the guidance than the other four designs. This may be related to the complexity of physical features in the spherical gradation. The spherical gradation is considered complex compared to the four designs, which would impair visual cognition because the amount of information influences cognitive resources (Mackworth 1965). In other words, the optimal modification of these physical parameters may enhance the guidance effects of spherical gradation. However, we did not examine this issue because it was not within the scope of this study. Therefore, future work is needed to investigate this issue while manipulating each physical feature parameter.

Additionally, there is a limitation regarding long-term use. The effect of long-term use on attentional guidance is important for application in professional situations. Especially for novice users, guidance designs that produce small cognitive loads (e.g., moving window and radiation) would be useful because large cognitive loads limit the mental capacity that the working memory process uses to learn new skills (Paas et al. 2003). However, long-term use may decrease the cognitive loads owing to perceptual learning (Goldstone 1998), in which the visual cognition of certain stimuli becomes automatic through repeated presentations. These points suggest that, although the moving window and radiation are effective for guiding novice users, other guidance designs may also be useful in long-term use. Although the present study helps explain the effect of cognitive loads on long-term use, further studies are needed to evaluate the details of learning effects on guidance designs in long-term use.

5 Conclusion

In this study, we quantitatively evaluated the effect of visual guidance designs for each direction and cognitive loads. We found that (a) the guidance effect varied by the combination of designs and guided directions and that (b) cognitive loads are larger for the 3D radar and smaller for the moving window and radiation. We also developed the criteria maps of guidance effect for spatial directions, which sets a framework for assisting user cognition. These maps can be used in designing an accessible 3D user interface.