
Modeling human–human interaction with attention-based high-order GCN for trajectory prediction

  • Original article
The Visual Computer

Abstract

This paper presents a novel high-order graph convolutional network (GCN) for pedestrian trajectory prediction. The walking state of a target pedestrian depends on both its historical trajectory, which encodes its speed, walking direction and acceleration, and the movement of its neighbors. We therefore leverage GCNs to aggregate the trajectory features of the target pedestrian and its neighbors to predict the target pedestrian’s movement. Because the movement of the neighbors’ neighbors affects the target pedestrian’s neighbors, and thus indirectly affects the target pedestrian, we propose a high-order GCN for modeling human–human interaction that considers both the target pedestrian’s neighbors and its neighbors’ neighbors. Further, a pedestrian avoids collisions by estimating its own and its neighbors’ upcoming locations, and it slows down or changes direction if it believes a collision may occur, especially in very crowded scenes. In light of this, we model such anticipation-based decision making as attention and combine it with our high-order GCN. We first roughly estimate the future trajectories of all pedestrians with a simple method. Using these coarse predicted trajectories and the GCN outputs, we then calculate the attention in our attention-based high-order GCN and predict the future trajectory. Extensive experiments validate the effectiveness of our approach. In addition, our model is more data-efficient: on the ETH&UCY dataset, using only 5% of the training data in each training epoch, it outperforms the state of the art.
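The anticipation step described in the abstract can be illustrated with a short sketch. The abstract does not say which "simple method" produces the coarse estimate or how the attention is computed, so everything here is an assumption for illustration only: constant-velocity extrapolation stands in for the coarse predictor, and the attention is a softmax over the negative minimum predicted distance between pedestrians. The function names `constant_velocity_forecast` and `anticipation_attention` and the `temperature` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def constant_velocity_forecast(history, horizon):
    """Extrapolate each pedestrian's last observed step for `horizon` frames.

    ASSUMPTION: constant velocity stands in for the paper's unspecified
    "simple method".

    history: array of shape (N, T_obs, 2) with T_obs >= 2.
    Returns an array of shape (N, horizon, 2).
    """
    history = np.asarray(history, dtype=float)
    velocity = history[:, -1] - history[:, -2]            # (N, 2)
    steps = np.arange(1, horizon + 1)[None, :, None]      # (1, horizon, 1)
    return history[:, -1][:, None, :] + steps * velocity[:, None, :]


def anticipation_attention(coarse_future, temperature=1.0):
    """Row-wise softmax over the negative minimum predicted distance.

    ASSUMPTION: this scoring rule is illustrative, not the paper's.
    Neighbors whose coarse future path comes closest to the target's
    receive the largest weight; a pedestrian never attends to itself.

    coarse_future: array of shape (N, horizon, 2) with N >= 2.
    Returns an (N, N) matrix whose rows sum to 1.
    """
    diff = coarse_future[:, None] - coarse_future[None, :]   # (N, N, H, 2)
    min_dist = np.linalg.norm(diff, axis=-1).min(axis=-1)    # (N, N)
    scores = -min_dist / temperature
    np.fill_diagonal(scores, -np.inf)                         # mask self
    scores -= scores.max(axis=1, keepdims=True)               # stable softmax
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)
```

For two pedestrians walking toward each other and a third walking away, the first pedestrian's attention row puts almost all of its weight on the one it is about to meet, which is the qualitative behavior the abstract describes.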


Notes

  1. \(f_{\text{LSTM}}\) and \(g_{\text{LSTM}}\) share the same parameters in the experiment.

  2. Through experiments, we find that performance does not improve as the GCN gets deeper, while the computational cost increases significantly. The reason is that more layers cause each pedestrian to be affected by pedestrians that are very far away, which does not always reflect reality. We therefore set the depth of the GCN to 1.

  3. Some second-order neighbors may be behind the target, but they make up only a small percentage. In other words, only a few people behind the target affect its trajectory. Our experiments also show that using second-order neighbors performs better than using first-order neighbors, which in turn performs better than using neighbors from all directions. Please refer to Table 5.

  4. Since the original ETH&UCY datasets do not have a unified data format, we use the raw data provided in SGAN [9] as the original data.

  5. All of these comparisons are based on the average (AVG) reported in Table 5.
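Notes 2 and 3 refer to a single-layer GCN that draws on first-order and second-order neighbors. The sketch below shows one way such a layer could be written. It is a minimal illustration under stated assumptions, not the authors' formulation: neighbors are defined by a distance threshold `radius`, features are mean-aggregated per hop, and the names `neighbor_masks`, `high_order_layer`, `w_self`, `w_first` and `w_second` are hypothetical. The directional filtering discussed in Note 3 (neighbors in front of versus behind the target) is not modeled here.

```python
import numpy as np


def neighbor_masks(positions, radius):
    """Return boolean (N, N) masks of first- and second-order neighbors.

    ASSUMPTION: a neighbor is any other pedestrian within `radius`.
    First-order: within `radius` of the target (excluding itself).
    Second-order: a neighbor of a neighbor that is neither the target
    nor already a first-order neighbor.
    """
    positions = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    first = dist <= radius
    np.fill_diagonal(first, False)
    # A nonzero entry of A @ A means a path of length two exists.
    two_hop = (first.astype(int) @ first.astype(int)) > 0
    second = two_hop & ~first
    np.fill_diagonal(second, False)
    return first, second


def high_order_layer(features, first, second, w_self, w_first, w_second):
    """One graph-convolution layer mixing self, 1-hop and 2-hop features.

    ASSUMPTION: mean aggregation per hop followed by ReLU; the paper's
    exact aggregation is not given in the text above.
    """
    def mean_aggregate(mask):
        degree = mask.sum(axis=1, keepdims=True)
        # Pedestrians with no neighbors at this hop contribute zeros.
        return (mask.astype(float) @ features) / np.maximum(degree, 1)

    out = (features @ w_self
           + mean_aggregate(first) @ w_first
           + mean_aggregate(second) @ w_second)
    return np.maximum(out, 0.0)
```

Because the second-order mask is built from a two-step path in the first-order graph, a single layer already reaches neighbors' neighbors, which is consistent with keeping the GCN depth at 1 as Note 2 describes.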

References

  1. Abu-El-Haija, S., Perozzi, B., Kapoor, A., Harutyunyan, H., Alipourfard, N., Lerman, K., Steeg, G.V., Galstyan, A.: MixHop: Higher-order graph convolution architectures via sparsified neighborhood mixing. arXiv preprint arXiv:1905.00067 (2019)

  2. Alahi, A., Goel, K., Ramanathan, V., Robicquet, A., Fei-Fei, L., Savarese, S.: Social LSTM: Human trajectory prediction in crowded spaces. In: CVPR, pp. 961–971 (2016)

  3. Bruna, J., Zaremba, W., Szlam, A., LeCun, Y.: Spectral networks and locally connected networks on graphs. arXiv preprint arXiv:1312.6203 (2013)

  4. Cancela, B., Iglesias, A., Ortega, M., Penedo, M.G.: Unsupervised trajectory modelling using temporal information via minimal paths. In: CVPR, pp. 2553–2560 (2014)

  5. Dong, H., Zhou, M., Wang, Q., Yang, X., Wang, F.: State-of-the-art pedestrian and evacuation dynamics. IEEE Trans. Intell. Transp. Syst. 21(5), 1849–1866 (2020)


  6. Emonet, R., Varadarajan, J., Odobez, J.M.: Extracting and locating temporal motifs in video scenes using a hierarchical non parametric Bayesian model. In: CVPR, pp. 3233–3240. IEEE (2011)

  7. Fernández-Ramírez, J., Álvarez Meza, A., Pereira, E.M., Orozco-Gutiérrez, A., Castellanos-Dominguez, G.: Video-based social behavior recognition based on kernel relevance analysis. Vis. Comput. 36(8), 1535–1547 (2020)


  8. Gori, M., Monfardini, G., Scarselli, F.: A new model for learning in graph domains. In: IJCNN, vol. 2, pp. 729–734. IEEE (2005)

  9. Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S., Alahi, A.: Social GAN: Socially acceptable trajectories with generative adversarial networks. In: CVPR, pp. 2255–2264 (2018)

  10. Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: NIPS, pp. 1024–1034 (2017)

  11. Helbing, D., Molnar, P.: Social force model for pedestrian dynamics. Phys. Rev. E 51(5), 4282 (1995)


  12. Huang, Y., Bi, H., Li, Z., Mao, T., Wang, Z.: STGAT: Modeling spatial-temporal interactions for human trajectory prediction. In: ICCV, pp. 6272–6281 (2019)

  13. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)

  14. Kumar, D., Bezdek, J.C., Rajasegarar, S., Leckie, C., Palaniswami, M.: A visual-numeric approach to clustering and anomaly detection for trajectory data. Vis. Comput. 33(3), 265–281 (2017)


  15. Lee, J.B., Rossi, R.A., Kong, X., Kim, S., Koh, E., Rao, A.: Higher-order graph convolutional networks. arXiv preprint arXiv:1809.07697 (2018)

  16. Lerner, A., Chrysanthou, Y., Lischinski, D.: Crowds by example. In: Computer Graphics Forum, vol. 26, pp. 655–664. Wiley Online Library (2007)

  17. Li, J., Ma, H., Zhang, Z., Tomizuka, M.: Social-WAGDAT: Interaction-aware trajectory prediction via Wasserstein graph double-attention network (2020). https://doi.org/10.13140/RG.2.2.25253.04320

  18. Li, M., Chen, S., Chen, X., Zhang, Y., Wang, Y., Tian, Q.: Actional-structural graph convolutional networks for skeleton-based action recognition. In: CVPR, pp. 3595–3603 (2019)

  19. Li, Y.: Which way are you going? Imitative decision learning for path forecasting in dynamic scenes. In: CVPR, pp. 294–303 (2019)

  20. Liang, J., Jiang, L., Niebles, J.C., Hauptmann, A.G., Fei-Fei, L.: Peeking into the future: Predicting future person activities and locations in videos. In: CVPR, pp. 5725–5734 (2019)

  21. Luber, M., Stork, J.A., Tipaldi, G.D., Arras, K.O.: People tracking with human motion predictions from social forces. In: ICRA, pp. 464–469. IEEE (2010)

  22. Mehran, R., Oyama, A., Shah, M.: Abnormal crowd behavior detection using social force model. In: CVPR, pp. 935–942. IEEE (2009)

  23. Mohamed, A., Qian, K., Elhoseiny, M., Claudel, C.: Social-STGCNN: A social spatio-temporal graph convolutional neural network for human trajectory prediction. In: CVPR, pp. 14412–14420 (2020)

  24. Pellegrini, S., Ess, A., Schindler, K., Van Gool, L.: You’ll never walk alone: Modeling social behavior for multi-target tracking. In: ICCV, pp. 261–268. IEEE (2009)

  25. Pellegrini, S., Ess, A., Van Gool, L.: Improving data association by joint modeling of pedestrian trajectories and groupings. In: ECCV, pp. 452–465. Springer (2010)

  26. Rudenko, A., Palmieri, L., Herman, M., Kitani, K.M., Gavrila, D.M., Arras, K.O.: Human motion trajectory prediction: A survey. arXiv preprint arXiv:1905.06113 (2019)

  27. Sadeghian, A., Kosaraju, V., Sadeghian, A., Hirose, N., Rezatofighi, H., Savarese, S.: SoPhie: An attentive GAN for predicting paths compliant to social and physical constraints. In: CVPR, pp. 1349–1358 (2019)

  28. Schöller, C., Aravantinos, V., Lay, F., Knoll, A.: What the constant velocity model can teach us about pedestrian motion prediction. IEEE Robot. Autom. Lett. 5(2), 1696–1703 (2020)


  29. Si, C., Chen, W., Wang, W., Wang, L., Tan, T.: An attention enhanced graph convolutional LSTM network for skeleton-based action recognition. In: CVPR, pp. 1227–1236 (2019)

  30. Su, H., Zhu, J., Dong, Y., Zhang, B.: Forecast the plausible paths in crowd scenes. In: IJCAI, vol. 1, p. 2 (2017)

  31. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)

  32. Vemula, A., Muelling, K., Oh, J.: Social attention: Modeling attention in human crowds. In: ICRA, pp. 1–7. IEEE (2018)

  33. Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? arXiv preprint arXiv:1810.00826 (2018)

  34. Xu, Y., Piao, Z., Gao, S.: Encoding crowd interaction with deep neural network for pedestrian trajectory prediction. In: CVPR, pp. 5275–5284 (2018)

  35. Yagi, T., Mangalam, K., Yonetani, R., Sato, Y.: Future person localization in first-person videos. In: CVPR, pp. 7593–7602 (2018)

  36. Yao, Y., Xu, M., Choi, C., Crandall, D.J., Atkins, E.M., Dariush, B.: Egocentric vision-based future vehicle localization for intelligent driving assistance systems. In: ICRA, pp. 9711–9717. IEEE (2019)

  37. Yi, S., Li, H., Wang, X.: Understanding pedestrian behaviors from stationary crowd groups. In: CVPR, pp. 3488–3496 (2015)

  38. Yi, S., Li, H., Wang, X.: Pedestrian behavior modeling from stationary crowds with applications to intelligent surveillance. TIP 25(9), 4354–4368 (2016)


  39. Yi, S., Li, H., Wang, X.: Pedestrian behavior understanding and prediction with deep neural networks. In: ECCV, pp. 263–279. Springer (2016)

  40. Zhang, L., She, Q., Guo, P.: Stochastic trajectory prediction with social graph network. arXiv preprint arXiv:1907.10233 (2019)

  41. Zhang, P., Ouyang, W., Zhang, P., Xue, J., Zheng, N.: SR-LSTM: State refinement for LSTM towards pedestrian trajectory prediction. In: CVPR, pp. 12085–12094 (2019)

  42. Zhou, B., Tang, X., Wang, X.: Learning collective crowd behaviors with dynamic pedestrian-agents. IJCV 111(1), 50–68 (2015)



Acknowledgements

This work was supported by the National Key Research and Development Program of China (No. 213), the Shanghai Municipal Natural Science Foundation (No. 19ZR1404700), Fudan-Zhuhai Innovation Institute, and Fudan University-CIOMP Joint Fund (No. FC2019-003).

Author information


Corresponding author

Correspondence to Bo Hu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Fang, Y., Jin, Z., Cui, Z. et al. Modeling human–human interaction with attention-based high-order GCN for trajectory prediction. Vis Comput 38, 2257–2269 (2022). https://doi.org/10.1007/s00371-021-02109-2



  • DOI: https://doi.org/10.1007/s00371-021-02109-2
