
Sensor fusion based manipulative action recognition

Published in: Autonomous Robots

Abstract

Manipulative action recognition is one of the most important and challenging topics in the field of image processing. In this paper, three kinds of sensor modules are used to capture motion, force, and object information during manipulative actions. Two fusion methods are proposed, and the recognition accuracy is further improved by using the object as context. For the feature-level fusion method, significant features are selected first, and Hidden Markov Models (HMMs) are then built on the selected features to characterize the temporal sequences. For the decision-level fusion method, HMMs are built for each feature group and their decisions are then fused. On top of these two fusion methods, the object/action context is modeled using a Bayesian network. Assembly tasks are used for evaluation. The experimental results show that the proposed approach is effective for manipulative action recognition. The recognition accuracies of the decision-level fusion, feature-level fusion, and Bayesian context models are 72%, 80%, and 90%, respectively.
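To make the decision-level fusion scheme concrete, the following is a minimal sketch, assuming per-frame feature vectors for each sensor group (motion, force, object) and using the hmmlearn library; the class and variable names are illustrative and not taken from the paper.

import numpy as np
from hmmlearn import hmm

class DecisionLevelFusion:
    """Illustrative sketch: one HMM per (action, sensor group), fused at the decision level."""

    def __init__(self, actions, n_states=5):
        self.actions = actions
        self.n_states = n_states
        self.models = {}  # (action, group) -> trained GaussianHMM

    def fit(self, sequences):
        # sequences: {(action, group): list of (T_i, D) observation arrays}
        for (action, group), seqs in sequences.items():
            X = np.vstack(seqs)
            lengths = [len(s) for s in seqs]
            model = hmm.GaussianHMM(n_components=self.n_states, covariance_type="diag")
            model.fit(X, lengths)
            self.models[(action, group)] = model

    def predict(self, observation):
        # observation: {group: (T, D) array for one action instance}.
        # Summing per-group log-likelihoods fuses the decisions under a
        # conditional-independence assumption across sensor groups.
        scores = {
            action: sum(
                self.models[(action, group)].score(seq)
                for group, seq in observation.items()
            )
            for action in self.actions
        }
        return max(scores, key=scores.get)

Feature-level fusion, by contrast, would concatenate the selected per-group features frame by frame and train a single HMM per action on the combined vectors.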




Acknowledgements

This project is supported by the National Natural Science Foundation of China (Nos. 61906123, U1713216, and 61976070), the Fundamental Research Funds for Shenzhen Technology University, the Shenzhen Overseas High Level Talent (Peacock Plan) Program (No. KQTD20140630154026047), the Shenzhen Basic Research Project (JCYJ20160429161539298), and the Scientific Research Platforms and Projects in Universities in Guangdong Province under Grant 2019KTSCX204.

Author information


Corresponding author

Correspondence to Ye Gu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Gu, Y., Liu, M., Sheng, W. et al. Sensor fusion based manipulative action recognition. Auton Robot 45, 1–13 (2021). https://doi.org/10.1007/s10514-020-09943-8
