Exploiting Semantic and Boundary Information for Stereo Matching

Journal of Signal Processing Systems

Abstract

Stereo matching aims to estimate disparity by finding the correspondence of each pixel between two images, which is crucial to 3D scene reconstruction. 3D convolutional neural networks now achieve impressive performance on stereo matching. However, they are memory-intensive and computationally expensive, and it remains challenging to find corresponding pixels in textureless regions and near boundaries. Therefore, a stereo matching neural network is proposed that uses semantic segmentation and boundary detection tasks to improve the accuracy of stereo matching in these regions. A hybrid cost volume, which reflects the similarity between the left and right feature maps, is designed to contain a semantic cost volume and a boundary cost volume combined through an attention mechanism. The network follows a coarse-to-fine strategy, which predicts a complete disparity map at the highest resolution and refines disparity at the lower resolution. We conduct comprehensive experiments on the KITTI 2015 dataset and compare with several recent stereo matching neural networks. The proposed network achieves a D1-all (3-pixel error) of 2.8% with a run time of 0.044 s, which shows that embedding semantic and boundary information can improve the accuracy of stereo matching.
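To make two terms from the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds a plain correlation cost volume (the similarity between left and right feature maps at each candidate disparity) and computes a D1-all style outlier rate. The function names, the toy one-hot features, and the use of plain correlation are assumptions for illustration; the hybrid semantic and boundary cost volume with attention described above is not reproduced here. The 5% relative threshold follows the usual KITTI 2015 benchmark convention and is assumed rather than stated in the abstract.

```python
import numpy as np


def correlation_cost_volume(left_feat, right_feat, max_disp):
    """Plain correlation cost volume for a rectified stereo pair.

    left_feat, right_feat: arrays of shape (C, H, W).
    Returns an array of shape (max_disp, H, W) whose entry [d, y, x] is the
    channel-wise mean of left[:, y, x] * right[:, y, x - d].
    Columns with x < d have no valid match and stay at zero.
    """
    _, height, width = left_feat.shape
    volume = np.zeros((max_disp, height, width), dtype=np.float64)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (left_feat * right_feat).mean(axis=0)
        else:
            product = left_feat[:, :, d:] * right_feat[:, :, :-d]
            volume[d, :, d:] = product.mean(axis=0)
    return volume


def d1_all(pred, gt, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of valid pixels whose disparity error exceeds both
    abs_thresh pixels and rel_thresh times the ground truth (assumed
    KITTI 2015 convention). Pixels with gt <= 0 are treated as invalid."""
    valid = gt > 0
    err = np.abs(pred - gt)
    outlier = valid & (err > abs_thresh) & (err > rel_thresh * gt)
    return 100.0 * outlier.sum() / max(valid.sum(), 1)


# Toy features (hypothetical): channel c is 1 only at column c, so every
# column has a unique descriptor. The left view is the right view shifted
# by 2 pixels, i.e. the true disparity is 2 wherever a match exists.
H, W, TRUE_DISP = 3, 8, 2
right = np.eye(W)[:, None, :].repeat(H, axis=1)   # shape (C=W, H, W)
left = np.roll(right, TRUE_DISP, axis=2)

volume = correlation_cost_volume(left, right, max_disp=4)
disparity = volume.argmax(axis=0)                 # winner-takes-all per pixel

# Small hand-made example for the error metric; the last pixel is invalid.
gt_demo = np.array([[10.0, 10.0, 100.0, 0.0]])
pred_demo = np.array([[14.0, 12.0, 104.0, 50.0]])
error_rate = d1_all(pred_demo, gt_demo)
```

In the toy example only the first pixel of `pred_demo` exceeds both thresholds, so one of the three valid pixels is an outlier. Learned networks typically replace the winner-takes-all `argmax` with a differentiable regression over the volume, but that step is outside what the abstract describes.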

Acknowledgements

This work was supported by the Provincial Key Platforms and Major Scientific Research Projects of Guangdong Universities under Grant 2017KTSCX208 and by the Science and Technology Planning Project of Zhongshan under Grant 2019B2066.

Author information

Corresponding author

Correspondence to Cheng Zhang.

About this article

Cite this article

Peng, F., Tan, Y. & Zhang, C. Exploiting Semantic and Boundary Information for Stereo Matching. J Sign Process Syst 95, 379–391 (2023). https://doi.org/10.1007/s11265-021-01675-x

  • DOI: https://doi.org/10.1007/s11265-021-01675-x
