Exploiting Semantic and Boundary Information for Stereo Matching

Peng, Fang; Tan, Yu; Zhang, Cheng

doi:10.1007/s11265-021-01675-x

Published: 30 June 2021

Volume 95, pages 379–391, (2023)

Journal of Signal Processing Systems

Fang Peng¹,
Yu Tan² &
Cheng Zhang³

Abstract

Stereo matching estimates disparity by finding the correspondence of each pixel between two images, which is crucial to 3D scene reconstruction. 3D convolutional neural networks now achieve impressive performance on stereo matching, but they are memory-intensive and computationally complex. Finding corresponding pixels in textureless regions and near boundaries also remains challenging. We therefore propose a stereo matching neural network that uses semantic segmentation and boundary detection tasks to improve accuracy near boundaries and in textureless regions. A hybrid cost volume, which reflects the similarity between the left and right feature maps, is designed to combine a semantic cost volume and a boundary cost volume through an attention mechanism. The network follows a coarse-to-fine strategy: it predicts a complete disparity map at the lowest resolution and refines the disparity at higher resolutions. We conduct comprehensive experiments on the KITTI 2015 dataset and compare against recent stereo matching neural networks. The proposed network achieves a D1-all (3-pixel error) of 2.8% with a run time of 0.044 s, which shows that embedding semantic and boundary information can improve the accuracy of stereo matching.
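For readers unfamiliar with the two quantities named in the abstract, the following is a minimal NumPy sketch of a plain correlation cost volume and of a D1-style outlier rate. It is not the hybrid cost volume of the paper, which additionally fuses semantic and boundary cost volumes with attention. The function names `correlation_cost_volume` and `d1_error` are hypothetical, and the outlier rule (error above 3 px and above 5% of the ground-truth disparity) is assumed from the commonly used KITTI 2015 definition rather than taken from this article.

```python
import numpy as np


def correlation_cost_volume(left_feat, right_feat, max_disp):
    """Similarity between left and right feature maps per candidate disparity.

    left_feat, right_feat: arrays of shape (C, H, W).
    Returns an array of shape (max_disp, H, W); higher means more similar.
    """
    _, h, w = left_feat.shape
    volume = np.zeros((max_disp, h, w), dtype=np.float32)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (left_feat * right_feat).mean(axis=0)
        else:
            # A left pixel at column x is compared with the right pixel at x - d.
            volume[d, :, d:] = (left_feat[:, :, d:] * right_feat[:, :, :-d]).mean(axis=0)
    return volume


def d1_error(pred, gt, valid=None):
    """Fraction of valid pixels whose disparity error exceeds 3 px and 5% of gt.

    This threshold pair is an assumption based on the usual KITTI 2015 metric.
    """
    if valid is None:
        valid = gt > 0  # assumption: non-positive ground truth marks missing pixels
    err = np.abs(pred - gt)
    outlier = (err > 3.0) & (err > 0.05 * gt)
    return float(outlier[valid].mean())


if __name__ == "__main__":
    # Toy features: each column of the right image has a unique one-hot code,
    # and the left image is the right image shifted by 2 pixels.
    right = np.zeros((8, 4, 8), dtype=np.float32)
    for x in range(8):
        right[x, :, x] = 1.0
    left = np.zeros_like(right)
    left[:, :, 2:] = right[:, :, :-2]

    volume = correlation_cost_volume(left, right, max_disp=4)
    disparity = volume.argmax(axis=0)  # winner-takes-all over the disparity axis
    print(disparity[0])
```

In this toy setup the winner-takes-all disparity is 2 wherever the shifted left columns have a match. A learned network would typically replace the hard `argmax` with a differentiable regression over the cost volume.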

Acknowledgements

This work was supported by Provincial Key platforms and major scientific research projects of Guangdong Universities under Grant 2017KTSCX208 and Science and Technology Planning Project of Zhongshan under Grant 2019B2066.

Author information

Authors and Affiliations

School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Zhongshan Institute, Zhongshan, China
Fang Peng
School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, China
Yu Tan
Department of Mechanical System Engineering, Ibaraki University, Ibaraki, Japan
Cheng Zhang

Corresponding author

Correspondence to Cheng Zhang.

About this article

Cite this article

Peng, F., Tan, Y. & Zhang, C. Exploiting Semantic and Boundary Information for Stereo Matching. J Sign Process Syst 95, 379–391 (2023). https://doi.org/10.1007/s11265-021-01675-x

Received: 16 April 2021
Revised: 20 May 2021
Accepted: 03 June 2021
Published: 30 June 2021
Issue Date: March 2023
DOI: https://doi.org/10.1007/s11265-021-01675-x
