Out of sight, out of mind? How vulnerable dependencies affect open-source projects

Prana, Gede Artha Azriadi; Sharma, Abhishek; Shar, Lwin Khin; Foo, Darius; Santosa, Andrew E.; Sharma, Asankhaya; Lo, David

doi:10.1007/s10664-021-09959-3

Published: 21 April 2021

Volume 26, article number 59, (2021)

Empirical Software Engineering

Gede Artha Azriadi Prana ORCID: orcid.org/0000-0003-3759-5661¹,
Abhishek Sharma²,
Lwin Khin Shar¹,
Darius Foo³,
Andrew E. Santosa²,
Asankhaya Sharma² &
…
David Lo¹

Abstract

Context

Software developers often use open-source libraries in their projects to improve development speed. However, such libraries may contain security vulnerabilities, which have led to several high-profile incidents in recent years. As usage of open-source libraries grows, understanding these dependency vulnerabilities becomes increasingly important.

Objective

In this work, we analyze vulnerabilities in open-source libraries used by 450 software projects written in Java, Python, and Ruby. Our goal is to examine the types, distribution, severity, and persistence of the vulnerabilities, along with the relationships between their prevalence and both project and commit attributes.

Method

Our data is obtained by scanning versions of the sample projects after each commit made between November 1, 2017 and October 31, 2018 using an industrial software composition analysis tool, which provides information such as library names and versions, dependency types (direct or transitive), and known vulnerabilities.

Results

Among other findings, we found that project activity level, popularity, and developer experience do not translate into better or worse handling of dependency vulnerabilities. We also found that “Denial of Service” and “Information Disclosure” vulnerabilities are common across the languages studied. Further, we found that most dependency vulnerabilities persist throughout the observation period (mean of 78.4%, 97.7%, and 66.4% for publicly known vulnerabilities in our Java, Python, and Ruby datasets, respectively), and that the resolved ones take 3-5 months to fix.
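
To make the persistence and time-to-fix figures concrete, the following is a minimal sketch of how such metrics could be computed from per-vulnerability scan records. It is not the authors' implementation: the record layout (`VulnRecord`), the function names, and the sample data are all hypothetical, and the sketch assumes that a vulnerability "persists" if it is still present at the last scanned commit of the observation period.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class VulnRecord:
    """Hypothetical record for one dependency vulnerability in one project."""
    vuln_id: str
    first_seen: date               # date of the first commit whose scan reported it
    resolved_on: Optional[date]    # None = still present at the last scanned commit


def persistence_rate(records: List[VulnRecord]) -> float:
    """Percentage of vulnerabilities never resolved within the observation period."""
    if not records:
        return 0.0
    persisting = sum(1 for r in records if r.resolved_on is None)
    return 100.0 * persisting / len(records)


def mean_days_to_fix(records: List[VulnRecord]) -> Optional[float]:
    """Mean number of days between first detection and resolution (resolved only)."""
    durations = [
        (r.resolved_on - r.first_seen).days
        for r in records
        if r.resolved_on is not None
    ]
    return sum(durations) / len(durations) if durations else None


# Illustrative, made-up records (not data from the study).
records = [
    VulnRecord("VULN-A", date(2017, 11, 1), None),
    VulnRecord("VULN-B", date(2017, 12, 1), date(2018, 3, 1)),
    VulnRecord("VULN-C", date(2018, 1, 15), None),
    VulnRecord("VULN-D", date(2018, 2, 1), date(2018, 6, 1)),
]

print(persistence_rate(records))   # share of unresolved vulnerabilities, in percent
print(mean_days_to_fix(records))   # mean days to fix among resolved vulnerabilities
```

Under these assumptions, a per-project persistence rate would then be averaged across the projects of each language to obtain a mean comparable to the ones reported above.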

Conclusion

Our results highlight the importance of managing the number of dependencies and performing timely updates, and indicate some areas that can be prioritized to improve security in a wide range of projects, such as prevention and mitigation of Denial-of-Service attacks.

Acknowledgements

This project is supported by the National Research Foundation, Singapore and National University of Singapore through its National Satellite of Excellence in Trustworthy Software Systems (NSOETSS) office under the Trustworthy Computing for Secure Smart Nation Grant (TCSSNG) award no. NSOETSS2020-02. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore and National University of Singapore (including its National Satellite of Excellence in Trustworthy Software Systems (NSOE-TSS) office).

Author information

Authors and Affiliations

Singapore Management University, Singapore, Singapore
Gede Artha Azriadi Prana, Lwin Khin Shar & David Lo
Veracode, Singapore, Singapore
Abhishek Sharma, Andrew E. Santosa & Asankhaya Sharma
National University of Singapore, Singapore, Singapore
Darius Foo

Corresponding author

Correspondence to Gede Artha Azriadi Prana.

Additional information

Communicated by: Alessandro Garcia

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Prana, G.A.A., Sharma, A., Shar, L.K. et al. Out of sight, out of mind? How vulnerable dependencies affect open-source projects. Empir Software Eng 26, 59 (2021). https://doi.org/10.1007/s10664-021-09959-3

Accepted: 05 March 2021
Published: 21 April 2021
DOI: https://doi.org/10.1007/s10664-021-09959-3
