
Out of sight, out of mind? How vulnerable dependencies affect open-source projects

Empirical Software Engineering

Abstract

Context

Software developers often use open-source libraries in their projects to improve development speed. However, such libraries may contain security vulnerabilities, and this has resulted in several high-profile incidents in recent years. As usage of open-source libraries grows, understanding these dependency vulnerabilities becomes increasingly important.

Objective

In this work, we analyze vulnerabilities in open-source libraries used by 450 software projects written in Java, Python, and Ruby. Our goal is to examine the types, distribution, severity, and persistence of the vulnerabilities, along with the relationships between their prevalence and both project and commit attributes.

Method

Our data is obtained by scanning versions of the sample projects after each commit made between November 1, 2017 and October 31, 2018 using an industrial software composition analysis tool, which provides information such as library names and versions, dependency types (direct or transitive), and known vulnerabilities.
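As an illustration of the kind of per-commit record this scanning step produces, the following is a minimal sketch only. The class name, field names, and helper functions are hypothetical and are not the tool's actual output format; only the observation window, the direct/transitive distinction, and the presence of known vulnerabilities come from the description above.

```python
from dataclasses import dataclass
from datetime import date

# Observation window stated in the Method paragraph.
WINDOW_START = date(2017, 11, 1)
WINDOW_END = date(2018, 10, 31)


@dataclass(frozen=True)
class DependencyFinding:
    """One library reported by a scan of one commit (hypothetical schema)."""
    commit_sha: str
    commit_date: date
    library: str
    version: str
    dependency_type: str  # "direct" or "transitive"
    vulnerability_ids: tuple = ()  # identifiers of known vulnerabilities


def in_window(finding: DependencyFinding) -> bool:
    """True if the commit falls inside the observation window (inclusive)."""
    return WINDOW_START <= finding.commit_date <= WINDOW_END


def vulnerable_counts_by_type(findings):
    """Count vulnerable findings per dependency type within the window."""
    counts = {"direct": 0, "transitive": 0}
    for f in findings:
        if in_window(f) and f.vulnerability_ids:
            counts[f.dependency_type] += 1
    return counts
```

Grouping by dependency type in this way makes it possible to compare how many vulnerable libraries are pulled in directly versus transitively.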

Results

Among other findings, project activity level, popularity, and developer experience do not translate into better or worse handling of dependency vulnerabilities. “Denial of Service” and “Information Disclosure” vulnerabilities are common across the languages studied. Further, most dependency vulnerabilities persist throughout the observation period (mean of 78.4%, 97.7%, and 66.4% for publicly known vulnerabilities in our Java, Python, and Ruby datasets, respectively), and those that are resolved take 3 to 5 months to fix.
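To make the persistence and time-to-fix figures concrete, the sketch below shows one way such per-project numbers could be computed. It rests on assumptions not stated in the abstract: a vulnerability is treated as persistent if it was never resolved within the observation period, and time to fix is measured in days from first detection to resolution. The function name and input layout are hypothetical.

```python
from datetime import date
from statistics import mean


def persistence_summary(vulns):
    """Summarise persistence for one project.

    `vulns` maps a vulnerability id to (first_seen, resolved_on), where
    resolved_on is None if the vulnerability was still present at the
    end of the observation period (assumed definition of "persistent").
    """
    persisted = [v for v, (_, resolved) in vulns.items() if resolved is None]
    fix_days = [
        (resolved - first_seen).days
        for first_seen, resolved in vulns.values()
        if resolved is not None
    ]
    return {
        "persistent_pct": 100.0 * len(persisted) / len(vulns) if vulns else 0.0,
        "mean_days_to_fix": mean(fix_days) if fix_days else None,
    }
```

Averaging `persistent_pct` over all projects in one language would give a per-language mean comparable in form to the percentages reported above.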

Conclusion

Our results highlight the importance of managing the number of dependencies and performing timely updates, and indicate some areas that can be prioritized to improve security in a wide range of projects, such as prevention and mitigation of Denial-of-Service attacks.

Acknowledgements

This project is supported by the National Research Foundation, Singapore and National University of Singapore through its National Satellite of Excellence in Trustworthy Software Systems (NSOETSS) office under the Trustworthy Computing for Secure Smart Nation Grant (TCSSNG) award no. NSOETSS2020-02. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore and National University of Singapore (including its National Satellite of Excellence in Trustworthy Software Systems (NSOE-TSS) office).

Author information

Corresponding author

Correspondence to Gede Artha Azriadi Prana.

Additional information

Communicated by: Alessandro Garcia

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Prana, G.A.A., Sharma, A., Shar, L.K. et al. Out of sight, out of mind? How vulnerable dependencies affect open-source projects. Empir Software Eng 26, 59 (2021). https://doi.org/10.1007/s10664-021-09959-3

  • DOI: https://doi.org/10.1007/s10664-021-09959-3
