Password guessers under a microscope: an in-depth analysis to inform deployments

  • Regular contribution
International Journal of Information Security

Abstract

Password guessers are instrumental for assessing the strength of passwords. Despite the diversity and abundance of guessers, comparisons between them are limited to simple success rates. Thus, little is known about how password guessers can best be combined or how they complement each other. To extend analyses beyond success rates, we devise an analytical framework to compare the types of passwords that guessers generate. Using our framework, we show that different guessers often produce dissimilar passwords, even when trained on the same data. We leverage this result to show that combinations of computationally cheap guessers are as effective in guessing passwords as computationally intensive guessers, but more efficient. Our framework can be used to identify combinations of guessers that will best complement each other. To improve the success rate of any guesser, we also show how an effective training dataset can be identified for a given target password dataset, even when the target dataset is hashed. Our insights allow us to provide a concrete set of practical recommendations for password checking to effectively and efficiently measure password strength.

Notes

  1. We use the terminology of “testing against a dataset” when a guesser is guessing the passwords of a target password dataset.

  2. The function \(\mathbbm {1}[s]\) returns 1 if the statement s is true; otherwise 0.

  3. We exclusively use publicly available datasets and don’t report any specific password information. Thus, there is no risk of exposing private user information. We keep only the passwords with no links to their original owner.

  4. The upper bound on the number of guesses for the Identity guesser is derived from the maximum number of unique passwords in our datasets.

  5. Our code for training the Identity guesser (i.e., computing the empirical distribution of unique passwords) and for its guess generation (i.e., sorting passwords by their probabilities) is written in Python without any optimization.

  6. We train on Twitter for this purpose, as opposed to the Merged dataset, since the Merged dataset would contain the testing (target) data.

  7. The generalized Jaccard allows us to weight the successful guesses of each guesser based on their frequencies in the target dataset.

  8. One might expect JtR-Markov to outperform the others given additional guesses. However, our further tests show that even after 20 billion guesses, JtR-Markov only reaches a success rate of 50.568%.

  9. These results confirm and complement previous findings [30] by employing different features, more and larger datasets, and more password guessers. We also show how similarity can be measured between a hashed & salted target dataset and a plaintext candidate training set.
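
Note 5 describes the Identity guesser only in words: training computes the empirical distribution of unique passwords, and guess generation sorts passwords by probability. The following is a minimal Python sketch of that description, not the authors' code. The function names, the toy training list, and the alphabetical tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter


def train_identity_guesser(passwords):
    """Return the empirical distribution over the unique passwords."""
    counts = Counter(passwords)
    total = sum(counts.values())
    return {pw: c / total for pw, c in counts.items()}


def generate_guesses(distribution, max_guesses=None):
    """Return passwords in decreasing order of probability.

    Ties are broken alphabetically (an assumption; the source does not
    say how ties are ordered).
    """
    ranked = sorted(distribution, key=lambda pw: (-distribution[pw], pw))
    return ranked if max_guesses is None else ranked[:max_guesses]


# Toy training data, made up for illustration.
training = ["123456", "password", "123456", "qwerty", "123456", "password"]
dist = train_identity_guesser(training)
guesses = generate_guesses(dist)
```

Because the guesser can only emit passwords seen in training, the number of guesses it can make is bounded by the number of unique training passwords, which is consistent with the upper bound mentioned in Note 4.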

References

  1. Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc, Boston, MA, USA (1999)

  2. Berkhin, P.: Survey of clustering data mining techniques. In: Grouping multidimensional data, pp. 25–71 (2006)

  3. Bishop, M., Klein, D.V.: Improving system security via proactive password checking. Comput. Secur. 14(3), 233–249 (1995)

  4. Bonneau, J.: The science of guessing: analyzing an anonymized corpus of 70 million passwords. In: Proceedings of the 2012 IEEE symposium on security and privacy (S&P), pp. 538–552 (2012)

  5. Bonneau, J., Herley, C., van Oorschot, P.C., Stajano, F.: The quest to replace passwords: a framework for comparative evaluation of web authentication schemes. In: Proceedings of the 2012 IEEE symposium on security and privacy (S&P), pp. 553–567 (2012)

  6. Campbell, J., Ma, W., Kleeman, D.: Impact of restrictive composition policy on user password choices. Behav. Inf. Technol. 30(3), 379–388 (2011)

  7. Castelluccia, C., Dürmuth, M., Perito, D.: Adaptive password-strength meters from markov models. In: Proceedings of the 2012 network and distributed system security symposium (NDSS) (2012)

  8. Cubrilovic, N.: RockYou hack: From bad to worse. TechCrunch (2009). https://techcrunch.com/2009/12/14/rockyou-hack-security-myspace-facebook-passwords/

  9. Das, A., Bonneau, J., Caesar, M., Borisov, N., Wang, X.: The tangled web of password reuse. In: Proceedings of the 2014 network and distributed system security symposium (NDSS), pp. 23–26 (2014)

  10. Das, S.: 40 million fling.com users’ passwords, sexual preferences stolen | hacked: Hacking finance (2016). https://hacked.com/40-million-fling-com-users-passwords-sexual-preferences-stolen/

  11. Databases today: twitter.7z (2019). https://databases.today/search-nojs.php

  12. de Carné de Carnavalet, X., Mannan, M.: From very weak to very strong: Analyzing password-strength meters. In: Proceedings of the 2014 network and distributed system security symposium (NDSS), pp. 23–26 (2014)

  13. Dell’Amico, M., Filippone, M.: Monte carlo strength evaluation: fast and reliable password checking. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, pp. 158–169 (2015)

  14. Designer, S.: John the ripper password cracker (2002). https://www.openwall.com/john/

  15. Dunham, M.H.: Data Mining: Introductory and Advanced Topics. Prentice Hall PTR, Upper Saddle River, NJ, USA (2002)

  16. Dürmuth, M., Angelstorf, F., Castelluccia, C., Perito, D., Chaabane, A.: OMEN: Faster password guessing using an ordered markov enumerator. In: Proceedings of the international symposium on engineering secure software and systems, pp. 119–132 (2015)

  17. Florêncio, D., Herley, C.: A large-scale study of web password habits. In: Proceedings of the 16th international conference on World Wide Web (WWW), pp. 657–666 (2007)

  18. Florêncio, D., Herley, C.: Where do security policies come from? In: Proceedings of the Sixth symposium on usable privacy and security (SOUPS), pp. 10:1–10:14 (2010)

  19. Florêncio, D., Herley, C., Van Oorschot, P.C.: Pushing on string: The don’t care region of password strength. Commun. ACM 59(11), 66–74 (2016)

  20. Fox-Brewster, T.: 13 million passwords appear to have leaked from this free web host (2017). https://www.forbes.com/sites/thomasbrewster/2015/10/28/000webhost-database-leak/

  21. Frakes, W.B., Baeza-Yates, R. (eds.): Information Retrieval: Data Structures and Algorithms. Prentice-Hall Inc, Upper Saddle River, NJ, USA (1992)

  22. Furnell, S.: Assessing password guidance and enforcement on leading websites. Comput. Fraud Secur. 2011(12), 10–18 (2011)

  23. Golla, M., Dürmuth, M.: On the accuracy of password strength meters. In: Proceedings of ACM CCS, pp. 1567–1582 (2018)

  24. Goodin, D.: 6.6 million plaintext passwords exposed as site gets hacked to the bone (2016). https://arstechnica.com/information-technology/2016/09/plaintext-passwords-and-wealth-of-other-data-for-6-6-million-people-go-public/

  25. Hackett, R.: Linkedin lost 167 million account credentials in data breach (2016). http://fortune.com/2016/05/18/linkedin-data-breach-email-password/

  26. Hitaj, B., Gasti, P., Ateniese, G., Perez-Cruz, F.: Passgan: A deep learning approach for password guessing. In: Applied Cryptography and Network Security, pp. 217–237. Springer International Publishing (2019)

  27. Houshmand, S., Aggarwal, S., Flood, R.: Next gen pcfg password cracking. IEEE Trans. Inf. Foren. Secur. 10(8), 1776–1791 (2015)

  28. Inglesant, P.G., Sasse, M.A.: The true cost of unusable password policies. In: Proceedings of the 2010 conference on human factors in computing systems (CHI), pp. 383–392 (2010)

  29. Jakobsson, M., Dhiman, M.: The benefits of understanding passwords. In: Mobile Authentication, pp. 5–24. Springer (2013)

  30. Ji, S., Yang, S., Das, A., Hu, X., Beyah, R.: Password correlation: Quantification, evaluation and application. In: Proceedings of the IEEE conference on computer communications, pp. 1–9 (2017)

  31. Ji, S., Yang, S., Hu, X., Han, W., Li, Z., Beyah, R.: Zero-sum password cracking game: a large-scale empirical study on the crackability, correlation, and security of passwords. IEEE Trans. Dependable Secure Comput. 14(5), 550–564 (2017)

  32. Kelley, P.G., Komanduri, S., Mazurek, M.L., Shay, R., Vidas, T., Bauer, L., Christin, N., Cranor, L.F., López, J.: Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms. In: Proceedings of the 2012 IEEE symposium on security and privacy (S&P), pp. 523–537 (2012)

  33. Komanduri, S., Shay, R., Kelley, P.G., Mazurek, M.L., Bauer, L., Christin, N., Cranor, L.F., Egelman, S.: Of passwords and people: Measuring the effect of password-composition policies. In: Proceedings of the 2011 conference on human factors in computing systems (CHI), pp. 2595–2604 (2011)

  34. Malone, D., Maher, K.: Investigating the distribution of password choices. In: Proceedings of the 21st international conference on World Wide Web (WWW), pp. 301–310 (2012)

  35. Mazurek, M.L., Komanduri, S., Vidas, T., Bauer, L., Christin, N., Cranor, L.F., Kelley, P.G., Shay, R., Ur, B.: Measuring password guessability for an entire university. In: Proceedings of the 2013 ACM SIGSAC conference on computer & communications security (CCS), pp. 173–186 (2013)

  36. Melicher, W.: The neural network password meter (2019). https://github.com/cupslab/neural_network_cracking

  37. Melicher, W., Ur, B., Segreti, S.M., Komanduri, S., Bauer, L., Christin, N., Cranor, L.F.: Fast, lean, and accurate: Modeling password guessability using neural networks. In: Proceedings of the 25th USENIX security symposium, pp. 175–191 (2016)

  38. Narayanan, A., Shmatikov, V.: Fast dictionary attacks on passwords using time-space tradeoff. In: Proceedings of the 2005 ACM SIGSAC conference on computer and communications security (CCS), pp. 364–372 (2005)

  39. Pal, B., Daniel, T., Chatterjee, R., Ristenpart, T.: Beyond credential stuffing: Password similarity models using neural networks. In: IEEE Symposium on security and privacy, pp. 417–434 (2019)

  40. Peslyak, A.: John the ripper community build (1.9.0-bleeding-jumbo) (2019). https://github.com/magnumripper/JohnTheRipper

  41. Ruhr University Bochum, RUB-SysSec: OMEN: Ordered markov enumerator (2019). https://github.com/RUB-SysSec/OMEN

  42. Russon, M.A.: Mate1.com hack: 27 million account passwords and emails have been leaked and sold on dark web (2016). https://www.ibtimes.co.uk/mate1-com-hack-27-million-account-passwords-emails-have-been-leaked-sold-dark-web-1547166

  43. Schweitzer, D., Boleng, J., Hughes, C., Murphy, L.: Visualizing keyboard pattern passwords. Inf. Vis. 10(2), 127–133 (2011)

  44. Singhal, A.: Modern information retrieval: a brief overview. Bull. IEEE Comput. Soc. Tech. Committee Data Eng. 24(4), 35–43 (2001)

  45. Summers, W.C., Bosworth, E.: Password policy: The good, the bad, and the ugly. In: Proceedings of the winter international symposium on information and communication technologies, pp. 1–6 (2004)

  46. Thomas, K., Moscicki, A., Margolis, D., Paxson, V., Bursztein, E., Li, F., Zand, A., Barrett, J., Ranieri, J., Invernizzi, L., Markov, Y., Comanescu, O., Eranti, V.: Data breaches, phishing, or malware?: Understanding the risks of stolen credentials. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security (CCS), pp. 1421–1434 (2017)

  47. Ur, B., Habib, H., Johnson, N., Melicher, W., Alfieri, F., Aung, M., Bauer, L., Christin, N., Colnago, J., Cranor, L.F., Dixon, H., Emami Naeini, P.: Design and evaluation of a data-driven password meter. In: Proceedings of the 2017 conference on human factors in computing systems (CHI), pp. 3775–3786 (2017)

  48. Ur, B., Kelley, P.G., Komanduri, S., Lee, J., Maass, M., Mazurek, M.L., Passaro, T., Shay, R., Vidas, T., Bauer, L., Christin, N., Cranor, L.F.: How does your password measure up? the effect of strength meters on password creation. In: Proceedings of the 21st USENIX Security Symposium, pp. 65–80 (2012)

  49. Ur, B., Segreti, S.M., Bauer, L., Christin, N., Cranor, L.F., Komanduri, S., Kurilova, D., Mazurek, M.L., Melicher, W., Shay, R.: Measuring real-world accuracies and biases in modeling password guessability. In: Proceedings of the 24th USENIX security symposium, pp. 463–481 (2015)

  50. Veras, R.: Semantic password guesser (lite) (2019). https://github.com/vialab/semantic-guesser/tree/lite

  51. Veras, R., Collins, C., Thorpe, J.: On the semantic patterns of passwords and their security impact. In: Proceedings 2014 Network and distributed system security symposium (NDSS), pp. 23–26 (2014)

  52. Veras, R., Thorpe, J., Collins, C.: Visualizing semantics in passwords: the role of dates. In: Proceedings of the ninth international symposium on visualization for cyber security, pp. 88–95 (2012)

  53. Wang, D., Zhang, Z., Wang, P., Yan, J., Huang, X.: Targeted online password guessing: An underestimated threat. In: Proceedings of the 2016 ACM SIGSAC Conference on computer and communications security (CCS), pp. 1242–1254 (2016)

  54. Wei, M., Golla, M.: The password doesn’t fall far: How service influences password choice. In: Proceedings of the 2018 Who Are You?! Adventures in authentication workshop (2018)

  55. Weir, C.M.: Pretty cool fuzzy guesser (4.0) (2019). https://github.com/lakiw/pcfg_cracker

  56. Weir, M., Aggarwal, S., Collins, M., Stern, H.: Testing metrics for password creation policies by attacking large sets of revealed passwords. In: Proceedings of the 2010 ACM SIGSAC conference on computer and communications security (CCS), pp. 162–175 (2010)

  57. Weir, M., Aggarwal, S., De Medeiros, B., Glodek, B.: Password cracking using probabilistic context-free grammars. In: Proceedings of the 2009 IEEE symposium on security and privacy (S&P), pp. 391–405 (2009)

  58. Wheeler, D.L.: zxcvbn: Low-budget password strength estimation. In: Proceedings of the 25th USENIX security symposium, pp. 157–173 (2016)

  59. Zhou, H., Liu, Q., Zhang, F.: Poster: An analysis of targeted password guessing using neural networks. In: Proceedings of the 2017 IEEE Symposium on security and privacy (S&P) (2017)

Funding

This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Author information

Corresponding author

Correspondence to Zach Parish.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Appendix A: preliminary analyses and visualization

Figure 5 shows the average success rates for various pairs of training and testing datasets. Two observations stand out: (1) some datasets (e.g., Twitter, Mate1) are more effective as training data than others (e.g., Webhost); (2) some pairs of datasets work well together, in that guessers trained on one achieve high success rates against the other (e.g., RockYou-Mate1, ClixSense-Mate1). These two observations motivate a deeper analysis of the characteristics of effective training datasets, discussed in Sect. 4.2.

1.2 Appendix B: proofs

Proof of Proposition 1

Lemma 1 rewrites the sum of maxima in the denominator of the generalized Jaccard index as \(|A| + |B|\) minus the sum of minima, and Lemma 2 restricts the sums of minima to \(\varOmega (A)\), the set of unique passwords in A. By these two lemmas, the generalized Jaccard index between the password list A and the unhashed password list B (which is not accessible) can be computed by:

$$\begin{aligned} J(A,B)= \frac{\sum \limits _{w \in \varOmega (A)} \min \left( o(w,A), o(w,B)\right) }{|A| + |B| - \sum \limits _{w \in \varOmega (A)} \min \left( o(w,A), o(w,B)\right) }. \end{aligned} \tag{13}$$

Define \(g(w, B_h) = \sum _{y\in B_h}\mathbbm {1}[y = H(w+s_y)]\), which counts the number of occurrences of password w in the salted & hashed password list \(B_h\), where \(s_y\) is the salt of entry y. We note that \(o(w,B)=g(w,B_h)\) and \(|B| = |B_h|\). So Eq. 13 is equivalent to:

$$\begin{aligned} J(A,B_h)= \frac{\sum \limits _{w \in \varOmega (A)} \min \left( o(w,A), g(w, B_h)\right) }{|A| + |B_h| - \sum \limits _{w \in \varOmega (A)} \min \left( o(w,A), g(w, B_h)\right) }. \end{aligned}$$

Letting \(F_{\!\min }(A,B_h) = \sum \limits _{w \in \varOmega (A)} \min \left( o(w,A), g(w, B_h)\right) \), we derive Eq. 12. \(\square \)
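
To make the proposition concrete, the sketch below computes the generalized Jaccard index between a plaintext list A and a salted & hashed list using only the counts \(g(w, B_h)\). It is an illustration under stated assumptions, not the authors' implementation: SHA-256 stands in for the unspecified hash function H, the salt is appended as a string, each hashed entry is stored as a hypothetical (hash, salt) pair, and the toy lists are made up.

```python
import hashlib
from collections import Counter


def H(data: str) -> str:
    """Stand-in hash function (SHA-256 is an assumption, not from the source)."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def g(w, hashed_list):
    """Count entries y of the hashed list with y == H(w + s_y).

    Each entry is assumed to be a (hash, salt) pair.
    """
    return sum(1 for y, s_y in hashed_list if y == H(w + s_y))


def generalized_jaccard_hashed(A, hashed_list):
    """Generalized Jaccard index from the counts in A and g(w, B_h) only."""
    counts_A = Counter(A)  # o(w, A) for every unique password w in A
    f_min = sum(min(o_wA, g(w, hashed_list)) for w, o_wA in counts_A.items())
    denom = len(A) + len(hashed_list) - f_min
    return f_min / denom if denom else 0.0


# Toy data: A is plaintext; B is only available salted and hashed.
A = ["a", "a", "b"]
B_h = [(H(pw + salt), salt) for pw, salt in zip(["a", "c", "c"], ["s0", "s1", "s2"])]
similarity = generalized_jaccard_hashed(A, B_h)
```

For these toy lists the sum of minima is 1 and the denominator is 3 + 3 - 1 = 5, so the index is 1/5. Note that this direct approach hashes every unique password of A once per entry of the hashed list, so its cost grows with the product of the two sizes.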

Lemma 1

Let \(o(w,A)\) and \(o(w,B)\) be the number of occurrences of password w in password lists A and B, respectively. We have

$$\begin{aligned} \sum \limits _{w \in \varOmega (A) \cup \varOmega (B)} \min \left( o(w,A), o(w,B)\right) = |A| + |B| - \sum \limits _{w \in \varOmega (A) \cup \varOmega (B)} \max \left( o(w,A), o(w,B)\right) . \end{aligned}$$

Here \(|A|= \sum _{w \in \varOmega (A)} o(w,A)\) and \(|B| = \sum _{w \in \varOmega (B)} o(w,B)\) are the number of passwords in A and B, respectively. Also, \(\varOmega (A)\) is the set of unique passwords in A.

Proof

One can observe that for any two numbers a and b: \(\min \left( a, b\right) + \max \left( a, b\right) = a + b.\) Using this equality, we can derive

$$\begin{aligned}&\sum \limits _{w \in \varOmega (A) \cup \varOmega (B)} \Big [\min \left( o(w,A), o(w,B)\right) + \max \left( o(w,A), o(w,B)\right) \Big ]\\&\quad =\sum \limits _{w \in \varOmega (A) \cup \varOmega (B)} \Big [o(w,A) + o(w,B)\Big ] \\&\quad =\sum \limits _{w \in \varOmega (A) \cup \varOmega (B)} o(w,A) + \sum \limits _{w \in \varOmega (A) \cup \varOmega (B)} o(w,B)\\&\quad =\sum \limits _{w \in \varOmega (A)} o(w,A) + \sum \limits _{w \in \varOmega (B)} o(w,B). \end{aligned}$$

The last equality holds as \(o(w,A)=0\) when \(w \notin \varOmega (A)\) and \(o(w,B)=0\) when \(w \notin \varOmega (B)\). By decomposing the first summation, we have shown

$$\begin{aligned} \sum \limits _{w \in \varOmega (A) \cup \varOmega (B)} \min \left( o(w,A), o(w,B)\right) + \sum \limits _{w \in \varOmega (A) \cup \varOmega (B)} \max \left( o(w,A), o(w,B)\right) = |A| + |B|, \end{aligned}$$

where \(|A|= \sum _{w \in \varOmega (A)} o(w,A)\) and \(|B| = \sum _{w \in \varOmega (B)} o(w,B)\). By rearranging the terms of this equality, we derive

$$\begin{aligned} \sum \limits _{w \in \varOmega (A) \cup \varOmega (B)} \min \left( o(w,A), o(w,B)\right) = |A| + |B| - \sum \limits _{w \in \varOmega (A) \cup \varOmega (B)} \max \left( o(w,A), o(w,B)\right) . \end{aligned}$$

\(\square \)
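
The identity of Lemma 1 can also be checked numerically. The sketch below is only a sanity check on randomly generated lists, not a substitute for the proof; the helper name `min_max_sums`, the small password vocabulary, and the random seed are assumptions chosen for illustration.

```python
import random
from collections import Counter


def min_max_sums(A, B):
    """Return the sums of min and of max occurrence counts over the union of unique passwords."""
    counts_A, counts_B = Counter(A), Counter(B)
    union = set(counts_A) | set(counts_B)
    # A Counter returns 0 for passwords it has not seen, matching o(w, .) = 0.
    s_min = sum(min(counts_A[w], counts_B[w]) for w in union)
    s_max = sum(max(counts_A[w], counts_B[w]) for w in union)
    return s_min, s_max


rng = random.Random(0)  # fixed seed so the check is repeatable
vocabulary = ["123456", "password", "qwerty", "letmein", "dragon"]
for _ in range(100):
    A = [rng.choice(vocabulary) for _ in range(rng.randint(0, 20))]
    B = [rng.choice(vocabulary) for _ in range(rng.randint(0, 20))]
    s_min, s_max = min_max_sums(A, B)
    # Lemma 1: sum of minima equals |A| + |B| minus sum of maxima.
    assert s_min == len(A) + len(B) - s_max
```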

Lemma 2

Letting \(o(w,A)\) and \(o(w,B)\) be the number of occurrences of password w in password lists A and B, respectively,

$$\begin{aligned} \sum \limits _{w \in \varOmega (A) \cup \varOmega (B)} \min \left( o(w,A), o(w,B)\right) = \sum \limits _{w \in \varOmega (A)} \min \left( o(w,A), o(w,B)\right) , \end{aligned} \tag{14}$$

where \(\varOmega (A)\) is the set of unique passwords in A.

Proof

Partitioning \(\varOmega (A) \cup \varOmega (B)\) into the two disjoint sets \(\varOmega (A)\) and \(\varOmega (B) \setminus \varOmega (A)\), we have

$$\begin{aligned}&\sum \limits _{w \in \varOmega (A) \cup \varOmega (B)} \min \left( o(w,A), o(w,B)\right) \\&\quad =\sum \limits _{w \in \varOmega (A)} \min \left( o(w,A), o(w,B)\right) + \sum \limits _{w \in \varOmega (B) \setminus \varOmega (A)} \min \left( o(w,A), o(w,B)\right) . \end{aligned}$$

As \(o(w,A)=0\) for \(w \in \varOmega (B) \setminus \varOmega (A)\), we have \(\min \left( o(w,A), o(w,B)\right) =0\) for all \(w \in \varOmega (B) \setminus \varOmega (A)\), so the second summation vanishes. So we have derived

$$\begin{aligned} \sum \limits _{w \in \varOmega (A) \cup \varOmega (B)} \min \left( o(w,A), o(w,B)\right) = \sum \limits _{w \in \varOmega (A)} \min \left( o(w,A), o(w,B)\right) . \end{aligned}$$

\(\square \)
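
As a small worked instance of the two lemmas, take the lists \(A = (a, a, b)\) and \(B = (a, c, c)\), which are made up purely for illustration. Then \(\varOmega (A) = \{a, b\}\), \(\varOmega (B) = \{a, c\}\), and \(|A| = |B| = 3\). The only password outside \(\varOmega (A)\) is c, and it contributes \(\min (0, 2) = 0\), as Lemma 2 states:

$$\begin{aligned}&\sum \limits _{w \in \{a, b, c\}} \min \left( o(w,A), o(w,B)\right) = \min (2,1) + \min (1,0) + \min (0,2) = 1, \\&\sum \limits _{w \in \{a, b\}} \min \left( o(w,A), o(w,B)\right) = \min (2,1) + \min (1,0) = 1, \\&\sum \limits _{w \in \{a, b, c\}} \max \left( o(w,A), o(w,B)\right) = 2 + 1 + 2 = 5 = |A| + |B| - 1. \end{aligned}$$

The last line agrees with Lemma 1, and substituting into the ratio used in the proof of Proposition 1 gives \(J(A,B) = 1/(3 + 3 - 1) = 1/5\).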

Cite this article

Parish, Z., Cushing, C., Aggarwal, S. et al. Password guessers under a microscope: an in-depth analysis to inform deployments. Int. J. Inf. Secur. 21, 409–425 (2022). https://doi.org/10.1007/s10207-021-00560-9

  • DOI: https://doi.org/10.1007/s10207-021-00560-9
