From RockYou to RockYou2024: Analyzing Password Patterns Across Generations, Their Use in Industrial Systems and Vulnerability to Password Guessing Attacks

Gabriel Arquelau Pimenta Rodrigues; Pedro Augusto Giacomelli Fernandes; André Luiz Marques Serrano; Geraldo Pereira Rocha Filho; Guilherme Fay Vergara; Guilherme Dantas Bispo; Robson de Oliveira Albuquerque; Vinícius Pereira Gonçalves

doi:10.5753/jisa.2025.5034

Authors

Gabriel Arquelau Pimenta Rodrigues University of Brasilia https://orcid.org/0000-0002-4502-2153
Pedro Augusto Giacomelli Fernandes University of Brasilia https://orcid.org/0009-0009-2657-6976
André Luiz Marques Serrano University of Brasilia https://orcid.org/0000-0001-5182-0496
Geraldo Pereira Rocha Filho State University of Southwest Bahia https://orcid.org/0000-0001-6795-2768
Guilherme Fay Vergara University of Brasilia https://orcid.org/0000-0002-4551-2240
Guilherme Dantas Bispo University of Brasilia https://orcid.org/0000-0002-4938-2076
Robson de Oliveira Albuquerque University of Brasilia https://orcid.org/0000-0002-6717-3374
Vinícius Pereira Gonçalves University of Brasilia https://orcid.org/0000-0002-3771-2605

DOI:

https://doi.org/10.5753/jisa.2025.5034

Keywords:

Authentication, autoencoder, cybersecurity, data breach, password, privacy, RockYou

Abstract

Passwords are a common user authentication method and must be safeguarded by effective security measures. Nevertheless, there are many cases in which user credentials are compromised in data breaches. This work studies RockYou2024, a massive data breach that occurred in July 2024 and exposed over 9 billion passwords. We analyze the passwords with regard to their length, entropy, use of personal information and common strings, and zxcvbn evaluation, and compare the results with those of earlier password databases, namely RockYou2021 and RockYou, the latter leaked in 2009. The analysis found that the passwords from RockYou2021 and RockYou2024 are significantly more secure than those from RockYou, which suggests an improvement in password creation awareness and policies. RockYou2021 and RockYou2024 also have similar statistical distributions in all the analyses conducted. We further found that the United States of America is the most likely country of origin for most passwords in these databases. The datasets were also searched for passwords that are often used in industrial systems, which pose potential security risks in critical infrastructure sectors. Finally, we propose passBiRVAE, a contextualized Bidirectional Recurrent Neural Network used to generate passwords based on the RockYou2024 database. Future work should further improve the results obtained from this model. The analyses are subject to threats to validity.
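As an illustration of the kind of per-password measurement the abstract describes, the following is a minimal sketch that computes password length and a character-level Shannon entropy. The abstract does not state which entropy definition the authors use, so the formula here is an assumption, and the function names `shannon_entropy_bits` and `summarize` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy_bits(password: str) -> float:
    """Shannon entropy, in bits per character, of the password's own
    character frequency distribution (an assumed definition)."""
    if not password:
        return 0.0
    counts = Counter(password)
    n = len(password)
    # H = -sum(p_i * log2(p_i)) over the distinct characters
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def summarize(passwords):
    """Return one record per password with its length and entropy values."""
    rows = []
    for pw in passwords:
        h = shannon_entropy_bits(pw)
        rows.append({
            "password": pw,
            "length": len(pw),
            "entropy_per_char": h,
            "entropy_total": h * len(pw),
        })
    return rows


if __name__ == "__main__":
    # Illustrative inputs only; not drawn from any leaked dataset.
    for row in summarize(["123456", "password", "Tr0ub4dor&3"]):
        print(row)
```

A frequency-based entropy such as this one rewards character variety but ignores dictionary words and keyboard patterns, which is one reason a pattern-aware estimator such as zxcvbn is a useful complement.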

