survey
Just Accepted

Identifying Authorship in Malicious Binaries: Features, Challenges & Datasets

Online AM: 26 March 2024

Abstract

Attributing a piece of malware to its creator typically requires threat intelligence. Attributing binaries is harder still, because it relies mostly on disassembling them to obtain authorship-related features. We systematically analyze work on malware authorship attribution, identify key findings and shortcomings of current approaches, and explore the open research challenges. To mitigate the lack of ground-truth datasets in this domain, we publish alongside this survey the largest and most diverse meta-information dataset, comprising 17,513 malware samples labeled with 275 threat actor groups.
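Opcode n-gram frequencies are one family of authorship-related features that can be extracted once a binary has been disassembled. The sketch below is illustrative only and is not the survey's method. It assumes the disassembler output has already been reduced to a list of instruction mnemonics, and the function names (`opcode_ngrams`, `normalize`, `cosine_similarity`) and sample sequences are hypothetical.

```python
from collections import Counter
from math import sqrt


def opcode_ngrams(mnemonics, n=2):
    """Count overlapping n-grams of instruction mnemonics."""
    return Counter(tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1))


def normalize(counts):
    """Convert raw counts to relative frequencies so binaries of different sizes compare fairly."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity of two sparse frequency profiles (1.0 = same direction, 0.0 = no shared n-grams)."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical mnemonic sequences, standing in for disassembled functions.
sample_a = ["push", "mov", "sub", "call", "mov", "test", "jz", "call", "leave", "ret"]
sample_b = ["push", "mov", "sub", "call", "mov", "test", "jnz", "call", "leave", "ret"]
sample_c = ["xor", "xor", "xor", "loop", "ret"]

profile_a = normalize(opcode_ngrams(sample_a))
profile_b = normalize(opcode_ngrams(sample_b))
profile_c = normalize(opcode_ngrams(sample_c))

print(round(cosine_similarity(profile_a, profile_b), 3))  # 0.778 (7 of 9 bigrams shared)
print(round(cosine_similarity(profile_a, profile_c), 3))  # 0.0 (no bigrams shared)
```

Such profiles also reflect the compiler, its optimization level and any linked library code, not only the author, which is one reason binary attribution is harder than source-code attribution.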



            • Published in

              ACM Computing Surveys, Just Accepted
              ISSN: 0360-0300
              EISSN: 1557-7341

              Copyright © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Online AM: 26 March 2024
              • Accepted: 7 March 2024
              • Revised: 23 February 2024
              • Received: 18 November 2020

