survey

Identifying Authorship in Malicious Binaries: Features, Challenges & Datasets

Authors:
Jason Gray (Royal Holloway University of London, Egham; King’s College London, London; The Alan Turing Institute, London, UK)
Daniele Sgandurra (Royal Holloway University of London, Egham, UK)
Lorenzo Cavallaro (University College London, London, UK)
Jorge Blasco (Universidad Politécnica de Madrid, Madrid, Spain)

ACM Computing Surveys. Accepted March 2024. https://doi.org/10.1145/3653973

Online AM: 26 March 2024

Abstract

Attributing a piece of malware to its creator typically requires threat intelligence. Binary attribution is more difficult still, as it mostly relies on the ability to disassemble binaries to obtain authorship-related features. We perform a systematic analysis of works in the area of malware authorship attribution. We identify key findings and some shortcomings of current approaches, and explore the open research challenges. To mitigate the lack of ground truth datasets in this domain, we publish alongside this survey the largest and most diverse meta-information dataset of 17,513 malware samples labeled to 275 threat actor groups.

Published in

ACM Computing Surveys Just Accepted
ISSN:0360-0300
EISSN:1557-7341

Copyright © 2024 held by the owner/author(s). Publication rights licensed to ACM.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Online AM: 26 March 2024
- Accepted: 7 March 2024
- Revised: 23 February 2024
- Received: 18 November 2020
Author Tags
adversarial
malware
authorship attribution
advanced persistent threats
datasets
Qualifiers
- survey