Reverse Engineering of Software Using the Smart Brute Force Method: Step-by-Step Scheme
https://doi.org/10.31854/1813-324X-2025-11-4-129-142
EDN: UOKLHB
Abstract
Introduction: software vulnerabilities are one of the leading causes of threats to information security. They can be countered by searching for them directly in the program code and correcting them. This requires converting the executable code to a higher-level representation that is better suited to searching and fixing; however, for a number of reasons, existing solutions cannot be considered satisfactory. One such solution, an exhaustive search over all possible source code variants, each compiled and compared with the given machine code, is extremely costly in every respect.
Purpose: to develop a less costly and more efficient method of exhaustively searching through source code variants.
Methods: quantitative and qualitative comparison of different source code generators, and formalization of the proposed method in analytical form.
Results: a seven-step scheme for selecting an instance of source code that corresponds to a given machine code is proposed; the authors call the method "smart" because of its optimal combinations of the syntactic constructions of the programming language. Code generation is based on iterating over paths in the graph of syntactic rules that represents the formal syntax of a programming language in a given space. The syntax is supplied as a parameter, which makes the steps of the scheme completely independent of the programming language of the source code. After multiple instances of source code are generated, they are compiled into machine code and compared with the specified instance; if they match, the task of decompilation by smart exhaustive search is considered solved.
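The generate, compile and compare loop described above can be illustrated with a minimal sketch. It is not the authors' seven-step scheme: the grammar `TOY_GRAMMAR`, the stand-in compiler `toy_compile`, and the function names `expand` and `search_source` are all hypothetical, and the enumeration is a plain depth-bounded walk over the rule graph without the "smart" selection of combinations. The only property carried over from the text is that the syntax is passed in as a parameter.

```python
from itertools import product

# Hypothetical toy syntax (an assumption, not taken from the paper):
# each nonterminal maps to a list of alternatives, each a list of symbols.
TOY_GRAMMAR = {
    "expr": [["term"], ["term", "+", "expr"]],
    "term": [["x"], ["1"], ["2"]],
}

def expand(grammar, symbol, depth):
    """Yield every token list derivable from `symbol` using at most
    `depth` nested rule applications (a bounded walk over the rule graph)."""
    if symbol not in grammar:      # terminal symbol: emit it as-is
        yield [symbol]
        return
    if depth == 0:                 # rule budget exhausted on this path
        return
    for alternative in grammar[symbol]:
        # Expand each symbol of the alternative, then combine the results.
        parts = [list(expand(grammar, s, depth - 1)) for s in alternative]
        for combo in product(*parts):
            yield [token for part in combo for token in part]

def toy_compile(tokens):
    """Stand-in for a real compiler: emit stack-machine instructions."""
    code = [f"PUSH {t}" for t in tokens if t != "+"]
    code += ["ADD"] * tokens.count("+")
    return tuple(code)

def search_source(grammar, start, target, max_depth):
    """Return the first generated source whose compiled form equals `target`."""
    for depth in range(1, max_depth + 1):
        for tokens in expand(grammar, start, depth):
            if toy_compile(tokens) == target:
                return " ".join(tokens)
    return None

if __name__ == "__main__":
    target_code = ("PUSH x", "PUSH 2", "ADD")
    print(search_source(TOY_GRAMMAR, "expr", target_code, max_depth=4))
```

Because the grammar is an argument rather than hard-coded logic, the same loop could in principle be pointed at another language's rules; a realistic setting would replace `toy_compile` with an actual compiler invocation and a binary comparison.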
Practical significance: despite the time cost of exhaustive search in such tasks, the smart exhaustive search method has proven effective, by expert assessment, in a number of application scenarios; it can therefore be applied directly to reverse engineering.
Discussion: the "smart" exhaustive search could be significantly improved through qualitative optimization using genetic algorithms.
About the Authors
K. E. Izrailov
Russian Federation
M. V. Buinevich
Russian Federation
Review
For citations:
Izrailov K.E., Buinevich M.V. Reverse Engineering of Software Using the Smart Brute Force Method: Step-by-Step Scheme. Proceedings of Telecommunication Universities. 2025;11(4):129-142. https://doi.org/10.31854/1813-324X-2025-11-4-129-142. EDN: UOKLHB