Reverse Engineering of Software Using the Smart Brute Force Method: Prototype and Experiment
https://doi.org/10.31854/1813-324X-2025-11-6-88-100
EDN: TQFFYA
Abstract
Introduction: one approach to finding vulnerabilities in programs is to convert executable machine code into human-readable source code, which is better suited to analysis by an information security expert. The authors previously developed a method for "smart" enumeration of source code variants that identifies a variant compiling to a given machine code. A logical continuation of that research is to implement a software prototype in order to test the method's operability and to determine some of its characteristics experimentally.
Purpose: to implement a software prototype of the smart exhaustive search over source code variants (according to the previously described method) and to evaluate experimentally its operability and the limits of its applicability.
Methods: software engineering, experimentation, approximation of values.
Results: a software prototype was created that selects a source code instance matching a given machine code, following the method from the authors' previous studies. The prototype was used to recover the source code of a mathematical expression from its machine code using part of the formal syntax of a programming language (defined as a graph of syntactic rules). A series of experiments evaluated the prototype by determining the following dependencies: the total number of source code variants as a function of the heterogeneity of the syntax and of the maximum traversal depth of its graph representation, and the search time for a specific source code as a function of the set traversal depth. These tests confirmed the basic functionality of the prototype and its hypothetical potential. They also support the feasibility of reverse engineering in the direction opposite to the traditional one: starting from source code rather than from machine code.
Practical significance: the current version of the prototype can be used in practice to decompile small fragments of machine code, independently of any specific programming language or processor architecture (only a compilation tool is required).
Discussion: qualitative optimization of the "smart" exhaustive search with artificial intelligence, specifically genetic algorithms, could significantly improve the search.
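The search loop summarized in the abstract (enumerate source code variants up to a set depth, compile each one, compare the result with the target code) can be illustrated with a minimal sketch. It is not the authors' prototype. The toy grammar, the names `OPERANDS`, `OPERATORS`, `enumerate_sources`, `fingerprint`, and `search` are all hypothetical, and Python's built-in `compile()` with its bytecode stands in for a real compilation tool and real machine code.

```python
from itertools import product

# Hypothetical toy grammar (an assumption, not the syntax graph from the paper):
#   expr -> operand | "(" expr operator expr ")"
OPERANDS = ("a", "b")
OPERATORS = ("+", "*")


def enumerate_sources(max_depth):
    """Return every expression whose nesting depth is at most max_depth."""
    if max_depth == 0:
        return list(OPERANDS)
    smaller = enumerate_sources(max_depth - 1)
    compound = [
        f"({left} {op} {right})"
        for left, op, right in product(smaller, OPERATORS, smaller)
    ]
    return list(OPERANDS) + compound


def fingerprint(source):
    """Stand-in for compilation: Python bytecode plus the names it refers to."""
    code = compile(source, "<candidate>", "eval")
    return code.co_code, code.co_names


def search(target, max_depth):
    """Compile candidates in order; return (matching source or None, candidates tried)."""
    tried = 0
    for candidate in enumerate_sources(max_depth):
        tried += 1
        if fingerprint(candidate) == target:
            return candidate, tried
    return None, tried


# The "given machine code" is produced here from a known expression.
target = fingerprint("a * (a + b)")

for depth in range(3):
    total = len(enumerate_sources(depth))
    found, tried = search(target, depth)
    print(f"depth={depth} variants={total} found={found} tried={tried}")
```

Even for this two-operand, two-operator grammar the number of variants grows from 2 to 10 to 202 as the depth limit goes from 0 to 2, which gives a feel for why the abstract treats traversal depth and syntax heterogeneity as the main factors limiting applicability.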
About the Authors
K. E. Izrailov
Russian Federation
M. V. Buinevich
Russian Federation
For citations:
Izrailov K.E., Buinevich M.V. Reverse Engineering of Software Using the Smart Brute Force Method: Prototype and Experiment. Proceedings of Telecommunication Universities. 2025;11(6):88-100. (In Russ.) https://doi.org/10.31854/1813-324X-2025-11-6-88-100. EDN: TQFFYA