
Proceedings of Telecommunication Universities


Multiclass Classification of Attacks to Information Resources with Machine Learning Techniques

https://doi.org/10.31854/1813-324X-2019-5-1-107-115

Abstract

The article considers the classification of attacks on information resources using "classic" machine learning algorithms (k-Nearest Neighbors, Logistic Regression, Naive Bayes, Support Vector Machines) as well as tree-based and ensemble methods (Decision Tree, Random Forest and AdaBoost). The research was conducted on the NSL-KDD dataset using the Python tools scikit-learn, pandas and Jupyter Notebook. The data were preprocessed, and the parameters of the machine learning algorithms were optimized. All records in the dataset were labeled with one of five classes, corresponding to four categories of attacks (DoS, U2R, R2L, Probe) and normal traffic (normal). A comparative analysis of the classification performance of each algorithm was made using different evaluation metrics. It was concluded that all the researched algorithms showed insufficient efficiency under data imbalance. Additional processing of the initial dataset was proposed to improve classification. The best results were demonstrated by the Random Forest algorithm.
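
The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes a synthetic five-class dataset generated with scikit-learn in place of NSL-KDD, class proportions chosen only to mimic a strong imbalance, and default hyperparameters rather than the tuned ones mentioned above. The helper names (make_imbalanced_data, build_models, evaluate) are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Class labels from the abstract; the order here is an arbitrary choice.
CLASS_NAMES = ["normal", "DoS", "Probe", "R2L", "U2R"]


def make_imbalanced_data(seed=0):
    """Synthetic stand-in for NSL-KDD (assumption: proportions are illustrative)."""
    X, y = make_classification(
        n_samples=3000,
        n_features=20,
        n_informative=10,
        n_redundant=2,
        n_classes=len(CLASS_NAMES),
        n_clusters_per_class=1,
        weights=[0.53, 0.36, 0.08, 0.02, 0.01],  # rare classes play R2L and U2R
        random_state=seed,
    )
    # Stratify so that the rare classes appear in both the train and test split.
    return train_test_split(X, y, test_size=0.3, stratify=y, random_state=seed)


def build_models(seed=0):
    """The seven algorithms named in the abstract, with default settings."""
    return {
        "k-Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Naive Bayes": GaussianNB(),
        "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
    }


def evaluate(models, X_train, X_test, y_train, y_test):
    """Fit each model and report accuracy alongside macro-averaged F1."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            # Macro averaging weights every class equally, so weak performance
            # on rare classes is not hidden by the majority classes.
            "macro_f1": f1_score(y_test, y_pred, average="macro", zero_division=0),
        }
    return results


X_train, X_test, y_train, y_test = make_imbalanced_data()
results = evaluate(build_models(), X_train, X_test, y_train, y_test)
for name, scores in results.items():
    print(f"{name:24s} accuracy={scores['accuracy']:.3f} macro_f1={scores['macro_f1']:.3f}")
```

Reporting macro-averaged F1 next to accuracy is one way to expose the effect of class imbalance that the abstract refers to: a model can reach high accuracy on the majority classes while still misclassifying most samples of the rare ones.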

About the Authors

M. .. Kazhemskiy
Moscow Technical University of Communication and Informatics
Russian Federation


O. .. Sheluhin
Moscow Technical University of Communication and Informatics
Russian Federation




For citations:


Kazhemskiy M..., Sheluhin O... Multiclass Classification of Attacks to Information Resources with Machine Learning Techniques. Proceedings of Telecommunication Universities. 2019;5(1):107-115. (In Russ.) https://doi.org/10.31854/1813-324X-2019-5-1-107-115



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1813-324X (Print)
ISSN 2712-8830 (Online)