References

tuzsut

Труды учебных заведений связи

Proceedings of Telecommunication Universities

ISSN 1813-324X, 2712-8830

СПбГУТ

10.31854/1813-324X-2019-5-3-58-65

tuzsut-87

Research Article

INFORMATICS, COMPUTER ENGINEERING AND MANAGEMENT

Методика многоаспектной оценки и категоризации вредоносных информационных объектов в сети Интернет

The Technique of Multi-aspect Evaluation and Categorization of Malicious Information Objects on the Internet

Браницкий

А. А.

Branitskiy

A. A.

alexander.branitskiy@gmail.com

Саенко

И. Б.

Saenko

I. B.

Санкт-Петербургский институт информатики и автоматизации Российской академии наук, Россия

Saint-Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, Russian Federation

2019

13.04.2021

Vol. 5. Iss. 3. PP. 58-65

2021

Браницкий А.А., Саенко И.Б.

Branitskiy A.A., Saenko I.B.

This work is licensed under a Creative Commons Attribution 4.0 License.

https://tuzs.sut.ru/jour/article/view/87

The rapid development of information technologies poses the challenge of detecting sources of malicious information on the Internet. Machine learning methods, among the most popular and powerful tools for identifying dependencies between input (observed) data and output (desired) results, can be applied to this problem. This article presents a technique for multi-level processing of input data about malicious information objects on the Internet that provides their multi-aspect evaluation and categorization using machine learning methods. The aim of the study is to improve the efficiency of detecting malicious information on the Internet, using the classification of Web pages as an example.

information objects, malicious information, classifiers, Web-pages, multi-level combination scheme

References

Hayes P.J., Andersen P.M., Nirenburg I.B., Schmandt L.M. TCS: a shell for content-based text categorization // Proceedings of the Sixth Conference on Artificial Intelligence Applications (Santa Barbara, USA, 5-9 May 1990). Piscataway, NJ: IEEE, 1990. Vol. 1. PP. 320-326. DOI:10.1109/CAIA.1990.89206

Apté C., Damerau F., Weiss S.M. Automated learning of decision rules for text categorization // ACM Transactions on Information Systems (TOIS). 1994. Vol. 12. Iss. 3. PP. 233-251. DOI:10.1145/183422.183423

Salton G., Buckley C. Term-weighting approaches in automatic text retrieval // Information Processing & Management. 1988. Vol. 24. Iss. 5. PP. 513-523. DOI:10.1016/0306-4573(88)90021-0

Fattah M.A. A Novel Statistical Feature Selection Approach for Text Categorization // Journal of Information Processing Systems. 2017. Vol. 13. Iss. 5. PP. 1397-1409.

Lewis D.D., Ringuette M. A Comparison of Two Learning Algorithms for Text Categorization // In: Third Annual Symposium on Document Analysis and Information Retrieval. 1994. PP. 81-93.

Joachims T. Text categorization with Support Vector Machines: learning with many relevant features // Proceedings of the 10th European Conference on Machine Learning (ECML, Chemnitz, Germany, 21-23 April 1998). Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence). Berlin, Heidelberg: Springer, 1998. Vol. 1398. PP. 137-142. DOI:10.1007/BFb0026683

Johnson R., Zhang T. Effective Use of Word Order for Text Categorization with Convolutional Neural Networks // Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics "Human Language Technologies" (Denver, USA, 31 May - 5 June 2015). Stroudsburg: Association for Computational Linguistics, 2015. PP. 103-112. DOI:10.3115/v1/N15-1011

Ghareb A.S., Bakar A.A., Hamdan A.R. Hybrid feature selection based on enhanced genetic algorithm for text categorization // Expert Systems with Applications. 2016. Vol. 49. Iss. C. PP. 31-47. DOI:10.1016/j.eswa.2015.12.004

Lorena A.C., De Carvalho A.C., Gama J.M.P. A review on the combination of binary classifiers in multiclass problems // Artificial Intelligence Review. 2008. Vol. 30. Iss. 1-4. DOI:10.1007/s10462-009-9114-9

Kotenko I., Chechulin A., Shorov A., Komashinsky D. Analysis and Evaluation of Web Pages Classification Techniques for Inappropriate Content Blocking // Proceedings of the 14th Industrial Conference on Data Mining "Advances in Data Mining. Applications and Theoretical Aspects" (ICDM, St. Petersburg, Russia, 16-20 July 2014). Lecture Notes in Computer Science. Cham: Springer, 2014. Vol. 8557. PP. 39-54. DOI:10.1007/978-3-319-08976-8_4

Mikolov T., Chen K., Corrado G., Dean J. Efficient Estimation of Word Representations in Vector Space. 2013. URL: https://arxiv.org/pdf/1301.3781 (accessed 10.04.2019)

The authors declare that there are no conflicts of interest present.