Proceedings of Telecommunication Universities

The Technique of Multi-aspect Evaluation and Categorization of Malicious Information Objects on the Internet

https://doi.org/10.31854/1813-324X-2019-5-3-58-65

Abstract

The rapid development of information technologies raises the challenge of detecting malicious information sources on the Internet. Machine learning methods, among the most popular and powerful tools for identifying dependencies between input (observed) data and output (desired) results, can be used to address this challenge. This article presents a methodology for multi-level processing of input data about malicious information objects on the Internet and for their multi-aspect assessment and categorization using machine learning methods. The purpose of the study is to improve the efficiency of detecting malicious information on the Internet, using web-page classification as an example.

About the Authors

A. .. Branitskiy
Saint-Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences
Russian Federation


I. .. Saenko
Saint-Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences
Russian Federation


For citations:


Branitskiy A..., Saenko I... The Technique of Multi-aspect Evaluation and Categorization of Malicious Information Objects on the Internet. Proceedings of Telecommunication Universities. 2019;5(3):58-65. (In Russ.) https://doi.org/10.31854/1813-324X-2019-5-3-58-65

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1813-324X (Print)
ISSN 2712-8830 (Online)