The paper addresses the topical problem of controlling access to Internet resources, which is of great practical importance: blocking access to illegal, extremist, and antisocial information, preventing the leakage of confidential information via the Internet, etc. Machine learning methods are widely used to solve such problems.
Traditional methods for classifying network traffic, whether based on port numbers or on packet payload, rely on direct inspection of network packets. When a complete, labeled training dataset is available, it is advisable to build the classifier with Machine Learning (ML) and Data Mining techniques, which have proven to be the most effective. However, an "ideal" classifier cannot be created until the open problems in this field are solved. The first of these is the absence of a common, representative dataset that could become a standard benchmark for research in this field. In addition, most well-known studies of traffic classification omit the fundamental requirement of identifying unknown traffic types.
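Port-based classification, the simplest of the traditional methods, can be sketched as a lookup from well-known server ports to application groups. The sketch below is illustrative only: the table `WELL_KNOWN_PORTS`, the function `classify_by_port`, and the group labels are hypothetical and not taken from the paper. It also shows why such methods struggle with P2P and background traffic: these flows rarely use fixed ports, so they fall through to an explicit "unknown" label instead of being forced into a known class.

```python
# Hypothetical port-to-group table built from well-known IANA ports.
# P2P applications usually have no fixed port, so they are deliberately absent.
WELL_KNOWN_PORTS = {
    80: "web",    # HTTP
    443: "web",   # HTTPS
    20: "ftp",    # FTP data
    21: "ftp",    # FTP control
    25: "mail",   # SMTP
    110: "mail",  # POP3
    143: "mail",  # IMAP
}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Label a flow by whichever of its ports is well known; otherwise 'unknown'."""
    # Check the destination port first, since clients usually connect to a server port.
    for port in (dst_port, src_port):
        label = WELL_KNOWN_PORTS.get(port)
        if label is not None:
            return label
    # Unmapped ports (e.g. P2P or background traffic) are reported as unknown.
    return "unknown"


if __name__ == "__main__":
    print(classify_by_port(51234, 443))   # web
    print(classify_by_port(51234, 25))    # mail
    print(classify_by_port(6881, 51413))  # unknown
```

A port lookup needs no training data, but any application on a non-standard or dynamic port ends up as "unknown", which is one reason the paper turns to ML classifiers trained on labeled traffic.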
The aim of the paper is to investigate the efficiency of algorithms that classify network traffic by application in the presence of background traffic.
The novelty of the presented solution lies in analyzing the following application groups: Web protocols for browsing websites (HTTP, HTTPS); the FTP protocol for file transfer; mail protocols for sending and receiving e-mail (SMTP, POP3, IMAP); and P2P protocols used by applications that transfer files over peer-to-peer networks. These groups are classified with the machine learning algorithms C4.5, Random Forests, Support Vector Machine (SVM), Bagging, and Adaptive Boosting (AdaBoost) in the presence of unclassified (background) traffic.
It is shown that background traffic reduces classification quality for all the algorithms considered. However, because C4.5, Random Forests, Bagging, and AdaBoost are built on decision trees (a single tree in C4.5 and an ensemble of trees in the other three), their performance remains sufficiently high and differs only slightly from one algorithm to another.
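A drop in classification quality of this kind is commonly measured with per-class precision, recall, and F1-score. The sketch below assumes that background flows are kept as a separate label (here the hypothetical string "background") in the ground truth; the function `per_class_metrics` is illustrative, not the paper's evaluation code, and the labels in the example are toy values, not results from the study.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, classes):
    """Return {class: (precision, recall, f1)} from paired true/predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class received a wrong sample
            fn[t] += 1  # the true class missed one of its samples
    result = {}
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        result[c] = (prec, rec, f1)
    return result


if __name__ == "__main__":
    # Toy labels: one background flow is misclassified as web traffic.
    y_true = ["web", "web", "mail", "background", "background"]
    y_pred = ["web", "web", "mail", "web", "background"]
    metrics = per_class_metrics(y_true, y_pred, ["web", "mail", "background"])
    for cls, (p, r, f) in metrics.items():
        print(f"{cls}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case the single background flow predicted as `web` lowers `web` precision to 2/3 while its recall stays at 1, which is the mechanism by which unclassified traffic degrades the scores of known application classes.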