Symbolic analysis to describe cybertraffic
Caruso, Costantina; Malerba, Donato
2005-01-01
Abstract
The Information Society is deeply transforming the way we communicate and access information and services. Its principal aim is to provide services pervasively, but the wide spread of different access points (laptops, palmtops, mobile phones and so on) makes it necessary to monitor cybertraffic in order to identify and/or prevent anomalous behaviours. Although Internet services produce a great quantity of data logged by hosts, people cannot possibly inspect the traffic generated every day in order to keep track of network traffic trends and behaviours. Currently, network traffic monitoring relies on a mixture of solutions adopted at different levels. Commercial and open-source tools typically use descriptive statistics, which give information only about the quantitative characteristics of TCP/IP traffic as a whole, at packet level. In this work, we use symbolic data analysis to build a longitudinal model of network traffic in order to obtain qualitative information about Internet traffic. In the preprocessing phase, the original daily network traffic packets are aggregated into connections, which are then clustered; in the next step, a supervised inductive learning algorithm describes the content of each cluster by means of classification rules. Rules are second-order objects because they correspond to homogeneous classes, or groups, of connections whose individual descriptions are first-order objects. The resulting rules are Boolean symbolic objects because all their variables are set-valued. To build a longitudinal model of network traffic from the daily sets of rules, we compute similarities between rules. Rules that are similar with respect to a fixed threshold form the core of the network traffic model, while two further derived measures computed over a user-defined time window, the mean and the minimum dissimilarity values, are used to identify the dynamics of network traffic.
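The abstract does not say which dissimilarity measure is applied to the rules. As a minimal sketch, assuming each rule (a Boolean symbolic object) is stored as a mapping from variable names to the sets of values it admits, and swapping in an average per-variable Jaccard distance as a stand-in measure, the thresholded core and the two window measures could be computed as below. The variable names (`protocol`, `dst_port`), the function names and the threshold value are hypothetical illustrations, not the authors' implementation.

```python
from itertools import combinations
from statistics import mean

# Hypothetical representation of a Boolean symbolic object (rule):
# variable name -> set of admitted values. All rules are assumed to
# constrain the same variables.
BSO = dict[str, frozenset[str]]


def jaccard_distance(a: frozenset[str], b: frozenset[str]) -> float:
    """1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dissimilarity(r1: BSO, r2: BSO) -> float:
    """Average per-variable Jaccard distance (stand-in, not the paper's measure)."""
    if r1.keys() != r2.keys():
        raise ValueError("rules must describe the same variables")
    return mean(jaccard_distance(r1[v], r2[v]) for v in r1)


def core_pairs(rules: list[BSO], threshold: float) -> list[tuple[int, int]]:
    """Index pairs of rules whose dissimilarity does not exceed the threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(rules)), 2)
        if dissimilarity(rules[i], rules[j]) <= threshold
    ]


def window_measures(rule: BSO, window: list[list[BSO]]) -> tuple[float, float]:
    """Mean and minimum dissimilarity of `rule` to every rule of the days in `window`."""
    scores = [dissimilarity(rule, other) for day in window for other in day]
    return mean(scores), min(scores)


# Usage with made-up rules for two days.
monday = [{"protocol": frozenset({"TCP"}), "dst_port": frozenset({"80", "443"})}]
tuesday = [
    {"protocol": frozenset({"TCP"}), "dst_port": frozenset({"80"})},
    {"protocol": frozenset({"UDP"}), "dst_port": frozenset({"53"})},
]
print(window_measures(monday[0], [tuesday]))       # (0.625, 0.25)
print(core_pairs(monday + tuesday, threshold=0.3))  # [(0, 1)]
```

Jaccard distance is used here only because it is a simple, well-defined dissimilarity between value sets; any measure for Boolean symbolic objects could be plugged into `dissimilarity` without changing the rest of the sketch.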
In this way we can easily detect both the prevalent and the secondary aspects of network traffic and identify its variations: the most similar and recurrent symbolic objects (SOs) in a fixed time window represent the dominant behaviour of the network, similar but less frequent events are its secondary aspects, and SOs very different from all the others are anomalies. The proposed approach has been tested on the log files of our University Department's firewall, and the results are reported and discussed.
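The three-way reading of the model (dominant, secondary, anomalous) can be sketched under hypothetical assumptions: rules are mappings from variables to value sets, an average per-variable Jaccard distance stands in for the unspecified dissimilarity, and three user-chosen parameters (`similar`, `anomalous`, `min_days`) decide what counts as similar, very different and recurrent. None of these parameters comes from the paper; rules whose closest match lies between the two thresholds are labelled secondary here, a choice the abstract leaves open.

```python
from statistics import mean

# Hypothetical representation: a rule (Boolean symbolic object) maps each
# variable to the set of values it admits; all rules share the same variables.
BSO = dict[str, frozenset[str]]


def dissimilarity(r1: BSO, r2: BSO) -> float:
    """Average per-variable Jaccard distance (illustrative stand-in measure)."""
    def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
        union = a | b
        return 0.0 if not union else 1.0 - len(a & b) / len(union)
    return mean(jaccard(r1[v], r2[v]) for v in r1)


def label_rule(rule: BSO, window: list[list[BSO]], similar: float = 0.3,
               anomalous: float = 0.8, min_days: int = 2) -> str:
    """Label a rule 'dominant', 'secondary' or 'anomaly' against a non-empty window.

    `similar`, `anomalous` and `min_days` are hypothetical thresholds.
    """
    # Minimum dissimilarity to any rule observed in the window.
    closest = min(dissimilarity(rule, other) for day in window for other in day)
    if closest > anomalous:
        return "anomaly"  # very different from every other SO
    # Number of days in the window containing at least one similar rule.
    recurrence = sum(
        any(dissimilarity(rule, other) <= similar for other in day) for day in window
    )
    return "dominant" if recurrence >= min_days else "secondary"


# Usage with a made-up three-day window.
window = [
    [{"protocol": frozenset({"TCP"}), "dst_port": frozenset({"80"})}],
    [{"protocol": frozenset({"TCP"}), "dst_port": frozenset({"80", "443"})}],
    [{"protocol": frozenset({"UDP"}), "dst_port": frozenset({"53"})}],
]
web = {"protocol": frozenset({"TCP"}), "dst_port": frozenset({"80", "443"})}
dns = {"protocol": frozenset({"UDP"}), "dst_port": frozenset({"53"})}
odd = {"protocol": frozenset({"ICMP"}), "dst_port": frozenset({"0"})}
print([label_rule(r, window) for r in (web, dns, odd)])
# ['dominant', 'secondary', 'anomaly']
```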