
Abstract

Owing to the growth of Internet-based services, network traffic data have become so large and complex that they are very difficult to process with traditional data processing tools. This size and complexity make fast and efficient cyber security intrusion detection a challenging problem. A realistic intrusion detection system should process large volumes of network traffic as quickly as possible so that malicious traffic is detected as early as possible. This paper uses Apache Spark, a big data processing tool, to process large volumes of network traffic data. We propose a framework in which a well-known feature selection algorithm first selects the most important features, and a classification-based intrusion detection method then detects intrusions quickly and efficiently in the massive network traffic. In this work, we use two well-known feature selection algorithms, namely correlation-based feature selection and Chi-squared feature selection, and five well-known classification-based intrusion detection methods, namely Logistic Regression, Support Vector Machines, Random Forest, Gradient Boosted Decision Trees, and Naive Bayes. DARPA’s KDD’99 data set is used to validate the proposed framework, and the classification-based intrusion detection schemes are compared in terms of training time, prediction time, accuracy, sensitivity, and specificity.
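To make the Chi-squared selection step and the sensitivity/specificity metrics concrete, here is a minimal sketch in plain Python. It is not the authors' Spark implementation. Spark MLlib does provide a chi-squared selector (`ChiSqSelector`), but this example avoids Spark entirely. It assumes categorical feature values and a binary label where `1` marks attack traffic and `0` marks normal traffic. The function names `chi_squared_score`, `select_top_k`, and `sensitivity_specificity` are hypothetical and used only for illustration.

```python
from collections import Counter


def chi_squared_score(feature_values, labels):
    """Pearson chi-squared statistic between one categorical feature and the label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))  # contingency-table cell counts
    feature_totals = Counter(feature_values)         # row totals
    label_totals = Counter(labels)                   # column totals
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            # Expected count under independence of feature and label (always > 0 here).
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def select_top_k(rows, labels, k):
    """Return the indices of the k columns with the highest chi-squared score."""
    n_features = len(rows[0])
    scores = [chi_squared_score([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Toy records (hypothetical): (protocol, connection flag); label 1 = attack.
rows = [("tcp", "SF"), ("udp", "SF"), ("tcp", "REJ"), ("udp", "REJ")]
labels = [1, 0, 1, 0]
print(select_top_k(rows, labels, 1))                       # [0]
print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))  # (0.5, 1.0)
```

In this toy data the protocol column perfectly separates the two classes and scores 4.0, while the flag column is independent of the label and scores 0.0. Selection therefore keeps only column 0. A classifier would then be trained on the reduced feature set and evaluated with the two metrics above.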

