Network intrusion detection using hybrid machine learning methods
Abstract
Network intrusion detection is an active area of cybersecurity research. The growing number of intrusions calls for more sophisticated methods of protecting computer networks. Various machine learning algorithms are used to detect network intrusions and anomalies, but their accuracy is limited. This research aims to improve network-level intrusion detection by applying hybrid machine learning algorithms. The paper proposes three new hybrid machine learning methods and evaluates their accuracy on two publicly available datasets, CSE-CIC-IDS2018 and UNSW-NB15. Hyperparameter optimization was performed to increase the accuracy of the classification models. The iteration method and the chi-square (χ²) test were used to identify the significant features of the dataset. The highest network anomaly recognition accuracy, 99.34%, was achieved by a hybrid algorithm combining decision tree, naive Bayes, and multilayer perceptron algorithms. This result is 3.13% higher than the best accuracy achieved by an individual machine learning algorithm. To evaluate the studied algorithms and their suitability for detecting intrusions in a computer network comprehensively, the algorithms were ranked using the SCR, DR, and FR ranking methods.
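As a rough illustration of the chi-square feature scoring mentioned in the abstract, the following minimal Python sketch computes the Pearson chi-square statistic between one categorical feature and the class label, then orders features by that score. It is not the authors' implementation: the function name `chi_square_score`, the feature names, and the six toy flows are assumptions made up for this example, and real traffic features would first have to be discretized or otherwise made non-negative.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic of one categorical feature vs. class labels."""
    n = len(feature)
    if n == 0 or n != len(labels):
        raise ValueError("feature and labels must be non-empty and of equal length")

    observed = Counter(zip(feature, labels))  # counts per (value, class) cell
    feature_totals = Counter(feature)         # row totals of the contingency table
    label_totals = Counter(labels)            # column totals of the contingency table

    score = 0.0
    for value, f_total in feature_totals.items():
        for label, l_total in label_totals.items():
            # Expected count if the feature and the class were independent
            expected = f_total * l_total / n
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score


# Hypothetical toy data: six flows, two made-up features
labels = ["attack", "attack", "attack", "benign", "benign", "benign"]
features = {
    "syn_flag_set": [1, 1, 1, 0, 0, 0],
    "protocol": ["tcp", "udp", "tcp", "udp", "tcp", "udp"],
}

# A higher statistic means a stronger association with the class label
ranking = sorted(
    features,
    key=lambda name: chi_square_score(features[name], labels),
    reverse=True,
)
print(ranking)
```

In this toy case the perfectly separating feature scores higher than the weakly associated one, so it is ranked first. A cut-off on the score (or on the corresponding p-value) would then decide which features are kept; the abstract does not state which threshold the study used.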
Article in Lithuanian.
Įsilaužimų aptikimas kompiuterių tinkluose taikant hibridinius mašininio mokymosi metodus
Keywords: network anomalies, machine learning, chi-square (χ²) test, hyperparameters, hybrid algorithms
This work is licensed under a Creative Commons Attribution 4.0 International License.