An Evaluation of Image-Based Malware Classification Using Machine Learning

Tran The Son, Chando Lee, Hoa Le-Minh, Nauman Aslam, Moshin Raza, Nguyen Quoc Long

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper investigates image-based malware classification using machine learning techniques. In this recent approach, malware binaries are converted into images (i.e. malware images) before being fed to machine learning models such as k-nearest neighbour (k-NN), Naïve Bayes (NB), Support Vector Machine (SVM) or Convolutional Neural Networks (CNN). The approach classifies malware by image texture rather than by signatures or behaviours collected via malware analysis, so it is not hindered when the signatures of a new malware variant have not yet been collected or its behaviours have not yet been updated. This paper evaluates the classification performance of several machine learning classifiers (k-NN, NB, SVM, CNN) fed with malware images of various dimensions (128 × 128, 64 × 64, 32 × 32, 16 × 16). Experimental results on three datasets, Malimg, Malheur and BIG2015, show that k-NN outperforms the other classifiers on all three, with accuracies of 97.9%, 94.41% and 95.63% respectively. In contrast, NB performs poorly on image-based malware classification. The results also indicate that the accuracy of k-NN peaks at an input image size of 32 × 32 and tends to decrease when larger input images (64 × 64, 128 × 128) supply too much feature information.
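The abstract does not spell out the conversion or classification steps, so the following is only a minimal sketch of the general idea, under stated assumptions: each byte of a binary is treated as one grayscale pixel, the byte stream is resampled to a fixed grid by nearest-neighbour indexing, and a plain Euclidean k-NN votes on the label. The function names `bytes_to_image`, `flatten` and `knn_predict`, the resampling scheme and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter
import math


def bytes_to_image(data: bytes, size: int = 32) -> list[list[int]]:
    """Map raw bytes to a size x size grayscale grid (values 0-255).

    Assumption: one byte is one pixel, laid out row by row, and the byte
    stream is resampled to the grid by proportional (nearest-neighbour) indexing.
    """
    if not data:
        raise ValueError("empty binary")
    total = size * size
    # Pick the source byte for each target pixel by proportional indexing.
    pixels = [data[i * len(data) // total] for i in range(total)]
    return [pixels[r * size:(r + 1) * size] for r in range(size)]


def flatten(image: list[list[int]]) -> list[int]:
    """Turn the 2-D pixel grid into a flat feature vector."""
    return [p for row in image for p in row]


def knn_predict(train, query, k: int = 3):
    """Majority vote among the k training vectors closest to the query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy stand-ins for two malware families (synthetic bytes, not real samples).
family_a = [bytes([10] * 2000), bytes([12] * 1500)]
family_b = [bytes([200] * 1800), bytes([210] * 2500)]
train = [(flatten(bytes_to_image(b)), "A") for b in family_a]
train += [(flatten(bytes_to_image(b)), "B") for b in family_b]

query = flatten(bytes_to_image(bytes([205] * 3000)))
print(knn_predict(train, query, k=3))
```

Changing the `size` argument (for example 16, 32, 64 or 128) is how one would vary the input dimension in such a sketch; the accuracy figures reported in the abstract come from the paper's own experiments, not from this code.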

Original language: English
Title of host publication: Advances in Computational Collective Intelligence - 12th International Conference, ICCCI 2020, Proceedings
Editors: Marcin Hernes, Krystian Wojtkiewicz, Edward Szczerbicki
Publisher: Springer
Pages: 125-138
Number of pages: 14
ISBN (Print): 9783030631185
DOIs
Publication status: Published - 2020
Event: 12th International Conference on Computational Collective Intelligence, ICCCI 2020 - Da Nang, Viet Nam
Duration: 30 Nov 2020 - 3 Dec 2020

Publication series

Name: Communications in Computer and Information Science
Volume: 1287
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 12th International Conference on Computational Collective Intelligence, ICCCI 2020
Country/Territory: Viet Nam
City: Da Nang
Period: 30/11/20 - 3/12/20

Keywords

  • CNN
  • Deep Learning
  • Image-Based malware classification
  • k-NN
  • Naïve Bayes
  • SVM
