Hybrid Feature Selection Method for Improving File Fragment Classification

Alia Algurashi, Wenjia Wang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Identifying the types of file fragments in isolation from their context is an essential task in digital forensic analysis, and it can be done with several methods. One common approach is to extract various types of features from file fragments and use them as inputs to classification algorithms. However, this approach suffers from the curse of dimensionality: the number of extracted features is too high, which makes both learning and classification inefficient and inaccurate. This paper proposes a hybrid method that addresses this issue by using filters and wrappers to significantly reduce the number of features while also improving the accuracy of file type classification. First, it combines three appropriate filters to remove a large number of irrelevant or less important features; it then applies wrappers to reduce the remaining features further to the most salient ones. Our method was tested on the GovDocs benchmark datasets, and the experimental results indicated that it not only reduced the number of features from 66,313 to 11–32, but also improved classification accuracy compared with other methods that used all the features.
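The two-stage idea in the abstract (filters first, then a wrapper) can be illustrated with a minimal sketch. The abstract does not name the three filters, the rule for combining them, the wrappers, or the classifier, so every concrete choice below is an assumption made purely for illustration: variance, class-mean gap, and a Fisher-like score as the filters; mean-rank aggregation as the combination rule; greedy forward selection as the wrapper; and a leave-one-out nearest-centroid classifier. The function names and the synthetic data are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean, pvariance


def ranks(scores):
    """Rank features by score, best (largest) first; rank 0 is best."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    r = [0] * len(scores)
    for pos, j in enumerate(order):
        r[j] = pos
    return r


def filter_stage(X, y, keep):
    """Stage 1 (assumed filters): score every feature three ways,
    combine the scores by mean rank, and keep the `keep` best features."""
    cols = list(zip(*X))
    classes = sorted(set(y))
    variance, gap, fisher = [], [], []
    for col in cols:
        groups = [[v for v, c in zip(col, y) if c == k] for k in classes]
        means = [mean(g) for g in groups]
        within = mean(pvariance(g) for g in groups)
        variance.append(pvariance(col))
        gap.append(max(means) - min(means))
        fisher.append((max(means) - min(means)) ** 2 / (within + 1e-12))
    all_ranks = [ranks(s) for s in (variance, gap, fisher)]
    mean_rank = [mean(r[j] for r in all_ranks) for j in range(len(cols))]
    return sorted(range(len(cols)), key=lambda j: mean_rank[j])[:keep]


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier on `feats`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for k in set(y):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == k]
            centroids[k] = [mean(row[j] for row in rows) for j in feats]
        pred = min(
            centroids,
            key=lambda k: sum((X[i][j] - c) ** 2
                              for j, c in zip(feats, centroids[k])),
        )
        correct += pred == y[i]
    return correct / len(X)


def forward_select(X, y, candidates, max_features):
    """Stage 2 (assumed wrapper): greedily add the candidate feature that
    most improves accuracy; stop when no candidate improves it."""
    selected, best_acc = [], 0.0
    while len(selected) < max_features:
        trials = [(loo_accuracy(X, y, selected + [j]), j)
                  for j in candidates if j not in selected]
        if not trials:
            break
        acc, j = max(trials, key=lambda t: t[0])
        if acc <= best_acc:
            break
        selected.append(j)
        best_acc = acc
    return selected, best_acc


# Synthetic stand-in data (not GovDocs): feature 0 separates the two
# classes strongly, feature 1 weakly, and features 2-7 are pure noise.
random.seed(0)
y = [i % 2 for i in range(40)]
X = [[5 * c + random.uniform(-0.5, 0.5), c + random.uniform(-0.5, 0.5)]
     + [random.random() for _ in range(6)] for c in y]

kept = filter_stage(X, y, keep=4)                         # cheap filter pass
selected, acc = forward_select(X, y, kept, max_features=2)  # costly wrapper pass
print(kept, selected, acc)
```

The ordering matters for cost: the filters score each feature once without training a classifier, so they can cheaply discard most candidates, while the wrapper retrains a classifier for every trial subset and is therefore only practical on the much smaller set the filters leave behind.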
Original language: English
Title of host publication: Artificial Intelligence XXXVI - 39th SGAI International Conference on Artificial Intelligence, AI 2019, Proceedings
Editors: Max Bramer, Miltos Petridis
Pages: 379-391
Number of pages: 13
ISBN (Electronic): 978-3-030-34885-4
Publication status: Published - 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11927 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • Feature selection
  • File fragment classification
  • Filters
  • Forensics
  • Hybrid method
  • Wrappers
