Several factors affecting the automatic classification of musical audio signals are examined. Classification is performed on short audio frames, and results are reported as “bag of frames” accuracies: the audio is segmented into 23 ms analysis frames, each frame is classified, and a majority vote over the frames decides the final classification. First, the effect of different parameterisations of the audio signal is examined. Second, the effect of including information on the temporal variation of these features is assessed. Finally, the performance of several different classifiers trained on the data is compared. A new classifier is introduced that combines the unsupervised construction of decision trees with either linear discriminant analysis or a pair of single Gaussian classifiers. The classification results show that the topology of the new classifier gives it a significant advantage over the other classifiers, because it allows the classifier to model much more complex distributions within the data than Gaussian schemes can.
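The bag-of-frames decision rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The 44.1 kHz sample rate, the 1024-sample non-overlapping frames (about 23.2 ms), and the helper names `split_into_frames`, `bag_of_frames_label` and `toy_classifier` are all assumptions. The per-frame classifier is passed in as an arbitrary function rather than being any of the classifiers compared in the paper.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence

# Assumption: 44.1 kHz audio, so 1024 samples is about 23.2 ms per analysis frame.
SAMPLE_RATE = 44_100
FRAME_LEN = 1024


def split_into_frames(samples: Sequence[float],
                      frame_len: int = FRAME_LEN) -> List[Sequence[float]]:
    """Cut the signal into consecutive, non-overlapping frames.

    Assumption: no overlap between frames; a trailing partial frame is dropped.
    """
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def bag_of_frames_label(samples: Sequence[float],
                        classify_frame: Callable[[Sequence[float]], Hashable],
                        frame_len: int = FRAME_LEN) -> Hashable:
    """Classify every frame independently, then return the majority-vote label."""
    frames = split_into_frames(samples, frame_len)
    if not frames:
        raise ValueError("signal is shorter than one analysis frame")
    votes = Counter(classify_frame(frame) for frame in frames)
    # Counter.most_common orders equal counts by first occurrence,
    # so a tie goes to the label that was voted for first.
    label, _count = votes.most_common(1)[0]
    return label


if __name__ == "__main__":
    # Hypothetical per-frame classifier: thresholds the mean frame energy.
    def toy_classifier(frame: Sequence[float]) -> str:
        energy = sum(x * x for x in frame) / len(frame)
        return "loud" if energy > 0.1 else "quiet"

    signal = [1.0] * (3 * FRAME_LEN) + [0.0] * (2 * FRAME_LEN)
    print(bag_of_frames_label(signal, toy_classifier))  # prints: loud (3 of 5 votes)
```

Separating the frame-level classifier from the voting step mirrors the reporting scheme in the abstract: any frame classifier, whether Gaussian, LDA-based or tree-based, can be plugged in unchanged, and only the final song-level decision depends on the vote.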
Publication status: Published - 2004
Event: 5th International Conference on Music Information Retrieval - Barcelona, Spain
Abbreviated title: ISMIR 2004
Duration: 10 Oct 2004 → 15 Oct 2004