Assessment of hierarchical clustering methodologies for proteomic data mining

Bruno Meunier, Emilie Dumas, Isabelle Piec, Daniel Bechet, Michel Hebraud, Jean-Francois Hocquette

Research output: Contribution to journal, Article, peer-review

126 Citations (Scopus)

Abstract

Hierarchical clustering is a powerful data mining approach for a first exploration of proteomic data. It enables samples or proteins to be grouped blindly according to their expression profiles. Nevertheless, the clustering results depend on parameters such as data preprocessing, the between-profile similarity measure, and the dendrogram construction procedure. We assessed several clustering strategies by calculating the F-measure, a widely used quality metric. Applying Pearson correlation together with Ward's method for data aggregation to a log-transformed matrix is among the best clustering strategies, at least for the data sets we studied. This study was carried out using PermutMatrix, freely available software derived from transcriptomics.
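The strategy named in the abstract (log transformation, Pearson-based dissimilarity, Ward aggregation, F-measure scoring) can be sketched in a few lines of plain Python. This is an illustrative sketch only, not PermutMatrix's implementation: the function names, the toy data, the choice of base-2 logarithm, the use of 1 - r as the dissimilarity, the Lance-Williams form of Ward's update, and the class-weighted best-match F-measure computed on a flat cut of the tree are all assumptions made here for illustration.

```python
import math
from itertools import combinations

def log_transform(matrix):
    # Assumes strictly positive abundances; base 2 is an arbitrary choice.
    return [[math.log2(v) for v in row] for row in matrix]

def pearson_distance(x, y):
    # Dissimilarity 1 - r, where r is the Pearson correlation of two profiles.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def pair(a, b):
    # Order-independent key for the dissimilarity table.
    return (a, b) if a < b else (b, a)

def ward_clusters(profiles, k):
    # Agglomerate with Ward's criterion (Lance-Williams update on squared
    # dissimilarities) until k clusters remain.
    clusters = {i: [i] for i in range(len(profiles))}
    d2 = {pair(i, j): pearson_distance(profiles[i], profiles[j]) ** 2
          for i, j in combinations(range(len(profiles)), 2)}
    new_id = len(profiles)
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda p: d2[pair(*p)])
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        for c, members in clusters.items():
            nc = len(members)
            d2[pair(c, new_id)] = ((na + nc) * d2[pair(a, c)]
                                   + (nb + nc) * d2[pair(b, c)]
                                   - nc * d2[pair(a, b)]) / (na + nb + nc)
        clusters[new_id] = merged
        new_id += 1
    return [sorted(members) for members in clusters.values()]

def f_measure(true_labels, clusters):
    # For each known class, take the best F score over all clusters,
    # then average the best scores weighted by class size.
    n = len(true_labels)
    total = 0.0
    for cls in set(true_labels):
        members = {i for i, lab in enumerate(true_labels) if lab == cls}
        best = 0.0
        for cluster in clusters:
            overlap = len(members & set(cluster))
            if overlap:
                precision = overlap / len(cluster)
                recall = overlap / len(members)
                best = max(best, 2 * precision * recall / (precision + recall))
        total += len(members) / n * best
    return total

# Hypothetical toy matrix: rows are profiles, columns are conditions.
data = [[1, 2, 4, 8], [2, 4, 8, 16], [3, 6, 12, 25],
        [8, 4, 2, 1], [16, 8, 4, 2], [20, 11, 5, 3]]
labels = ["up", "up", "up", "down", "down", "down"]

clusters = ward_clusters(log_transform(data), k=2)
score = f_measure(labels, clusters)
print(sorted(clusters), score)
```

On this toy input the rising and falling profiles fall into separate clusters, so the partition matches the known labels. Real proteomic matrices would also need handling of zero or missing values before the log step, which this sketch does not attempt.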
Original language: English
Pages (from-to): 358–366
Number of pages: 9
Journal: Journal of Proteome Research
Volume: 6
Issue number: 1
Publication status: Published - 2007
