The ability to predict the transcriptional regulation of genes, based on the composition of the upstream promoter region, would be a useful step in deciphering gene regulatory networks in eukaryotic organisms. In this paper we perform optimally regularised kernel Fisher discriminant (ORKFD) analysis of the upstream promoter sequences of genes to predict whether they are up- or down-regulated in response to glucose in the model plant Arabidopsis thaliana. Three feature selection strategies are investigated, namely the use of known promoter motifs drawn from the PLACE database, explicit enumeration of all possible k-mers, and the use of mismatch kernels (which effectively permit the construction of a linear model in the space of all possible k-mers with up to m mismatches). The leave-one-out cross-validation (LOOCV) error rate indicates that approximately two-thirds of the observed regulatory behaviour can be inferred from the presence of particular motifs in the upstream promoter sequence. The analysis has yielded novel biological insight, which has since been confirmed experimentally in vivo.
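To make the k-mer and mismatch-kernel feature spaces mentioned in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. The function names (`kmer_counts`, `mismatch_features`, `mismatch_kernel`) and the toy sequences are hypothetical, and the brute-force enumeration over all 4^k k-mers is an assumption chosen for readability; practical implementations typically use more efficient data structures.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed DNA alphabet for promoter sequences


def kmer_counts(seq, k):
    """Count every contiguous k-mer in seq (explicit k-mer enumeration)."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_features(seq, k, m):
    """Map seq into the space of all k-mers, where each observed k-mer
    contributes to every k-mer within m mismatches of it."""
    observed = kmer_counts(seq, k)
    features = {}
    for candidate in map("".join, product(ALPHABET, repeat=k)):
        total = sum(count for kmer, count in observed.items()
                    if hamming(kmer, candidate) <= m)
        if total:
            features[candidate] = total
    return features


def mismatch_kernel(s, t, k, m):
    """Inner product of the (k, m)-mismatch feature vectors of s and t."""
    fs = mismatch_features(s, k, m)
    ft = mismatch_features(t, k, m)
    return sum(value * ft.get(kmer, 0) for kmer, value in fs.items())
```

With m = 0 the kernel reduces to an inner product of plain k-mer counts; with m > 0, sequences that share only approximate motifs still receive a non-zero similarity. A kernel classifier such as a kernel Fisher discriminant could then be trained on the resulting kernel matrix.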
|Number of pages||6|
|Publication status||Published - 2005|
|Event||2005 International Joint Conference on Neural Networks - Montreal, Canada|
Duration: 31 Jul 2005 → 4 Aug 2005