A simple trick for constructing Bayesian formulations of sparse kernel learning methods

Gavin C. Cawley, Nicola L. C. Talbot

Research output: Contribution to conference (Paper)

2 Citations (Scopus)


In this paper, we present a simple mathematical trick that simplifies the derivation of Bayesian treatments of a variety of sparse kernel learning methods. The incomplete Cholesky factorisation of Fine and Scheinberg (2001) is used to transform the dual parameter space such that the covariance matrix of the Gaussian prior over model parameters becomes the identity matrix. The regularisation term is then the familiar weight-decay regulariser, allowing the Bayesian analysis to proceed straightforwardly via the methods developed by MacKay (1992). As a by-product, the incomplete Cholesky factorisation algorithm also identifies a subset of the training data forming an approximate basis for the remaining data in feature space, resulting in a sparse model. Bayesian treatments of the kernel ridge regression algorithm (Saunders et al., 1998), with both constant and input-dependent variance structures, are given as illustrative examples of the proposed technique, which we hope will be more widely applicable.
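The transformation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`rbf_kernel`, `incomplete_cholesky`, `features`), the RBF kernel, the toy data, and the fixed regularisation value `lam` are all assumptions made for illustration, and the evidence-framework tuning of MacKay (1992) is not attempted. The sketch uses a greedy pivoted incomplete Cholesky factorisation K ≈ G Gᵀ; substituting β = Gᵀα turns the kernel regulariser αᵀKα into the weight-decay term βᵀβ, and the selected pivots are the retained subset of training points.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and Z (assumed kernel)."""
    d2 = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def incomplete_cholesky(K, tol=1e-6):
    """Greedy pivoted incomplete Cholesky: K ~= G @ G.T, G of shape (n, m)."""
    n = K.shape[0]
    d = np.diag(K).astype(float).copy()  # diagonal of the residual K - G G^T
    G = np.zeros((n, n))
    pivots = []
    for j in range(n):
        i = int(np.argmax(d))            # point worst represented so far
        if d[i] <= tol:
            break
        pivots.append(i)
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(d[i])
        d = np.maximum(d - G[:, j] ** 2, 0.0)
    return G[:, :len(pivots)], pivots

# Toy regression problem (illustrative data only).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

K = rbf_kernel(X, X)
G, pivots = incomplete_cholesky(K)

# In the transformed space the model output is G @ beta and the prior
# covariance is the identity, so the penalty is plain weight decay.
lam = 0.1  # assumed fixed value; the paper sets it via Bayesian analysis
m = G.shape[1]
beta = np.linalg.solve(G.T @ G + lam * np.eye(m), G.T @ y)

def features(X_new):
    """Map inputs to the transformed space using only the pivot points."""
    K_new = rbf_kernel(X_new, X[pivots])       # kernel against the sparse basis
    return np.linalg.solve(G[pivots], K_new.T).T

X_new = np.linspace(-3.0, 3.0, 5)[:, None]
y_new = features(X_new) @ beta
```

Prediction requires kernel evaluations against the pivot points only, which is where the sparsity of the resulting model comes from in this sketch.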
Original language: English
Number of pages: 6
Publication status: Published - 2005
Event: 2005 International Joint Conference on Neural Networks - Montreal, Canada
Duration: 31 Jul 2005 to 4 Aug 2005


Conference: 2005 International Joint Conference on Neural Networks
Abbreviated title: IJCNN-2005
