In this paper, we present a simple mathematical trick that simplifies the derivation of Bayesian treatments of a variety of sparse kernel learning methods. The incomplete Cholesky factorisation due to Fine and Scheinberg (2001) is used to transform the dual parameter space, such that the covariance matrix of the Gaussian prior over model parameters becomes the identity matrix. The regularisation term is then the familiar weight-decay regulariser, allowing the Bayesian analysis to proceed straightforwardly via the methods developed by MacKay (1992). As a by-product, the incomplete Cholesky factorisation algorithm also identifies a subset of the training data forming an approximate basis for the remaining data in feature space, resulting in a sparse model. Bayesian treatments of the kernel ridge regression algorithm (Saunders et al., 1998), with both constant and input-dependent variance structures, are given as illustrative examples of the proposed technique, which we hope will be more widely applicable.
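The transformation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the pivoted factorisation below is a generic textbook variant, and the RBF kernel, the tolerance `tol`, the regularisation constant `lam`, and all function and variable names are assumptions chosen for illustration. Given an approximation K ≈ G Gᵀ, the substitution β = Gᵀα turns the dual regulariser αᵀKα into the weight-decay form ‖β‖², and the pivot indices mark the training points that act as the approximate basis.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian RBF kernel matrix (kernel choice is an assumption)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def incomplete_cholesky(K, tol=1e-10):
    """Greedy pivoted Cholesky: returns G with K ~= G @ G.T and the pivots."""
    n = K.shape[0]
    G = np.zeros((n, n))
    d = np.diag(K).astype(float).copy()   # diagonal of the residual K - G G^T
    pivots = []
    for j in range(n):
        i = int(np.argmax(d))             # point worst represented so far
        if d[i] <= tol:                   # remaining points are well approximated
            break
        pivots.append(i)
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(d[i])
        d -= G[:, j] ** 2
        d[pivots] = 0.0                   # chosen points are represented exactly
    return G[:, :len(pivots)], pivots

rng = np.random.default_rng(0)
n = 50
X = rng.uniform(-3.0, 3.0, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

K = rbf_kernel(X, X)
G, pivots = incomplete_cholesky(K)

# The dual regulariser alpha^T K alpha becomes ||beta||^2 with beta = G^T alpha.
alpha = rng.standard_normal(n)
beta = G.T @ alpha

# Ridge regression in both parameterisations gives (nearly) the same fit.
lam = 0.1
alpha_ridge = np.linalg.solve(K + lam * np.eye(n), y)
beta_ridge = np.linalg.solve(G.T @ G + lam * np.eye(G.shape[1]), G.T @ y)

print("basis size:", len(pivots), "of", n)
print("regulariser gap:", abs(alpha @ K @ alpha - beta @ beta))
print("max fit gap:", np.max(np.abs(K @ alpha_ridge - G @ beta_ridge)))
```

In the transformed space the linear system has one unknown per pivot rather than one per training point, which is where the sparsity mentioned in the abstract comes from in this sketch.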
|Number of pages||6|
|Publication status||Published - 2005|
|Event||2005 International Joint Conference on Neural Networks - Montreal, Canada|
Duration: 31 Jul 2005 → 4 Aug 2005