Aiming at efficient similarity search, hash functions are designed to embed high-dimensional feature descriptors to low-dimensional binary codes such that similar descriptors will lead to binary codes with a short distance in the Hamming space. It is critical to effectively maintain the intrinsic structure and preserve the original information of data in a hashing algorithm. In this paper, we propose a novel hashing algorithm called Latent Structure Preserving Hashing (LSPH), with the target of finding a well-structured low-dimensional data representation from the original high-dimensional data through a novel objective function based on Nonnegative Matrix Factorization (NMF) with their corresponding Kullback-Leibler divergence of data distribution as the regularization term. Via exploiting the joint probabilistic distribution of data, LSPH can automatically learn the latent information and successfully preserve the structure of high-dimensional data. To further achieve robust performance with complex and nonlinear data, in this paper, we also contribute a more generalized multi-layer LSPH (ML-LSPH) framework, in which hierarchical representations can be effectively learned by a multiplicative up-propagation algorithm. Once obtaining the latent representations, the hash functions can be easily acquired through multi-variable logistic regression. Experimental results on three large-scale retrieval datasets, i.e., SIFT 1M, GIST 1M and 500 K TinyImage, show that ML-LSPH can achieve better performance than the single-layer LSPH and both of them outperform existing hashing techniques on large-scale data.
- Nonnegative matrix factorization
- Latent structure
- Dimensionality reduction
- Multi-layer extension