Cross-Modality Submodular Dictionary Learning for Information Retrieval

Fan Zhu, Ling Shao, Mengyang Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

25 Citations (Scopus)


This paper addresses the problem of joint modeling of multimedia components in different media forms. We consider the information retrieval task across both text and image documents, which includes retrieving relevant images that closely match the description in a text query and retrieving text documents that best explain the content of an image query. A greedy dictionary construction approach is introduced for learning an isomorphic feature space, to which cross-modality data can be adapted while data smoothness is guaranteed. The proposed objective function consists of two reconstruction error terms for both modalities and a Maximum Mean Discrepancy (MMD) term that measures the cross-modality discrepancy. Optimization of the reconstruction terms and the MMD term yields a compact and modality-adaptive dictionary pair. We formulate the joint combinatorial optimization problem by maximizing variance reduction over a candidate signal set while constraining the dictionary size and coefficients' sparsity. By exploiting the submodularity and the monotonicity property of the proposed objective function, the optimization problem can be solved by a highly efficient greedy algorithm, and is guaranteed to be at least an (e − 1)/e ≈ 0.632 approximation to the optimum. The proposed method achieves state-of-the-art performance on the Wikipedia dataset.
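The (e − 1)/e guarantee cited in the abstract is the classic bound for greedy maximization of a monotone submodular set function under a cardinality constraint. The sketch below illustrates only that generic greedy scheme; the objective used here (weighted set coverage) is a toy stand-in, not the paper's variance-reduction objective over the reconstruction and MMD terms, and the function names are hypothetical.

```python
# Illustrative sketch of greedy monotone-submodular maximization under a
# cardinality constraint -- the scheme behind the (e-1)/e guarantee.
# The coverage objective below is a toy stand-in, NOT the paper's
# variance-reduction objective over cross-modal dictionaries.

def greedy_select(candidates, objective, k):
    """Greedily pick up to k elements maximizing a monotone submodular objective."""
    selected = []
    for _ in range(k):
        best, best_gain = None, 0.0
        for c in candidates:
            if c in selected:
                continue
            # Marginal gain of adding c to the current selection.
            gain = objective(selected + [c]) - objective(selected)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no candidate offers positive marginal gain
            break
        selected.append(best)
    return selected

# Toy submodular objective: size of the union of covered items.
coverage = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 2, 3}}

def covered(atoms):
    return float(len(set().union(*(coverage[a] for a in atoms)))) if atoms else 0.0

print(greedy_select(list(coverage), covered, 2))  # -> ['d', 'c']
```

In the paper's setting the "elements" would be candidate dictionary atoms and the objective the variance reduction over both modalities, but the greedy loop and its approximation guarantee take the same form.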
Original language: English
Title of host publication: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery (ACM)
Number of pages: 10
ISBN (Print): 978-1-4503-2598-1
Publication status: Published - 3 Nov 2014
