Recognizing RGB images by learning from RGB-D data is a promising application that significantly reduces cost while retaining high recognition rates. However, existing methods still suffer from the domain shift problem, because conventional surveillance cameras and depth sensors use different imaging mechanisms. In this paper, we aim to address two challenges simultaneously: 1) how to take advantage of the additional depth information in the source domain, and 2) how to reduce the data distribution mismatch between the source and target domains. We propose a novel method called adaptive visual-depth embedding (aVDE), which first learns a compact shared latent space between the labeled RGB and depth representations in the source domain. This shared latent space then helps transfer the depth information to the unlabeled target dataset. Finally, aVDE combines two separate learning strategies for domain adaptation, feature matching and instance reweighting, in a unified optimization problem that jointly matches features and reweights instances across the shared latent space and the projected target domain to learn an adaptive classifier. We test our method on five pairs of data sets for object recognition and scene classification, and the results demonstrate the effectiveness of the proposed method.