In this paper we present an algorithm that forms the preprocessing stage of a system for automatically classifying Amazon forest monkeys captured on video in their natural habitat. The work is motivated by the need to monitor animal populations in natural forest environments automatically. The method applies graph-theoretical clustering to spatial and motion fields to segment monkeys moving in the foreground from trees and other vegetation in the background. The algorithm proceeds as follows. First, the d'Alembertian of a spatio-temporal Gaussian filter is convolved with a sequence of image frames to obtain an image of temporal zero crossings. Next, the magnitude of the visual motion vector in the image plane is estimated at each pixel of this zero-crossing image, and spatial-motion-based graph-theoretical clustering is applied to the resulting velocity image. The clustered pixels are then backprojected into the original color image for each subsequent frame to obtain a segmented image sequence. By thresholding the velocity image, motion due to background vegetation and camera movement can be rejected while segments belonging to the animals are retained. This is essential for our application because the recognizer relies on color features extracted from the monkeys' fur. Experimental results show that the approach can successfully extract patches of monkey fur from video shot with a simple hand-held camera.
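As a rough illustration only (not the authors' implementation), the filtering and velocity-thresholding steps described above might be sketched in NumPy/SciPy as follows. The Gaussian widths `sigma`, the wave-speed constant `c`, and the speed threshold are placeholder parameters chosen for the sketch, and the velocity-magnitude estimation and clustering stages are assumed to be supplied elsewhere:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dalembertian_zero_crossings(frames, sigma=(1.0, 2.0, 2.0), c=1.0):
    """Temporal zero crossings of the d'Alembertian of a spatio-temporal
    Gaussian applied to a grey-level video volume of shape (t, y, x).

    sigma and c are illustrative placeholders, not values from the paper.
    """
    # Spatio-temporal Gaussian smoothing (sigma ordered as t, y, x).
    v = gaussian_filter(frames.astype(float), sigma=sigma)
    # Second derivatives along t, y, x via repeated central differences.
    vtt = np.gradient(np.gradient(v, axis=0), axis=0)
    vyy = np.gradient(np.gradient(v, axis=1), axis=1)
    vxx = np.gradient(np.gradient(v, axis=2), axis=2)
    # Wave operator: d^2/dt^2 - c^2 * (spatial Laplacian).
    dal = vtt - c ** 2 * (vxx + vyy)
    # A temporal zero crossing occurs where the filtered response changes
    # sign between consecutive frames.
    zc = np.zeros(dal.shape, dtype=bool)
    zc[:-1] = np.signbit(dal[:-1]) != np.signbit(dal[1:])
    return zc

def reject_slow_motion(speed, threshold):
    """Keep only pixels whose estimated speed exceeds the threshold,
    suppressing slow background vegetation and camera movement."""
    return speed > threshold
```

The boolean mask returned by `dalembertian_zero_crossings` marks candidate moving-edge pixels; in the system described above, per-pixel velocity magnitudes would then be estimated at those pixels and thresholded with something like `reject_slow_motion` before clustering.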