This work investigates a system that reconstructs 3D models of fish from a single camera monitoring them as they are transported on a conveyor. The models are subsequently used to train a species classifier and to improve estimates of discarded biomass. It is demonstrated that a monocular camera, combined with the conveyor's linear motion, produces a constrained form of multiview structure from motion, which allows the 3D scene to be reconstructed using a conventional stereo pipeline analogous to that of a binocular camera. Although motion stereo was proposed several decades ago, the present work is the first to compare the accuracy and precision of monocular and binocular stereo cameras monitoring conveyors, and the first to deploy such a system operationally. The system uses convolutional neural networks (CNNs) for foreground segmentation and stereo matching. Results from a laboratory model show that when the camera is mounted 750 mm above the conveyor, a median accuracy of <5 mm can be achieved with an equivalent baseline of 62 mm. Precision is largely limited by error in determining the equivalent baseline (i.e. the distance travelled by the conveyor belt between frames). When ArUco markers are placed on the belt, the interquartile range (IQR) of the error in z (depth) near the optical centre was found to be ±4 mm.
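To illustrate why an error in the equivalent baseline limits depth precision, the following minimal sketch assumes the standard rectified pinhole stereo relation Z = f·B/d, with the belt travel between two frames acting as the baseline B. This is a textbook model, not necessarily the pipeline used in this work. The focal length, the 0.5 mm belt-travel error, and the function names are hypothetical values chosen for illustration; only the 750 mm camera height and 62 mm equivalent baseline come from the text.

```python
def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Depth under the rectified pinhole stereo model: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


def depth_error_from_baseline_error(depth_mm, baseline_mm, baseline_error_mm):
    """Depth error caused by a baseline error: dZ = Z * dB / B.

    For a fixed disparity, Z is linear in B, so the relative depth
    error equals the relative baseline error.
    """
    return depth_mm * baseline_error_mm / baseline_mm


FOCAL_PX = 1200.0          # hypothetical focal length in pixels (assumption)
BASELINE_MM = 62.0         # equivalent baseline quoted in the text
CAMERA_HEIGHT_MM = 750.0   # camera height above the conveyor quoted in the text
BELT_ERROR_MM = 0.5        # hypothetical error in measured belt travel (assumption)

# Disparity that a point on the belt surface would produce under this model.
disparity_px = FOCAL_PX * BASELINE_MM / CAMERA_HEIGHT_MM

# Recover the depth from that disparity, then propagate the baseline error.
depth_mm = depth_from_disparity(FOCAL_PX, BASELINE_MM, disparity_px)
error_mm = depth_error_from_baseline_error(depth_mm, BASELINE_MM, BELT_ERROR_MM)

print(f"disparity: {disparity_px:.1f} px")
print(f"depth: {depth_mm:.1f} mm")
print(f"depth error for {BELT_ERROR_MM} mm belt-travel error: {error_mm:.2f} mm")
```

Under these assumptions, the depth error scales with the ratio Z/B, which is about 12 for this geometry, so sub-millimetre errors in the measured belt travel translate into depth errors of several millimetres.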