The performance of 36 models that estimate depth-integrated marine net primary productivity (NPP), comprising 22 ocean color models and 14 biogeochemical ocean circulation models (BOGCMs), was assessed by comparing their output to in situ 14C data at the Bermuda Atlantic Time-series Study (BATS) and the Hawaii Ocean Time-series (HOT) over nearly two decades. Specifically, skill was assessed based on the models' ability to estimate the observed mean, variability, and trends of NPP. At both sites, more than 90% of the models underestimated mean NPP, with the average bias of the BOGCMs being nearly twice that of the ocean color models. However, the difference in overall skill between the best BOGCM and the best ocean color model at each site was not significant. Between 1989 and 2007, in situ NPP at BATS and HOT increased by an average of nearly 2% per year and was positively correlated with the North Pacific Gyre Oscillation index. The majority of ocean color models produced in situ NPP trends that were closer to the observed trends when chlorophyll-a was derived from high-performance liquid chromatography (HPLC) rather than from fluorometric or SeaWiFS data. However, this result depended on record length: average trend magnitude was more accurately estimated over longer time periods. Among BOGCMs, only two individual models successfully produced an increasing NPP trend (one model at each site). We caution against the use of models to assess multiannual changes in NPP over short time periods. Ocean color model estimates of NPP trends could improve if more high-quality HPLC chlorophyll-a time series were available.
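The three skill criteria named above (mean, variability, and trend) can be illustrated with a minimal sketch. The definitions used here are assumptions chosen for simplicity and are not necessarily the statistics used in the study: bias is the difference of means, variability is compared through the ratio of standard deviations, and the trend is an ordinary least-squares slope expressed as a percentage of the series mean per year. The function names `linear_trend` and `skill_summary` are hypothetical, and the data are synthetic values invented only to exercise the code.

```python
from statistics import mean, stdev


def linear_trend(years, values):
    """Ordinary least-squares slope of values against years (units per year)."""
    y_bar = mean(years)
    v_bar = mean(values)
    num = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
    den = sum((y - y_bar) ** 2 for y in years)
    return num / den


def skill_summary(years, observed, modeled):
    """Compare a modeled NPP series with an observed one (assumed definitions)."""
    obs_mean = mean(observed)
    mod_mean = mean(modeled)
    return {
        # Negative bias means the model underestimates mean NPP.
        "bias": mod_mean - obs_mean,
        # A ratio below 1 means the model is less variable than the observations.
        "std_ratio": stdev(modeled) / stdev(observed),
        # Trends are expressed as percent of each series' own mean per year.
        "obs_trend_pct_per_year": 100.0 * linear_trend(years, observed) / obs_mean,
        "mod_trend_pct_per_year": 100.0 * linear_trend(years, modeled) / mod_mean,
    }


# Synthetic illustration only: these numbers are NOT BATS or HOT data.
years = list(range(1989, 2008))
observed = [400.0 + 8.0 * (y - 1989) for y in years]
modeled = [0.8 * v for v in observed]  # a model that scales NPP down by 20%

summary = skill_summary(years, observed, modeled)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In this synthetic case the model underestimates the mean and the variability yet reproduces the relative trend exactly, which shows why the three criteria are worth evaluating separately.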