Lobbying activity is subject to strict disclosure requirements in the United States, and failure to comply can lead to criminal and civil penalties. It has been claimed that these tight disclosure measures have led to an increase in ‘underground lobbying’. This research proposes a method to detect non-compliance in lobbying disclosure and to gauge the magnitude of underground lobbying. We start from the premise that lobbying changes the text of the bills it targets. If these changes are to some extent systematic, then the texts of lobbied bills should be distinguishable from those of non-lobbied bills. We combine the corpus of US legislative bills with a large dataset of lobbying activity to obtain a partially labelled dataset, in which a positive label indicates a lobbied bill and the lack of a label indicates either that the bill was not lobbied or that it was lobbied but the lobbying was not disclosed. To address this partial labelling problem, we first set up a naive classification task, in which we assume that all unlabelled bills have a negative label, and train a model on a large corpus of US bills. Having identified the best-performing model, we then design a bagging method and collect out-of-fold predictions to estimate, for each unlabelled bill, whether it was lobbied. From these predictions, we infer that a sizable number of bills are likely to have been lobbied without this activity being disclosed. We then investigate how the political affiliation of the sponsoring senators and congressmen relates to these predicted probabilities.
- Lobbying disclosure
- Machine learning
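
The bagging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the naive Bayes classifier, the toy bill texts, and the names `train_nb`, `prob_positive` and `pu_bagging_scores` are all assumptions made for illustration. In each round, a random subset of unlabelled bills is treated as provisional negatives, a classifier is trained on these together with the labelled (lobbied) bills, and only the unlabelled bills left out of that round are scored. Averaging these held-out scores gives each unlabelled bill an estimated probability of having been lobbied.

```python
import math
import random
from collections import Counter


def train_nb(docs, labels):
    """Fit a two-class multinomial naive Bayes model on whitespace tokens."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(doc.lower().split())
    return {
        "counts": counts,
        "totals": {y: sum(counts[y].values()) for y in (0, 1)},
        "n_docs": Counter(labels),
        "vocab": set(counts[0]) | set(counts[1]),
    }


def prob_positive(model, doc):
    """Return P(lobbied | text) under the fitted model, with add-one smoothing."""
    n_docs, v = model["n_docs"], len(model["vocab"])
    log_odds = math.log(n_docs[1]) - math.log(n_docs[0])
    for tok in doc.lower().split():
        if tok not in model["vocab"]:
            continue  # ignore tokens never seen in training
        p1 = (model["counts"][1][tok] + 1) / (model["totals"][1] + v)
        p0 = (model["counts"][0][tok] + 1) / (model["totals"][0] + v)
        log_odds += math.log(p1) - math.log(p0)
    log_odds = max(-30.0, min(30.0, log_odds))  # keep exp() in a safe range
    return 1.0 / (1.0 + math.exp(-log_odds))


def pu_bagging_scores(positives, unlabelled, n_rounds=50, seed=0):
    """Average held-out scores for unlabelled documents over bagging rounds.

    Assumes at least one positive and at least two unlabelled documents.
    """
    rng = random.Random(seed)
    sums = [0.0] * len(unlabelled)
    hits = [0] * len(unlabelled)
    k = min(len(positives), len(unlabelled) - 1)  # provisional negatives per round
    for _ in range(n_rounds):
        bag = set(rng.sample(range(len(unlabelled)), k))
        docs = positives + [unlabelled[i] for i in bag]
        labels = [1] * len(positives) + [0] * len(bag)
        model = train_nb(docs, labels)
        for i, doc in enumerate(unlabelled):
            if i not in bag:  # score only bills held out of this round
                sums[i] += prob_positive(model, doc)
                hits[i] += 1
    return [s / h if h else float("nan") for s, h in zip(sums, hits)]


# Toy data, invented for illustration only (not drawn from the paper's corpus).
positives = [
    "tax credit for energy producers subsidy",
    "subsidy for pharmaceutical manufacturers tax exemption",
    "energy subsidy and tax exemption for producers",
]
unlabelled = [
    "tax exemption and subsidy for energy manufacturers",
    "designate the post office building in springfield",
    "rename the federal courthouse building",
    "commemorate the anniversary of the national park",
    "designate a national day of recognition",
]

scores = pu_bagging_scores(positives, unlabelled)
for text, score in sorted(zip(unlabelled, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {text}")
```

In this toy setting the first unlabelled text shares its vocabulary with the labelled bills, so it receives the highest averaged score. That is the pattern the abstract interprets as likely undisclosed lobbying. A realistic pipeline would replace the toy classifier with whichever model performs best on the naive classification task.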