Using lobbying data from OpenSecrets.org, we present several experiments that apply machine learning techniques to predict whether a piece of legislation (a US bill) has been subject to lobbying activity. We also investigate how the intensity of lobbying affects how easily a lobbied bill can be distinguished from one that was not lobbied. We compare the performance of several models (logistic regression, random forest, CNN and LSTM) and text representations (BOW, TF-IDF, GloVe, Law2Vec). We report ROC AUC scores above 0.85 and accuracy of 78%. Model performance improves significantly (ROC AUC of 0.95 and accuracy of 88%) when only bills with higher lobbying intensity are considered. We also propose a method that can be applied to unlabelled data. Using it, we show that for a considerable number of previously unlabelled US bills our predictions suggest that some lobbying activity took place. We believe our method could contribute to the enforcement of the US Lobbying Disclosure Act (LDA) by flagging bills that were likely affected by lobbying but were not filed as such.
|Title of host publication||The 22nd International Conference on Big Data Analytics and Knowledge Discovery (DaWaK 2020)|
|Editors||Min Song, Il-Yeol Song, Gabriele Kotsis, Ismail Khalil, A Min Tjoa|
|Number of pages||16|
|Publication status||Published - 2020|
- Rent seeking
- Text classification
- US bills