JEL Classification: C49, D22, E26, K42.
Keywords: Organized crime, infiltration, money laundering, economic crime, financial statements, machine learning.
The aim of this work is to develop a machine learning algorithm designed to detect firms that may have connections with organized crime (OC). To this end, we use a firm-level dataset for Italy that merges financial information from various sources, mainly public balance sheets. To train and test the model, we compare a sample of over 28,000 Italian firms that are highly likely to be linked to OC with randomly selected samples of allegedly lawful firms. On an out-of-sample test set, the algorithm correctly identifies approximately 76% of the OC-linked firms (recall) and 74% of the allegedly lawful firms (specificity). The primary output of the algorithm is a risk score, which might be applied at an operational level (for example, as a preliminary screening tool) to support the work of anti-money laundering authorities and law enforcement agencies. Confirmation of its operational validity, however, will have to come from further applications in the field.
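To make the two reported metrics concrete, the following minimal sketch shows how recall and specificity are computed once a risk score is turned into a binary flag. It is an illustration only, not the authors' implementation: the function name, the 0.5 threshold, and the toy scores and labels are all hypothetical.

```python
def recall_and_specificity(y_true, y_pred):
    """Return (recall, specificity) for binary labels, where 1 = OC-linked firm."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # OC-linked, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # OC-linked, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # lawful, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # lawful, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    recall = tp / (tp + fn)        # share of OC-linked firms identified
    specificity = tn / (tn + fp)   # share of lawful firms identified
    return recall, specificity


# Hypothetical toy data: risk scores in [0, 1] and true labels (assumptions).
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.4, 0.6]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
threshold = 0.5  # assumed cut-off; the source does not state one

flags = [1 if s >= threshold else 0 for s in scores]
recall, specificity = recall_and_specificity(labels, flags)
print(f"recall={recall:.2f}, specificity={specificity:.2f}")
```

Raising the threshold in such a setting would typically trade recall for specificity, which is why a risk score lends itself to use as a tunable preliminary screening tool.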