Although the vast majority of financial transactions are legitimate, even the minuscule share that is fraudulent adds up to billions of dollars. As with spam detection and video surveillance, fraud detection is a two-pronged anomaly-detection problem: catch the bad transactions while permitting the good ones. Erring on either side - catching too few real frauds or flagging too many legitimate transactions - is a failure.
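
To make the two kinds of error concrete, here is a minimal sketch that counts them for a batch of transactions. The function names and the toy data are illustrative assumptions, not part of any particular fraud system; recall measures how many real frauds were caught, and the false positive rate measures how many good transactions were wrongly blocked.

```python
def confusion_counts(actual, flagged):
    """Count outcomes for paired lists of booleans.

    actual[i]  is True when transaction i really was fraudulent.
    flagged[i] is True when the detector flagged transaction i.
    """
    pairs = list(zip(actual, flagged))
    tp = sum(1 for a, f in pairs if a and f)          # frauds caught
    fp = sum(1 for a, f in pairs if not a and f)      # good ones blocked
    fn = sum(1 for a, f in pairs if a and not f)      # frauds missed
    tn = sum(1 for a, f in pairs if not a and not f)  # good ones allowed
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of real frauds that were caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(fp, tn):
    """Share of legitimate transactions that were wrongly flagged."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Toy data (made up for illustration): two frauds, three good transactions.
actual = [True, True, False, False, False]
flagged = [True, False, True, False, False]
tp, fp, fn, tn = confusion_counts(actual, flagged)
```

A detector that flags everything scores perfect recall but blocks every good transaction, which is why both numbers have to be watched together.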

Modern statistical analytics, and especially machine learning, are the preferred solution. By treating the flow of transactions like any other data stream, machine learning makes it possible to go beyond blindly applying fixed filters and rules to each new transaction.

Those who battled email spam a few years ago will remember deliberately misspelled words, or letters interspersed with odd symbols and characters - not for artistic impact, but to defeat that generation of mechanical spam filters.

Machine learning goes well beyond such simple rules: the model can be retuned as the data stream evolves, so that tactics like these stop working.

How does it work for fraudulent financial transactions? In short, the model digests as much data as it can be fed about the transaction itself, the user, the counterparty and their history, and then estimates the statistical likelihood of fraud. Those estimates are tuned as the dataset evolves, further refining the model.

In the case of credit card payments, the purchaser, merchant, product, and myriad other factors are all grist for the model - just for starters, the time of day, the address and location of merchant and purchaser, and the originating location of the transaction, along with statistics on the purchaser’s credit score, purchase history, etc. For medical insurance claims, the model again draws on a host of factors about the claimant, the provider, and the claim itself. In the insurance industry, which arguably pioneered a rigorous actuarial approach to risk assessment and management, big data analytics enrich not just the costing of risk, but also the detection of irregularities and fraudulent transactions.
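
As a rough illustration of how such factors might be handed to a model, the sketch below turns one card payment into a list of numbers. The `CardTransaction` fields and the `to_features` encoding are hypothetical choices made for this example; a real system would use many more signals and its own encoding.

```python
from dataclasses import dataclass


@dataclass
class CardTransaction:
    # All fields are hypothetical examples of the factors described above.
    hour_of_day: int                  # 0-23, local time of the purchase
    amount: float                     # purchase amount
    merchant_country: str             # where the merchant is located
    purchaser_country: str            # where the purchaser normally resides
    prior_purchases_at_merchant: int  # from the purchaser's history


def to_features(tx):
    """Encode a transaction as a flat list of numeric features."""
    return [
        tx.hour_of_day / 23.0,                                       # scaled to 0..1
        tx.amount,                                                   # left unscaled here
        1.0 if tx.merchant_country != tx.purchaser_country else 0.0, # cross-border flag
        1.0 if tx.prior_purchases_at_merchant == 0 else 0.0,         # first-time merchant
    ]


# Made-up example: a late-night, cross-border purchase at a new merchant.
tx = CardTransaction(23, 120.0, "FR", "US", 0)
features = to_features(tx)
```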

Fraud detection is essentially a classification problem. Rather than applying binary rules or filters to test each transaction, modern models assess the statistical likelihood that a transaction is fraudulent based on past examples. Starting from an existing dataset of fraudulent transactions, the model can be trained by identifying the traits or elements those bad transactions have in common. This can be done using clustering or other such methods. The model then assesses new transactions and scores them based on how much or how little they have in common with the pool of fraudulent transactions.
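
One very simple way to score "how much a transaction has in common with the fraud pool" is to measure its distance from the average fraudulent transaction. The sketch below is an assumption-laden stand-in for the clustering methods mentioned above, not a production technique: it treats the pool as a single cluster, assumes every feature is already scaled to a comparable range, and uses made-up numbers.

```python
import math


def centroid(rows):
    """Average each feature across a pool of transactions."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def fraud_score(tx_features, fraud_centroid):
    """Map distance from the fraud centroid to a score in (0, 1].

    A score of 1.0 means the transaction sits exactly on the centroid;
    the score shrinks toward 0 as the transaction moves away from it.
    """
    distance = math.dist(tx_features, fraud_centroid)
    return 1.0 / (1.0 + distance)


# Made-up, pre-scaled feature vectors for known fraudulent transactions.
fraud_pool = [[0.9, 0.8], [1.0, 1.0], [0.8, 0.9]]
center = centroid(fraud_pool)

suspicious = fraud_score([0.9, 0.9], center)  # close to the fraud pool
ordinary = fraud_score([0.1, 0.1], center)    # far from the fraud pool
```

Because the output is a score rather than a yes/no answer, the threshold for blocking a transaction can be chosen separately, trading missed frauds against wrongly blocked purchases.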

Each new positive becomes a new example to supplement the existing pool, and the model evolves in this way. More interesting, though, is how bigger data, better processing, and sharper modelling help. With higher data resolution comes increased statistical accuracy, even in the absence of a “silver bullet” or causal element.
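
The idea of each confirmed fraud supplementing the pool can be sketched as a running average that is updated one transaction at a time. The `FraudPool` class is a hypothetical illustration; real systems typically retrain or update far richer models, but the principle of folding new examples into what is already known is the same.

```python
class FraudPool:
    """Keeps a running average of the features of confirmed frauds."""

    def __init__(self, n_features):
        self.count = 0
        self.mean = [0.0] * n_features

    def add(self, tx_features):
        """Fold one newly confirmed fraud into the running average."""
        self.count += 1
        # Incremental mean: shift each feature toward the new value
        # by 1/count of the gap, so earlier examples are not re-read.
        self.mean = [
            m + (x - m) / self.count
            for m, x in zip(self.mean, tx_features)
        ]


# Made-up feature vectors for two confirmed frauds.
pool = FraudPool(n_features=2)
pool.add([1.0, 0.0])
pool.add([0.0, 1.0])
```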

If the past is any indication, we can expect fraud detection to become ever more accurate and effective, both at catching bad transactions and at allowing the good ones.
