Microsoft made the news today by publishing a study showing how its researchers used Machine Learning to build a model that identifies search users who were later diagnosed with pancreatic cancer, based on their search patterns before any clinical diagnosis. The model caught 5-15% of these cases while maintaining a very low rate of false positives.

This breakthrough could help lay the groundwork for systems that aid in the early detection of a cancer with a very low five-year survival rate. It is an amazing and exciting application of Machine Learning in the real world.
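To make the "5-15% of cases at a very low rate of false positives" figure concrete, here is a minimal Python sketch of how such a number can be computed: choose the score threshold that keeps the false-positive rate under a cap, then report the share of true cases caught. The function name `recall_at_fpr` and the scores and labels are made up for illustration; they are not the study's data or code.

```python
def recall_at_fpr(scores, labels, max_fpr):
    """Best true-positive rate achievable while keeping the
    false-positive rate at or below max_fpr.

    scores: model scores, higher means "more likely a future case"
    labels: 1 for users who later issued a diagnostic query, else 0
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    best = 0.0
    # Try every observed score as a candidate threshold.
    for threshold in sorted(set(scores)):
        flagged = [s >= threshold for s in scores]
        tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
        fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
        if fp / negatives <= max_fpr:
            best = max(best, tp / positives)
    return best


# Toy data (illustrative only): four future cases, four non-cases.
toy_scores = [0.95, 0.90, 0.40, 0.30, 0.85, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

strict = recall_at_fpr(toy_scores, toy_labels, max_fpr=0.0)    # no false alarms allowed
relaxed = recall_at_fpr(toy_scores, toy_labels, max_fpr=0.25)  # one false alarm in four allowed
print(strict, relaxed)
```

In this toy data the strict setting catches half of the cases and the relaxed one catches all of them, which shows the trade-off: when false alarms must be extremely rare, catching only a modest share of cases is expected.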

Here's what the researchers did:

  1. Identified anonymous Bing searchers who used queries that were highly suggestive of receiving a diagnosis of pancreatic cancer (like "I was told I have pancreatic cancer, what to expect").
  2. Dug back into the archived search logs for these users to see what they had searched for long before they used these queries.
  3. Examined their patterns of searching for symptoms.
  4. Built statistical classifiers to predict whether or not the suggestive queries would appear based on patterns visible in the search logs.
  5. Discovered that patterns in the search logs could be used as signals to reliably predict the future appearance of queries that indicated the searcher had received a diagnosis of pancreatic cancer.
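The first four steps can be sketched in Python as a labeling and feature-extraction pass over per-user query histories. This is a rough illustration under stated assumptions, not the researchers' pipeline: the phrase lists, the `build_example` helper, and the sample logs are all hypothetical, and a real system would feed the resulting features into a statistical classifier.

```python
from collections import Counter

# Hypothetical phrase lists for illustration; the study's actual
# query patterns are not reproduced here.
DIAGNOSTIC_PHRASES = (
    "i was told i have pancreatic cancer",
    "just diagnosed with pancreatic cancer",
)
SYMPTOM_TERMS = ("abdominal pain", "back pain", "jaundice", "weight loss", "dark urine")


def first_diagnostic_index(queries):
    """Index of the first query suggesting a received diagnosis, or None."""
    for i, query in enumerate(queries):
        text = query.lower()
        if any(phrase in text for phrase in DIAGNOSTIC_PHRASES):
            return i
    return None


def build_example(queries):
    """Turn one user's time-ordered queries into (features, label).

    label is 1 if a diagnostic query appears, else 0. Features count
    symptom terms only in queries issued BEFORE the first diagnostic
    query, so the label never leaks into the features.
    """
    cut = first_diagnostic_index(queries)
    label = 0 if cut is None else 1
    history = queries if cut is None else queries[:cut]

    counts = Counter()
    for query in history:
        text = query.lower()
        for term in SYMPTOM_TERMS:
            if term in text:
                counts[term] += 1
    features = [counts[term] for term in SYMPTOM_TERMS]
    return features, label


# Made-up, anonymized logs keyed by an opaque user ID.
logs = {
    "user_a": [
        "yellow skin jaundice causes",
        "unexplained weight loss",
        "I was told I have pancreatic cancer, what to expect",
    ],
    "user_b": ["lower back pain stretches", "best running shoes"],
}

dataset = {user: build_example(queries) for user, queries in logs.items()}
print(dataset)
```

Cutting the history at the first diagnostic query is the key design choice here: the classifier only ever sees what was searchable before the diagnosis became apparent.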

Here is the link to download the study:

Screening for Pancreatic Adenocarcinoma Using Signals From Web Search Logs: Feasibility Study and Results
