Case Study
How to Hunt A Cheat
Knowledge Graphs in the Multibillion-Dollar Fight Against Fraud and Noncompliance
Few entities in the world capture as much data as the United States Department of the Treasury. Hidden amid the hundreds of millions of financial filings it receives each year are crooks and cheats who account for billions in fraud and noncompliance.
Fighting back by manually sifting through financial reports became untenable for one of the agency’s nine bureaus. There weren’t enough agents to do the job by hand – and 5,000 were already on the case.
Leadership turned to GraphGrid’s Connected Data Platform (CDP) to automatically connect data elements and enrich them with contextual meaning. With their newly connected data, they identified and recovered $10 million lost to fraud within 30 days of going live.
To better understand the scope of this challenge, and the magnitude of recovering so much, so quickly, consider that just this one bureau annually handles over 225 million individual reports and over 68 million cases via phone or in person. Adding more agents would never provide enough help to unearth all the potential fraud and noncompliance hiding in the bureau’s records.
That proved to be a nagging problem, because Treasury and its various bureaus process forms and filings throughout the fiscal year. Without automated tools for spotting subtle signs of trouble, the bureau had no choice but to make educated guesses, based on each investigator’s knowledge, about which investigations might pay off with a recovery.
Think of it like searching for a needle in a haystack – when the haystack is petabytes of data that includes needles of varying shapes and sizes. You might find a needle, but you’ll be hunting in a wide area and may come up with nothing. It is a tedious and time-consuming task to find them without tooling that provides an advantage. GraphGrid CDP is like having a giant magnet that quickly pulls the needles to the surface.
In this case study, we’ll review how the bureau transformed the way investigators work by connecting their data with GraphGrid CDP.
By moving the responsibility of making key connections in the data from the investigators’ heads to the collaborative knowledge graph environment, CDP improved and automated investigations while adding capacity. It also created a new R&D-like process that, to this day, allows for rapidly testing fraud pattern ideas without compromising existing casework.
Disconnected records make finding fraud and noncompliance a laborious, manual affair.
Use knowledge graphs to add context to data, revealing patterns and accelerating insight.
Recover millions in unpaid funds and exponentially increase the bureau’s ability to process cases.
Disconnected records make finding fraud and noncompliance a laborious, manual affair.
Government agencies are notorious for their standards and procedures, making sharing or studying data immensely difficult. And for a good reason: it should be difficult to compromise and share personal records, especially when those records are financial.
In the bureau’s case, the system of record for all investigations – where all financial filings, records, and reports lived, including paper records scanned into digital form – was and still is a massive data warehouse that is difficult to search directly. Uncovering fraud is a tedious process requiring humans to collect and connect clues and piece together patterns.
Unfortunately, this system has a significant flaw. By making its data hard to search, regulators had unintentionally made it easier for miscreants to hide.
Amid the vast and growing petabytes of data are billions of records dating back decades and organized in a rigid, archival structure. For the most part, agents had to know what they were looking for before starting their investigation, slowly pulling down relevant documents, one at a time, for each entity targeted for review. The challenge in this approach compounds with fraudsters frequently changing their patterns to avoid detection – making it more difficult for investigators to look in the right place.
Data gathering would consume a massive chunk of the required time and effort in a six-month investigation. For example, an agent tasked with studying the flow of money between different business entities would have to search for each relevant report, one at a time for each entity. The knowledge graph of entity connections would exist only in the investigator’s head.
The job of following the money? That would come later, usually with one or more analysts entering relevant findings in a spreadsheet. Only then would the team be able to spot patterns indicating fraud or noncompliance.
Connecting data to identify fraud patterns and options for recovery earlier in an investigation had become a priority for bureau officials.
Use knowledge graphs to add context to data, revealing patterns and accelerating insight.
At first, the bureau worked with GraphGrid-trained engineers through a systems integration partner to understand the contents of the data warehouse and which among the thousands of tables therein would be most relevant for conducting more efficient investigations.
Specifically, GraphGrid and its service partner set out to load data into a shared knowledge graph that could evolve. That way, as new types of investigations were identified and prioritized, new nodes (i.e., people, places, or things), edges (i.e., the type of relationships, such as the leader of a related organization), labels (i.e., categories of nodes, such as an organization), and properties (i.e., context for actions taken) describing the particulars of each new investigation could be added.
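To make those four building blocks concrete, here is a minimal sketch of a property graph in plain Python. It is an illustration only, not GraphGrid CDP's API or the bureau's schema: the `Node`, `Edge`, and `KnowledgeGraph` classes, the label and relationship names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set                                   # categories, e.g. {"Organization"}
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                                   # node_id where the relationship starts
    target: str                                   # node_id where the relationship ends
    rel_type: str                                 # relationship type, e.g. "LEADER_OF"
    properties: dict = field(default_factory=dict)


class KnowledgeGraph:
    """Hypothetical in-memory graph; no fixed schema is declared up front."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(node_id, set(labels), dict(properties))

    def add_edge(self, source, target, rel_type, **properties):
        # Refuse dangling relationships: both endpoints must already exist.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(Edge(source, target, rel_type, dict(properties)))

    def nodes_with_label(self, label):
        return sorted(n.node_id for n in self.nodes.values() if label in n.labels)


graph = KnowledgeGraph()
graph.add_node("p1", ["Person"], name="Example Person")
graph.add_node("o1", ["Organization"], name="Example Org")
graph.add_edge("p1", "o1", "LEADER_OF", since=2015)

# A later investigation type introduces a new label and relationship
# without changing anything that was loaded before.
graph.add_node("a1", ["Account"], opened=2019)
graph.add_edge("o1", "a1", "CONTROLS")
```

The point of the sketch is the last three lines: a graph that "could evolve" accepts new kinds of nodes and relationships as data, rather than requiring a redesign of existing tables.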
The initial design focused on capturing data to help solve the most urgent cases, investigations led by 100+ agents whose job was to identify criminal fraud.
Engineers responded by moving subsets of information from the most relevant tables, transferring the data into a knowledge graph that would dramatically improve agents’ ability to search and see connections.
GraphGrid CDP made it simple for the agents to start with one clue (a piece of data), quickly navigate to other clues, and visually see patterns in data. Before, for example, they’d query the data warehouse directly for an exact match using a strict numerical identifier. With CDP, they can now see that one particular exec was at the center of many transactions and had relationships with known or suspected fraudsters.
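The difference between an exact-match lookup and navigating from one clue can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how CDP is implemented: the entity names, relationship types, the `flagged` set, and the `neighborhood` helper are all hypothetical.

```python
from collections import Counter, deque

# Hypothetical relationships: (source, relationship type, target)
edges = [
    ("exec_1", "SENT_FUNDS", "company_a"),
    ("exec_1", "SENT_FUNDS", "company_b"),
    ("exec_1", "SENT_FUNDS", "company_c"),
    ("exec_1", "ASSOCIATE_OF", "person_x"),
    ("exec_2", "SENT_FUNDS", "company_a"),
    ("person_x", "OWNS", "company_c"),
]
flagged = {"person_x"}  # hypothetical known or suspected fraudsters

# Build an undirected adjacency map so a clue can be followed in either direction.
adjacency = {}
for source, _rel, target in edges:
    adjacency.setdefault(source, set()).add(target)
    adjacency.setdefault(target, set()).add(source)


def neighborhood(start, max_hops):
    """Breadth-first search: map each entity within max_hops of start to its distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(current, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[current] + 1
                queue.append(neighbor)
    return seen


# Start from a single clue and fan out two hops.
nearby = neighborhood("company_a", max_hops=2)

# Who sits at the center of the most transactions, and whom do they touch?
transaction_counts = Counter(s for s, rel, _t in edges if rel == "SENT_FUNDS")
most_active, count = transaction_counts.most_common(1)[0]
flagged_contacts = adjacency[most_active] & flagged
```

In this toy data, starting from `company_a` reaches `exec_1` in one hop and `person_x` in two, which is the kind of connection an exact-match query on a single identifier would not surface.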
The team also inserted safeguards. Instead of granting all agents full authority over the knowledge graph, engineers created compartmentalized levels of access and control. In that way, CDP mirrored the bureau’s own internal security and audit protocols to ensure confidentiality and maintain the legal due process.
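One way to picture compartmentalized access is as a filter applied before any agent sees the graph. The sketch below is an assumption-laden illustration, not the bureau's actual security model or a CDP feature description: the role names, the numeric `min_clearance` property, and the `visible_subgraph` helper are hypothetical.

```python
# Hypothetical clearance tiers; a higher number can see more of the graph.
CLEARANCE = {"analyst": 1, "investigator": 2, "criminal_investigator": 3}

# Hypothetical nodes, each tagged with the minimum clearance needed to view it.
nodes = {
    "org_1": {"label": "Organization", "min_clearance": 1},
    "filing_9": {"label": "Filing", "min_clearance": 2},
    "informant_4": {"label": "Person", "min_clearance": 3},
}
edges = [
    ("org_1", "FILED", "filing_9"),
    ("informant_4", "REPORTED", "org_1"),
]

audit_log = []  # every access is recorded so it can be reviewed later


def visible_subgraph(role):
    """Return the nodes and edges a role may see, and record the access."""
    level = CLEARANCE[role]
    allowed = {nid for nid, attrs in nodes.items() if attrs["min_clearance"] <= level}
    # An edge is visible only when both of its endpoints are visible.
    visible_edges = [e for e in edges if e[0] in allowed and e[2] in allowed]
    audit_log.append((role, sorted(allowed)))
    return allowed, visible_edges


analyst_nodes, analyst_edges = visible_subgraph("analyst")
```

Hiding an edge whenever either endpoint is hidden matters: otherwise the mere existence of a relationship could leak information about a record the agent is not cleared to see.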
The net result? Patterns that previously could only be identified by comparing documents and entering information on spreadsheets were suddenly in full view of investigating agents within a few keystrokes, according to their level of clearance.
Recover millions in unpaid funds and exponentially increase the bureau’s ability to process cases.
As the knowledge graph grew, so did its ability to serve the broader team of agents searching for instances of fraud and noncompliance. But it was the initial success that kickstarted the imagination of bureau leadership and its agents. Just 30 days after going live with GraphGrid CDP, agents identified previously undiscovered violations leading to $10 million in recovery orders.
Since then, GraphGrid CDP has expanded from serving those first 100 criminal investigators to over 5,000 of the bureau’s agents. Several use cases have emerged as a result. For example, one team has found it particularly fruitful to study how enterprises are organized globally, looking for patterns that may reveal noncompliance or fraud.
Testing new ideas for patterns of fraud has also become easier. With GraphGrid CDP, analyst teams can isolate the signals they believe may indicate new types of fraud or noncompliance and then build a temporary knowledge graph to capture the relevant contextual data. This provides additional clues and reduces the amount of time required to determine if fraud exists or not because the investigator is no longer piecing together everything in their head. Now, teams of investigators collaborate on tests lasting weeks or months. If the hypothesis proves correct, data from the temporary knowledge graph moves into the permanent graph and becomes part of the team’s ongoing caseload.
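The temporary-then-permanent workflow can be sketched as two separate graphs and a one-way merge. This is a simplified illustration under stated assumptions, not CDP's mechanism: the dictionary-based graph, the `add_edge` and `promote` helpers, and the sample entities are hypothetical.

```python
def new_graph():
    """Hypothetical minimal graph: node properties by id, plus a set of edges."""
    return {"nodes": {}, "edges": set()}


def add_edge(graph, source, rel_type, target):
    graph["nodes"].setdefault(source, {})
    graph["nodes"].setdefault(target, {})
    graph["edges"].add((source, rel_type, target))


def promote(temporary, permanent):
    """Merge a temporary graph into the permanent one; return the number of new edges."""
    for node_id, props in temporary["nodes"].items():
        permanent["nodes"].setdefault(node_id, {}).update(props)
    new_edges = temporary["edges"] - permanent["edges"]
    permanent["edges"] |= new_edges
    return len(new_edges)


permanent = new_graph()
add_edge(permanent, "org_1", "OWNS", "org_2")

# Analysts test a hypothesis in isolation; existing casework is untouched.
temporary = new_graph()
add_edge(temporary, "org_2", "OWNS", "shell_7")
add_edge(temporary, "org_1", "OWNS", "org_2")  # overlaps with the permanent graph

edges_before = set(permanent["edges"])

hypothesis_confirmed = True  # stands in for the analysts' judgment after the test
added = promote(temporary, permanent) if hypothesis_confirmed else 0
```

Because the test lives in its own graph until `promote` runs, a hypothesis that does not pan out can simply be discarded without any cleanup in the permanent graph.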
It’s a staggering change: instead of struggling to identify the highest-value cases and then applying manual resources to solve them, hoping for the best, the bureau now has an active R&D mechanism for proactively hunting for new schemes – all with GraphGrid CDP at the center. In each case, what used to take six months or more now rarely takes more than two hours – keeping investigators one step ahead of fraudsters.