Real Time Fraud Detection Using Apache Flink — Part 2

A company that needs to move currency in some form is likely to be affected by fraud. Financial Services, Gaming, E-commerce, Advertising — all of these sectors are facilitating transactions in one form or another and they are taking a hit on margins due to fraud. Understandably, this space is so vast that Fraud Detection has been one of the most widely spoken areas in ML research.
The first time I built a Fraud Detection and Prevention System, it took me 1.5 yrs to see it running in production and creating value. Yes, 1.5 yrs is way too long to invest on an ML use case. After the first release, we would spend a few months upgrading it to address new fraud patterns.
Although the value we created through those systems is in millions USD per year, many companies would fear picking up initiatives with that high time to value.
In this blog, I’m going to speak about various challenges I experienced in my journey from developing robust real-time solutions and scaling them in production. More specifically, I’ll deep dive into what it took to convince the business leaders of the value it can unlock. If you are leading Fraud Control initiatives, my recommendation is to consider addressing these in the beginning itself to avoid surprises later in your journey.
Imagine you’ve identified that 10% of your company’s transactions are fraudulent, but only 4% of those transactions actually result in financial losses. Would you still prioritize solving for the additional 6%? While it may seem like a straightforward decision to eliminate all fraud, often, solving for fraud can inadvertently conflict with your company’s core growth objectives.
Fig 3 — Fraud may not always result in losses. How to tackle such cases!
As a leader in fraud risk management, you want to see your work create tangible value. That’s your primary motivator. But growth leaders and founders are often more focused on driving topline revenue growth. This creates a natural tension across the organization: fraud teams want to minimize risk, while growth teams want to maximize opportunity. And this is applicable for all sectors.
In Banking, companies are not liable for fraud that’s perpetrated due to customer negligence, such as social engineering.
In AdTech, Networks don’t lose money if advertising partners are not asking for clawbacks.
In Gaming, companies are not always losing money if a bot is playing games.
At times, it may even be the case that companies make more revenue due to fraud.
In these cases, fraud may not appear to be an urgent problem for the company itself. However, it’s critical to recognize that fraud always results in a cost to someone — whether it’s your company or your customers. And if you’re not solving for that additional 6%, it’s your customers who are bearing the brunt of those losses.
Forward-thinking leaders understand that tackling fraud today may come with short-term costs but will ultimately strengthen customer trust and drive long-term ROI. While some leaders focus solely on reducing direct financial losses, those with foresight prioritize building sustainable trust and protecting the broader ecosystem.
Although addressing both fraud and financial losses may appear similar, they require distinct strategies. The way you analyze data, develop machine learning models, and implement decision-making processes in production is driven by your KPIs. That’s why it’s essential to know what you’re solving for and plan the roadmap accordingly.
Based on my experience of working with various types of leaders, I’ve found a phased approach to be most effective. Here’s how you can align your fraud detection efforts with your company’s North Star:
1. Prioritize Solving for Direct Losses — Start by addressing fraud that results in direct financial losses. This is the easiest win, as there’s minimal internal resistance to solving for losses that are clearly impacting the company’s bottom line. Seeing the immediate impact of these efforts builds trust and buy-in from stakeholders across the organization.
2. Conduct Controlled Experiments in Other Areas — Once you’ve addressed direct losses, gradually expand your fraud detection efforts to other areas where fraud doesn’t result in immediate financial loss but still impacts customer trust. Use controlled experiments and feedback loops to measure the impact of your solutions. The key metrics will vary by industry:
a. In Banking, focus on improving customer satisfaction and reducing churn.
b. In AdTech, measure the impact on return on ad spend (ROAS).
c. In Gaming, track user engagement and retention rates.
These experiments will help you build a case for the broader value of your fraud detection efforts.
3. Scale Solutions with Proven ROI — Once you have sufficient evidence of ROI, it becomes much easier to achieve consensus across the organization. Use the data from your experiments to demonstrate how fraud detection initiatives align with the company’s growth goals. At this stage, a full-scale rollout is typically met with little resistance, as you’ve already proven the value of your solutions.
This phased approach ensures that your fraud detection initiatives are focused on specific, measurable goals. It minimizes development time and reduces the risk of pushback when it’s time to deploy your solutions. Additionally, by aligning your initiatives with the company’s North Star, you ensure that your work creates long-term value — not just for your company, but for your customers and partners as well.
It’s recommended to have an ML and Engineering team/leadership who have experience of developing such scalable systems. This can easily save you months of effort. Otherwise, there will be many surprises in your journey that will lead to a longer time to value. However, if you don’t have the right expertise in the team, I am sharing a list of potential challenges that you may encounter while developing the solution. Being aware can minimize surprises. This is not an exhaustive but a good enough list of pitfalls that may delay your plan by months if not taken care of -
Fraud Detection is one of the biggest use cases of real-time data. Many teams start with batch features only to realize later that the reliance on stale/outdated data in batch hinders the ability to identify fraud significantly. And that’s when they have to migrate all features to real-time streaming applications. Streaming applications is a different ballgame altogether. Processing and aggregating large volumes of data in seconds and making them available for serving in Feature Store such as Redis requires deep engineering skills. Even with the right skills in a team, it’s a big effort to develop such applications and optimize performance. Not having such skills can easily add months to your timeline.
I recently worked with an Ecommerce company incurring ~4% revenue loss due to fraud. On measuring the effectiveness of features at different latencies, we realized that the same feature can miss detecting 33% of fraudulent transactions when served at latency of 2mins compared to only 2% when served at latency of 30sec. Imagine, if a feature has potential to add an incremental value of $100K in identifying fraud, you will only be able to realize an impact of $67K if the feature is served with 2mins latency. This data is specific to a set of features and the product and may vary for others.
It’s important to gather such insights before you start developing features to accelerate time to value. And have the right expertise in the team to implement real-time streaming applications.
Fraud detection systems rarely succeed without an explainability framework. Stakeholder consuming decisions made by fraud detection systems require proper reasoning which if not present adds hours to often days of work for Analysts and Data Scientists to deep-dive into data to find proper reasoning.
Before jumping into explainability, let’s first briefly understand how outcomes of fraud detection systems may be consumed by various stakeholders -
While most of the stakeholders are only interested in understanding reasons for flagging an event/account as suspicious, there are scenarios where it is equally important to justify why certain accounts, devices or users are deemed trustworthy. For e.g. in Gaming, identifying and safeguarding whale users is essential to ensure they are not mistakenly affected by the fraud detection systems. In FS and AdTech as well, customer support teams may use it to fast track resolution if the user is considered a high value user.
Here is an example of what explanation should look like -
1. The transaction has been assigned a high-risk score of 0.97 due to suspicious device activity.
2. The device has total of 11 geo-country missmatch counts across all events in the last 24hrs which is suspicious. The feature contribution is 37%.
3. Additionally, the device also has very fast click activity that resembles to that of a bot and its feature weightage is 13%.
Besides this, the explainability framework must also provide various other details around the account’s historical data and alerts for deeper investigation. While basic explainability is a must have feature, better frameworks help provide a 50–70% lift in productivity.
Machine Learning models are not perfect and so are the Business Rules used for fraud detection. Therefore, whichever method you implement, you are bound to see false positives, i.e. genuine customers getting affected due to incorrect alerts from fraud detection systems.
If you’re a Risk or Machine Learning leader, you would want to take a shot at the problem thinking that you’ll be able to develop a good ML powered fraud detection system with minimal false positives. While that’s the right approach and top teams globally are using ML, even for real-time fraud detection, it’s tough to develop models with good performance.
Imaging, your team invested 6 months in training an anomaly detection model with 5% false positives. When you take this to growth stakeholders, they may not be very happy about it as their expectation was to see a model with “great” performance. This will now either lead to weeks of conversation, alignment and pushback before deployment or your team getting back to identifying ways to reduce false positives that will take more time.
Your stakeholders or leadership may not really understand False Positives and how to tackle it. However, as Risk or Machine Learning leaders, we must understand that ML models are not perfect and that there will always be false positives. This is primarily because of following reasons -
It is important for the stakeholders to understand this fact and come up with strategies in the initial stages itself to avoid conflicts or delays later. A few things that help in this regard are -
Getting to a consensus early on ensures your team has a target to achieve that is agreed with all stakeholders and that there are no surprises later when it’s time to deploy.
Of all the challenges discussed here and beyond, False Positives is probably the most critical one to take care of beforehand.
Fraud is an evolving space. With democratization of the dark web and access to sophisticated tools fraudsters are continuously upgrading their strategies to bypass the controls. As a result, even after deploying your first fraud detection system, your Risk and Data Science/Analytics teams will need to engage in continuous enhancements to keep up with emerging threats.
However, making these enhancements can be a significant struggle due to the time-consuming nature of the process. Here’s why:
Even minor enhancements can take 4–8 weeks of effort to go from ideation to deployment. In practice, this means your team might only have 3–4 chances per year to improve fraud detection systems. However, by reducing your time to value by 50%, you could potentially double the number of production releases, allowing you to stay ahead of evolving threats.
To achieve faster time to value, it’s critical to build fraud detection systems with scalability and efficiency in mind from the outset. Investing in MLOps practices and AI platforms can significantly enhance your team’s productivity and reduce time spent on repetitive processes.
Here’s how these tools can help:
By adopting these strategies, you can make your fraud detection systems more agile, scalable, and future-proof, ensuring your organization remains one step ahead in the ongoing battle against fraud.
As Chief Risk Officers or Head of Fraud and Risk Management, balancing Compliance, Risk and Growth simultaneously is no small feat. It requires not only strategic navigation through complex business hurdles, but also a deep understanding of how to equip data, analytics, ML and platform teams with the right tools and strategies to build advanced fraud detection systems that are fast, adaptive and resilient to emerging threats.
I hope the article was helpful in gaining insights around key challenges and practical approaches to overcome, to ensure a “Faster Time to Value” in fraud detection.
I’m happy to connect and exchange thoughts if this interests you. Please drop an email on kumar.sanjog@canso.ai. Alternatively, you can also connect with me on LinkedIn here.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Block quote
Ordered list
Unordered list
Bold text
Emphasis
Superscript
Subscript
Stay in the know and learn about the latest trends in fraud, credit, and compliance risk.