Part I: Proactive Risk Detection in Fintech: From Data to Real-Time Model Deployment

Ajay
5 min read · Oct 7, 2024


In financial technology, data is crucial for mitigating the risks associated with handling money. The sector faces three major risk problems:

  1. Account takeover: In this modus operandi (MO), a malicious actor (hijacker) gains unauthorized access to a genuine user’s account, potentially compromising wallets, bank accounts, and other financial assets.
  2. User scam/merchant scam: Here, a malicious actor (scammer) convinces or forces a genuine user to perform an action that unintentionally leads to monetary loss. For example, the scammer might call the user and convince them to transfer money.
  3. Promotion abuse: In this MO, a malicious actor (abuser) exploits promotions provided by the platform or onboarded merchants. For instance, if a merchant offers a limited number of promotions, the abuser could redeem them by creating multiple accounts.

These MOs, beyond causing monetary loss to users, significantly impact user trust in the platform, which results in direct business losses. Detection, prevention, and sanctions are critical components in ensuring the safety of the platform. In this post, we will dive into the detection step.

Reactive Detection

The low-hanging fruit for detecting risk MOs is analyzing existing reports, either external (social media posts, etc.) or internal (customer care tickets, etc.). Individual cases are analyzed and rules are defined to mitigate the risk reactively; these rules then prevent future occurrences of the same MO.

For example, if a user attempts to log in from a new device, you can require two-factor authentication (2FA). Rules are responsive (low-latency decision-making) but reactive and limited in scope, since they are based on previously reported MOs. New MOs are hard to discover this way, leading to a cat-and-mouse game with malicious actors. Hence, we need to shift to proactive detection.
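To make this concrete, here is a minimal sketch of such a reactive rule in Python; the login event shape and the device lookup are hypothetical placeholders, not from any particular system.

```python
from dataclasses import dataclass

# Hypothetical event shape; field names are illustrative only.
@dataclass
class LoginEvent:
    user_id: str
    device_id: str

def known_devices(user_id: str) -> set[str]:
    """Placeholder lookup of devices previously seen for this user.
    In practice this would be backed by a real device store."""
    return {"device-abc"}

def requires_2fa(event: LoginEvent) -> bool:
    """Reactive rule: challenge with 2FA when the login comes from an unseen device."""
    return event.device_id not in known_devices(event.user_id)

if __name__ == "__main__":
    print(requires_2fa(LoginEvent(user_id="u1", device_id="device-new")))  # True -> send OTP
```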

Proactive Detection

To detect MOs as they happen, we must analyze the entire dataset, not just the reported cases. This means building data science models that scan the full dataset, identify patterns related to MOs, and use labeled data, both manually and synthetically generated.

Model development using labeled data is resource-intensive, and deploying such models introduces a new set of challenges. Risk data science models operate in near real-time because decisions must be made within milliseconds or seconds. For example, when a user logs in from a new device, the system must decide whether to block access and send an OTP.
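As a rough illustration, the sketch below trains a classifier on labeled historical cases and then scores the full dataset rather than only reported ones. The scikit-learn model choice, feature shapes, and random data are assumptions made for the example, not the actual production setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Stand-in for engineered features (e.g., orders per day, device changes, graph size)
X_train = rng.normal(size=(1_000, 3))
y_train = rng.integers(0, 2, size=1_000)  # labels from manual review or synthetic generation

model = GradientBoostingClassifier().fit(X_train, y_train)

# Score every account/transaction, not only the reported ones
X_all = rng.normal(size=(5, 3))
risk_scores = model.predict_proba(X_all)[:, 1]
print(risk_scores)
```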

Read more about the data science model development lifecycle here.

Challenges of Real-Time Model Deployment

When a data scientist develops a model, they follow a development life cycle similar to the software development lifecycle (SDLC). However, when deploying data science models in real-time, we face the following challenges:

  • Missing product context during model development
  • Data parity with the model-development dataset
  • Feature parity with the model-development features
  • Subsecond feature calculation
  • Supporting different kinds of features
  • Raw data monitoring
  • Model deployment monitoring
  • Feature monitoring
  • Model prediction label monitoring
  • Model upgrade management

In this post, we will focus on how to serve a model in real time.

Feature Engineering for Real-Time Models

To deploy a model with near real-time feature engineering, we must work with both streaming and batch data to generate various types of features such as:

  • Categorical features (e.g., gender)
  • Graph features (e.g., component size of an order graph)
  • Tabular features (e.g., number of orders in a day)

These features span different time periods (last minute, last month, last year) and must be generated within milliseconds.
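As an illustration of these three feature types, here is a small Python sketch; the use of pandas and networkx and the toy order data are assumptions for the example, not a prescribed stack.

```python
import pandas as pd
import networkx as nx

orders = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "merchant_id": ["m1", "m2", "m1"],
    "order_date": pd.to_datetime(["2024-10-01", "2024-10-01", "2024-10-02"]),
})

# Categorical feature: one-hot encoded gender (value assumed for the example)
gender_onehot = pd.get_dummies(pd.Series(["F"], name="gender"), prefix="gender")

# Tabular feature: number of orders per user per day
orders_per_day = orders.groupby(["user_id", orders["order_date"].dt.date]).size()

# Graph feature: component size of the user-merchant order graph for a given user
g = nx.from_pandas_edgelist(orders, source="user_id", target="merchant_id")
component_size = len(nx.node_connected_component(g, "u1"))

print(orders_per_day)
print(component_size)  # u1, m1, m2, u2 are all connected -> 4
```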

Steps for Going Live with a Model

  • Identify the Prediction Triggering Event: Most risk use cases require predictions in under 100 milliseconds. However, given the complexity of the models and features, it is often not feasible to achieve this within such a short window. For example, calculating the component size of last year’s customer-order graph might take longer than 100ms. One solution is to identify the closest event that precedes the prediction-triggering event and run the model on that earlier event, so the prediction score is already available when the triggering event arrives.
  • Set a Latency Target: After identifying the closest event, measure the time difference between it and the triggering event. This becomes the latency target for end-to-end model prediction. A target of around 2 seconds is usually sufficient. If you need subsecond latency, reconsider the triggering event or prune features that consume too much time.
  • Ensure Feature Consistency: The logic used to calculate features during model development must be identical to the logic used in production. Inconsistent feature calculation causes training-serving skew and poor performance, which can block the model from going live. A minimal sketch of shared feature logic follows this list.
  • Categorize Features:

— Time-sensitive features: events close to the prediction event, such as “time since last order.”

— Non-time-sensitive features: features that can be pre-calculated early, such as “orders in the last 30 days excluding today.”

  • Work with Real-Time Data Sources: Identify data sources available in real time, such as Kafka topics that act as streaming replicas of the batch tables in your data warehouse.
  • Deploy a Feature Engineering System: Build a system that supports real-time and batch features across multiple types, including graph, tabular, categorical, and time-series features.
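Here is the minimal sketch of shared feature logic referenced above: one feature function used by both the offline training pipeline and the online serving path, combined with pre-calculated, non-time-sensitive features read from a batch store. All names and the feature-store read are hypothetical placeholders.

```python
from datetime import datetime, timezone

def seconds_since_last_order(event_ts: datetime, last_order_ts: datetime) -> float:
    """Time-sensitive feature: the same function is imported by the offline
    training pipeline and the online serving path, keeping the logic consistent."""
    return (event_ts - last_order_ts).total_seconds()

def load_precomputed_features(user_id: str) -> dict:
    """Non-time-sensitive features, refreshed by a batch job (e.g., nightly).
    Placeholder for a feature-store read; the value here is hard-coded."""
    return {"orders_last_30d_excl_today": 12}

def build_feature_vector(user_id: str, event_ts: datetime, last_order_ts: datetime) -> dict:
    """Combine pre-calculated batch features with time-sensitive ones computed at request time."""
    features = dict(load_precomputed_features(user_id))
    features["seconds_since_last_order"] = seconds_since_last_order(event_ts, last_order_ts)
    return features

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(build_feature_vector("u1", event_ts=now, last_order_ts=now))
```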

Once both real-time and batch features are available, the model can be deployed and used for real-time predictions. Ensure that you log every model prediction along with relevant metadata, such as:

  • Features used
  • Event data
  • Prediction score
  • Model version
  • Event timestamp
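As a rough sketch, logging each prediction with this metadata could look like the following; the JSON-lines sink and the field names are assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone

def log_prediction(features: dict, event: dict, score: float, model_version: str) -> None:
    """Emit one prediction record with its metadata."""
    record = {
        "features": features,
        "event": event,
        "prediction_score": score,
        "model_version": model_version,
        "event_timestamp": event.get("timestamp"),
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    print(json.dumps(record))  # replace with a Kafka topic or warehouse sink in practice

log_prediction(
    features={"orders_last_30d_excl_today": 12},
    event={"type": "login", "timestamp": "2024-10-07T10:00:00Z"},
    score=0.87,
    model_version="risk-model-v3",
)
```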

Apart from tracking data science metrics like concept drift, it’s essential to monitor raw data, feature engineering latency, and prediction latency to ensure everything runs smoothly.

Post-Deployment Monitoring and Retraining

Once the model is live, use the predictions to apply appropriate sanctions, and continuously monitor the prediction scores. Retrain the model as needed using newly reported appeal tickets, human-labeled data, or synthetic labels.

When a new and improved model is ready after retraining, deploy it to replace the current model.

Key Takeaways:

  • Proactive risk detection requires using data science models that scan the entire dataset, not just reported cases.
  • Real-time model deployment involves multiple challenges, such as data and feature parity, subsecond latency, and real-time feature engineering.
  • Continuous monitoring and retraining are essential for keeping models effective and ensuring the platform’s security.

Stay tuned for my next blog, where I will dive deeper into the technical details of building a real-time feature engineering system. I will cover concepts like stream processing frameworks, handling large-scale data pipelines, and optimizing feature engineering for subsecond latency.

Listen to this blog in podcast form here: https://podcasters.spotify.com/pod/show/ajay-kumar-saini7/episodes/Ep-02-Proactive-Risk-Detection-in-Fintech-From-Data-to-Real-Time-Model-Deployment-e2pds8d
