How Reddit Uses Flink Stream Joins in Its Real-Time Safety Systems
Lightning Talk
Acting on policy-violating content as quickly as possible is a top priority for Reddit’s Safety team. One of the technologies behind this work is Rule-Executor-V2 (REV2), a real-time rules engine that processes streams of events flowing through Reddit.
Low time-to-process latency, measured as the time it takes for some activity on the site to flow through REV2, is an important metric to optimize. It is equally important, however, for REV2 to identify more sophisticated policy-violating content. At Reddit, we use advanced machine learning (ML) signals to detect and act on such content.
In this talk, we will discuss Signals-Joiner, a Flink-based system that enables REV2 to use slower-to-compute ML signals in its real-time context via stream joins. Specifically, we'll walk through the motivation behind the system, the evolution of our architecture (including building a custom windowing strategy), key learnings, and the results we've achieved.
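To make the idea of a windowed stream join concrete, here is a minimal sketch in plain Python. It is not Reddit's implementation and does not use the Flink API; the class name `WindowedJoiner`, its methods, and the `toxicity` field are all hypothetical. It assumes each event arrives before its ML signal, buffers the event by key, and emits it either joined with the signal (if the signal arrives within the window) or without signals once the window expires.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class WindowedJoiner:
    """Hypothetical in-memory stand-in for a keyed, time-bounded stream join."""

    window_ms: int
    # key -> (event timestamp in ms, event payload) for events awaiting a signal
    pending: Dict[str, Tuple[int, dict]] = field(default_factory=dict)

    def on_event(self, key: str, ts: int, event: dict) -> None:
        # Buffer the event until its signal arrives or the window expires.
        self.pending[key] = (ts, event)

    def on_signal(self, key: str, ts: int, signal: dict) -> Optional[dict]:
        # Join the signal with a buffered event if it arrived within the window.
        entry = self.pending.get(key)
        if entry is None:
            return None  # no buffered event (unknown key or already emitted)
        event_ts, event = entry
        if ts - event_ts > self.window_ms:
            return None  # too late; the timer path will emit the event unjoined
        del self.pending[key]
        return {**event, "signals": signal}

    def on_timer(self, now: int) -> List[dict]:
        # Emit every event whose window has expired, marked as having no signals.
        expired = [k for k, (ts, _) in self.pending.items() if now - ts > self.window_ms]
        return [{**self.pending.pop(k)[1], "signals": None} for k in expired]


joiner = WindowedJoiner(window_ms=1000)
joiner.on_event("post1", 0, {"id": "post1"})
joiner.on_event("post2", 100, {"id": "post2"})
joined = joiner.on_signal("post1", 500, {"toxicity": 0.9})  # arrives in time
timed_out = joiner.on_timer(1200)  # post2 never received a signal
```

The window length is the trade-off the abstract describes: a longer window lets more slow signals join their events, at the cost of higher time-to-process latency for events whose signals never arrive.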
Vignesh Raja
Reddit, Inc.