FlinkAI: Building a Real-Time LLM Knowledge Engine for Apache Flink... with Flink!

Lightning Talk

How do you keep your developers effective when your internal Flink practices diverge from the open-source community? At Yahoo, we tackled this challenge by building FlinkAI, a smart knowledge system that bridges the gap between our internal expertise and the global Apache Flink community.

In this session, we'll show you how we use Apache Flink itself to power a real-time streaming pipeline that ingests, processes, and understands Flink knowledge. FlinkAI consumes everything from our internal deployment guides (EKS, mTLS, Okta) to external community firehoses like mailing lists, Jira issues, and commits. This data is transformed into semantic embeddings with OpenAI and stored in a vector database for lightning-fast natural language search.
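
To make the shape of that pipeline concrete, here is a minimal, hypothetical sketch using Flink's DataStream API: each incoming document is mapped to an embedding via OpenAI's /v1/embeddings endpoint. The in-memory source, the text-embedding-3-small model choice, and the print() sink standing in for a vector-database sink are illustrative assumptions, not FlinkAI's actual implementation.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EmbeddingPipelineSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in source: a real deployment would read mailing lists, Jira issues,
        // commits, and internal guides from Kafka or another connector.
        DataStream<String> docs = env.fromElements(
                "Internal guide: running Flink on EKS with mTLS",
                "Mailing list: checkpoint timeouts under backpressure");

        // One embedding call per document; a sink writing to the vector database
        // would replace print() in a real job.
        docs.map(new EmbedFunction()).print();

        env.execute("FlinkAI embedding pipeline (sketch)");
    }

    /** Calls OpenAI's embeddings endpoint per document (simplified: no batching or retries). */
    static class EmbedFunction extends RichMapFunction<String, String> {
        private transient HttpClient http;

        @Override
        public void open(Configuration parameters) {
            http = HttpClient.newHttpClient();
        }

        @Override
        public String map(String text) throws Exception {
            // NOTE: real code needs proper JSON escaping and error handling.
            String body = "{\"model\":\"text-embedding-3-small\",\"input\":\""
                    + text.replace("\"", "\\\"") + "\"}";
            HttpRequest req = HttpRequest.newBuilder()
                    .uri(URI.create("https://api.openai.com/v1/embeddings"))
                    .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            // Returns the raw JSON response containing the embedding vector.
            return http.send(req, HttpResponse.BodyHandlers.ofString()).body();
        }
    }
}
```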

The best part? It’s integrated directly into the Flink Web UI. FlinkAI automatically analyzes exceptions as they happen and suggests solutions in an embedded chat, turning the UI into an active troubleshooting assistant.
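
As a purely hypothetical sketch of that retrieval-augmented suggestion step (not FlinkAI's actual UI integration), the snippet below combines a job exception with a few retrieved knowledge snippets into a prompt and asks OpenAI's /v1/chat/completions endpoint for a suggested fix. The retrieveSimilar helper standing in for the vector-database lookup, and the gpt-4o-mini model choice, are assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class ExceptionAdvisorSketch {

    // Hypothetical stand-in for the vector-database lookup: in practice this would
    // embed the exception text and return the nearest knowledge-base snippets.
    static List<String> retrieveSimilar(String exceptionText) {
        return List.of("Internal guide: tune taskmanager memory for direct buffers ...");
    }

    public static void main(String[] args) throws Exception {
        String exception = "java.lang.OutOfMemoryError: Direct buffer memory";
        String context = String.join("\n", retrieveSimilar(exception));

        // Ground the prompt in retrieved context plus the observed exception.
        String prompt = "A Flink job failed with:\n" + exception
                + "\nRelevant internal notes:\n" + context
                + "\nSuggest a likely cause and fix.";

        // NOTE: simplified JSON construction; real code needs proper escaping.
        String body = "{\"model\":\"gpt-4o-mini\",\"messages\":[{\"role\":\"user\",\"content\":\""
                + prompt.replace("\"", "\\\"").replace("\n", "\\n") + "\"}]}";

        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create("https://api.openai.com/v1/chat/completions"))
                .header("Authorization", "Bearer " + System.getenv("OPENAI_API_KEY"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The suggestion in the response would be rendered in the embedded chat panel.
        String response = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println(response);
    }
}
```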

Come to this session to learn:

A novel architecture for a streaming-first, LLM-powered knowledge system.

How to leverage Flink to build powerful internal tooling for developers.

Practical lessons on integrating LLMs and vector databases in a real-time context.


Purshotam Shah

Yahoo