
Core AI Workflows

16 essential AI design patterns — explained with clear diagrams, real Spring Boot microservice code, and real project scenarios. Understand how each pattern works, when to use it, and what to avoid.

16 AI Patterns · 5 Categories · ☕ Spring Boot Code · 🎯 Real Scenarios · ⚠️ Common Pitfalls



RAG — Retrieval-Augmented Generation

Retrieval · Intermediate · Vector Search · Context Injection · Grounding · pgvector

Fetches relevant documents from a knowledge base before generating an answer — grounding the LLM in real data instead of potentially hallucinated training memory.

Flow Diagram

User Query → Embed Query → Vector Search → Inject Context → LLM Answer → Grounded Output


Real Scenario: 🏢 SupportIQ, AI-powered customer support for a SaaS product with 10 years of accumulated knowledge

3,200 docs · 12,000 resolved tickets · 890 GitHub issues · all indexed in pgvector

Developer Question

"I'm getting a NullPointerException when the LLM returns an empty response. How do I fix this and make my AI integration production-safe?"

Retrieval-Augmented Generation (RAG) addresses a core limitation of LLMs: they know only what they were trained on. Training data has a cutoff date, excludes your private documents, and can be factually wrong on niche topics. RAG fixes this by having the LLM "look things up" before it answers.


Think of it like giving a doctor access to a live medical database before every patient consultation. Without RAG, the doctor answers from memory — potentially outdated or wrong. With RAG, the doctor reads the latest reference, then answers — grounded in real, current facts. That is exactly what happens in your Spring Boot app.

Technical Definition

In technical terms: RAG converts the user question into a vector embedding, searches a pre-indexed vector database for semantically similar documents, injects those documents into the LLM prompt as context, and lets the LLM generate an answer grounded in retrieved data — not training memory.
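The "search a pre-indexed vector database for semantically similar documents" step boils down to ranking stored embeddings by their similarity to the query embedding. Below is a minimal brute-force sketch in plain Java, assuming the embeddings were already computed at indexing time and using cosine similarity. `IndexedDoc`, `Hit` and `InMemoryVectorSearch` are hypothetical names, not part of Spring AI or pgvector. In the SupportIQ setup, pgvector performs the same ranking inside PostgreSQL (its `<=>` operator is cosine distance) and can use an index instead of scanning every row.

```java
import java.util.Comparator;
import java.util.List;

/** A stored chunk plus the embedding computed for it at indexing time. */
record IndexedDoc(String id, String text, double[] embedding) {}

/** One search result with its similarity score (higher = more similar). */
record Hit(IndexedDoc doc, double score) {}

final class InMemoryVectorSearch {
    private InMemoryVectorSearch() {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1]. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Embedding dimensions differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // An all-zero vector has no direction; treat it as unrelated to everything.
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every stored doc against the query embedding and returns the k most similar. */
    static List<Hit> topK(double[] queryEmbedding, List<IndexedDoc> index, int k) {
        return index.stream()
                .map(doc -> new Hit(doc, cosine(queryEmbedding, doc.embedding())))
                .sorted(Comparator.comparingDouble(Hit::score).reversed())
                .limit(k)
                .toList();
    }
}
```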

Why You Need It

✓ Reduces hallucinations on domain-specific content

✓ Your knowledge base updates without retraining the model

✓ Every answer can be traced back to the retrieved source documents

✓ Scales to millions of documents with an approximate nearest-neighbor index (e.g. pgvector's HNSW index)

✓ The LLM stays the same; only your data changes

Where to Use It

Any system where the LLM needs access to private, recent, or domain-specific knowledge — customer support, internal docs Q&A, HR policy chatbots, legal research, medical reference, technical support.

THE 8 RAG TECHNIQUES — From Simple to Most Powerful:

① BASIC RAG ⭐⭐

The foundation. Embed the question → vector search → inject the top 4 docs → LLM answers. Works for most questions. Limitation: pure semantic search can miss exact technical terms such as "NullPointerException".
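A minimal sketch of the four Basic RAG steps, including the prompt that injects the top 4 docs. `QueryEmbedder`, `DocSearch` and `Llm` are hypothetical single-method interfaces standing in for whatever embedding model, vector store and chat client the app actually uses (in a Spring Boot app, typically beans provided by Spring AI); the prompt wording and fallback message are assumptions. Because the developer question in this scenario is about an empty LLM response causing a NullPointerException, the sketch treats a null or blank reply as an expected outcome rather than dereferencing it.

```java
import java.util.List;

// Hypothetical single-method interfaces standing in for the real embedding model,
// vector store and chat client (e.g. Spring AI beans in a Spring Boot app).
interface QueryEmbedder { double[] embed(String text); }
interface DocSearch { List<String> search(double[] queryEmbedding, int topK); }
interface Llm { String complete(String prompt); }

final class BasicRag {
    static final int TOP_K = 4; // "inject top 4 docs"
    static final String FALLBACK = "Sorry, no answer could be generated for this question.";

    private final QueryEmbedder embedder;
    private final DocSearch docSearch;
    private final Llm llm;

    BasicRag(QueryEmbedder embedder, DocSearch docSearch, Llm llm) {
        this.embedder = embedder;
        this.docSearch = docSearch;
        this.llm = llm;
    }

    /** Context injection: numbered docs first, then the user's question. */
    static String buildPrompt(String question, List<String> docs) {
        StringBuilder prompt = new StringBuilder(
                "Answer using ONLY the context below. "
                + "If the context does not contain the answer, say you don't know.\n\n");
        for (int i = 0; i < docs.size(); i++) {
            prompt.append("[Doc ").append(i + 1).append("]\n").append(docs.get(i)).append("\n\n");
        }
        return prompt.append("Question: ").append(question).toString();
    }

    String answer(String question) {
        double[] queryEmbedding = embedder.embed(question);           // 1. embed query
        List<String> docs = docSearch.search(queryEmbedding, TOP_K);  // 2. vector search
        String reply = llm.complete(buildPrompt(question, docs));     // 3. inject context, 4. LLM answer
        // A null or blank reply is handled explicitly instead of causing an NPE downstream.
        if (reply == null || reply.isBlank()) {
            return FALLBACK;
        }
        return reply.strip();
    }
}
```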

② HYBRID RAG ⭐⭐⭐

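Combines vector search (semantic meaning), BM25 keyword search (exact terms), and metadata filtering. The ranked result lists are merged with Reciprocal Rank Fusion (RRF), which combines ranks rather than raw scores, so BM25 scores and cosine similarities never have to be put on the same scale. Best for: technical docs with specific jargon, since it catches both semantic meaning AND exact class names.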
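Reciprocal Rank Fusion itself is only a few lines: each doc earns 1 / (k + rank) from every result list it appears in, and the sums decide the final order. The sketch below assumes each retriever (vector search, BM25) has already returned doc IDs best-first, with any metadata filter applied inside each retriever. `ReciprocalRankFusion` is a hypothetical name, and k = 60 is the commonly used default rather than a value taken from the SupportIQ project.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReciprocalRankFusion {
    /** Commonly used default smoothing constant; a larger k flattens the gap between ranks. */
    static final int DEFAULT_K = 60;

    private ReciprocalRankFusion() {}

    /**
     * Merges ranked lists of doc IDs (each ordered best-first, e.g. one from vector search,
     * one from BM25). score(doc) = sum over lists containing doc of 1 / (k + rank), rank from 1.
     */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1;
                scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        // Highest fused score first; ties broken alphabetically so results are deterministic.
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

For example, a doc ranked 1st by vector search and 2nd by BM25 beats one ranked 3rd and 1st, because 1/61 + 1/62 is larger than 1/63 + 1/61.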

③ MULTI-QUERY RAG ⭐⭐⭐

Generates 4 differently phrased versions of the question, runs a search for each of them in parallel, and deduplicates the results. Docs found by multiple phrasings rank higher. Best for: vague questions where a single phrasing misses relevant docs.
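A sketch of the fan-out-and-merge half of Multi-Query RAG, assuming the query variants were already produced by an LLM call (not shown) and that `search` returns doc IDs for one query. `MultiQueryRetriever` is a hypothetical name. Each variant runs via `CompletableFuture.supplyAsync`, which uses the common fork-join pool by default; a Spring Boot service would more likely pass its own executor. Docs are ranked by how many variants returned them, with first-seen order breaking ties.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

final class MultiQueryRetriever {
    private MultiQueryRetriever() {}

    /**
     * Searches with every query variant in parallel and merges the results:
     * each doc appears once, ranked by how many variants returned it.
     */
    static List<String> retrieve(List<String> queryVariants,
                                 Function<String, List<String>> search) {
        // Fan out: one async search per phrasing (common fork-join pool by default).
        List<CompletableFuture<List<String>>> pending = queryVariants.stream()
                .map(q -> CompletableFuture.supplyAsync(() -> search.apply(q)))
                .toList();

        // Fan in: count how many phrasings found each doc. LinkedHashMap keeps
        // first-seen order, which the stable sort below uses to break ties.
        Map<String, Integer> foundBy = new LinkedHashMap<>();
        for (CompletableFuture<List<String>> future : pending) {
            // A set, so a doc listed twice by one variant still counts once for it.
            for (String docId : new LinkedHashSet<>(future.join())) {
                foundBy.merge(docId, 1, Integer::sum);
            }
        }
        return foundBy.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .toList();
    }
}
```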

④ HyDE RAG (Hypothetical Document Embeddings) ⭐⭐⭐⭐

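Asks the LLM to write an imagined ideal answer first, then embeds that hypothetical answer and uses it as the search vector. A 4-word question gives the embedding model little to work with; a 200-word imagined answer uses the vocabulary and structure of real answer docs, so it tends to land much closer to them in vector space. The hypothetical answer is only used for retrieval; the final answer still comes from the retrieved docs. Best for: short or vague queries.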
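The whole HyDE trick is which text gets embedded. In this sketch, the three functions passed to the hypothetical `HydeRetriever` stand in for an LLM call, an embedding call and a vector search; the prompt wording is an assumption, and falling back to the raw question when the LLM returns nothing is a design choice of this sketch rather than part of HyDE itself.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

final class HydeRetriever {
    private final Function<String, String> llm;                              // hypothetical LLM call
    private final Function<String, double[]> embedder;                       // hypothetical embedding call
    private final BiFunction<double[], Integer, List<String>> vectorSearch;  // hypothetical vector search

    HydeRetriever(Function<String, String> llm,
                  Function<String, double[]> embedder,
                  BiFunction<double[], Integer, List<String>> vectorSearch) {
        this.llm = llm;
        this.embedder = embedder;
        this.vectorSearch = vectorSearch;
    }

    List<String> retrieve(String question, int topK) {
        // 1. Ask the LLM to imagine a detailed answer. It may contain wrong details;
        //    it only needs to resemble a real answer document.
        String hypothetical = llm.apply(
                "Write a detailed technical passage that answers this question:\n" + question);
        // 2. Embed the imagined answer instead of the short question
        //    (falls back to the question if the LLM returned nothing).
        String textToEmbed = (hypothetical == null || hypothetical.isBlank()) ? question : hypothetical;
        // 3. Search with that richer vector; the hypothetical text itself is then discarded.
        return vectorSearch.apply(embedder.apply(textToEmbed), topK);
    }
}
```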

⑤ PARENT-DOCUMENT RAG ⭐⭐⭐⭐

Indexes small chunks (precise search) but returns the full parent section (rich context). Like using a book index to find a page, then reading the full chapter. Best for: long docs where a 2-sentence snippet lacks the surrounding code examples.
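A sketch of the parent-lookup step, assuming each indexed chunk stores the ID of the section it was cut from and the full sections live in a separate key-value store (here just a `Map`). `Chunk` and `ParentDocumentRetriever` are hypothetical names. Several top chunks often come from the same section, so parents are deduplicated while keeping the order of each section's best-ranked chunk.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** A small indexed chunk that remembers which full section it was cut from. */
record Chunk(String chunkId, String parentId, String text) {}

final class ParentDocumentRetriever {
    // Full sections keyed by ID; kept outside the vector index (here just a Map).
    private final Map<String, String> parentTextById;

    ParentDocumentRetriever(Map<String, String> parentTextById) {
        this.parentTextById = parentTextById;
    }

    /**
     * Turns ranked chunk hits (best first) into at most maxParents distinct
     * parent sections, ordered by each section's best-ranked chunk.
     */
    List<String> expand(List<Chunk> rankedChunkHits, int maxParents) {
        Set<String> parentIds = new LinkedHashSet<>();
        for (Chunk chunk : rankedChunkHits) {
            if (parentIds.size() >= maxParents) break;
            parentIds.add(chunk.parentId()); // no-op if this section is already included
        }
        return parentIds.stream()
                .map(parentTextById::get)
                .filter(Objects::nonNull) // skip chunks whose parent is missing from the store
                .toList();
    }
}
```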

⑥ CONTEXTUAL COMPRESSION ⭐⭐⭐⭐

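After retrieving docs, asks the LLM to extract ONLY the sentences that directly answer the question. This removes noise and can cut the tokens sent to the final answer call by 70% or more, although each extraction is itself an extra LLM call. Best for: lowering cost on the answer model and getting sharper, more focused answers.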
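A sketch of the compression pass, assuming the per-document extraction is an LLM call hidden behind a `BiFunction<String, String, String>` (question and doc in, relevant sentences or an empty string out). `ContextualCompressor` and the prompt wording are hypothetical. Docs where the extractor finds nothing relevant are dropped entirely, so they never reach the answer prompt.

```java
import java.util.List;
import java.util.function.BiFunction;

final class ContextualCompressor {
    // Hypothetical LLM-backed extractor: (question, document) -> relevant sentences, or "" if none.
    private final BiFunction<String, String, String> extractor;

    ContextualCompressor(BiFunction<String, String, String> extractor) {
        this.extractor = extractor;
    }

    /** Example prompt a real LLM-backed extractor might send (wording is an assumption). */
    static String extractionPrompt(String question, String document) {
        return "Copy, word for word, only the sentences from the document that help answer "
                + "the question. If none do, reply with an empty string.\n\n"
                + "Question: " + question + "\n\nDocument:\n" + document;
    }

    /** Compresses each retrieved doc; docs with nothing relevant are dropped. */
    List<String> compress(String question, List<String> retrievedDocs) {
        return retrievedDocs.stream()
                .map(doc -> extractor.apply(question, doc))
                .filter(extract -> extract != null && !extract.isBlank())
                .map(String::strip)
                .toList();
    }
}
```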

⑦ RE-RANKING RAG ⭐⭐⭐⭐⭐

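Retrieves 20 candidates (a wide net), then uses a cross-encoder reranker (Cohere Rerank) to re-score each doc directly against the question. Embedding search compares two independently computed vectors; a cross-encoder reads the question and the doc together, so it judges relevance more precisely, but it is too slow to run over the whole index, which is why it only sees the shortlist. Vector similarity does not equal usefulness; re-ranking surfaces what is actually most relevant. Best for: high-stakes answers where quality matters most.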
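A sketch of the rerank step, assuming `CrossEncoder` is a hypothetical wrapper around the reranker (in SupportIQ, Cohere Rerank). A hosted reranker API typically scores the whole candidate list in one request; scoring pair by pair here just keeps the sketch small. The 20-candidate cap mirrors the description above; `Reranker` and `Scored` are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical cross-encoder wrapper: scores one (query, document) pair read together. */
interface CrossEncoder {
    double score(String query, String document);
}

/** A candidate paired with its rerank score. */
record Scored(String document, double score) {}

final class Reranker {
    static final int MAX_CANDIDATES = 20; // the "wide net" from first-stage retrieval

    private final CrossEncoder crossEncoder;

    Reranker(CrossEncoder crossEncoder) {
        this.crossEncoder = crossEncoder;
    }

    /** Re-scores up to MAX_CANDIDATES docs against the query and keeps the topN best. */
    List<String> rerank(String query, List<String> candidates, int topN) {
        return candidates.stream()
                .limit(MAX_CANDIDATES)
                .map(doc -> new Scored(doc, crossEncoder.score(query, doc)))
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(topN)
                .map(Scored::document)
                .toList();
    }
}
```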

⑧ AGENTIC RAG ⭐⭐⭐⭐⭐

The LLM decides what to search, when to search again, and when it has enough to answer. It uses multiple tools: searchDocs, searchCodeExamples, searchGitHubIssues. It loops until it judges that the gathered context is sufficient. Best for: complex multi-part questions that require several knowledge sources.
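A sketch of the agent loop under stated assumptions: a hypothetical `Planner` (the LLM) sees the question plus everything gathered so far and returns either a `ToolCall` or a `FinalAnswer`; tools are plain functions registered by name, such as `searchDocs` or `searchGitHubIssues`. The `MAX_STEPS` cap is a design choice of this sketch: looping "until confident" needs a hard stop so a planner that never commits cannot loop forever. Unknown tool names are reported back to the planner as an observation instead of throwing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** What the (hypothetical) LLM planner decides at each step. */
interface AgentStep {}
record ToolCall(String tool, String query) implements AgentStep {}
record FinalAnswer(String text) implements AgentStep {}

/** Hypothetical LLM planner: sees the question and every observation gathered so far. */
interface Planner {
    AgentStep next(String question, List<String> observations);
}

final class AgenticRag {
    static final int MAX_STEPS = 5; // hard stop so the loop always terminates

    private final Planner planner;
    private final Map<String, Function<String, String>> tools; // e.g. "searchDocs" -> search function

    AgenticRag(Planner planner, Map<String, Function<String, String>> tools) {
        this.planner = planner;
        this.tools = tools;
    }

    String run(String question) {
        List<String> observations = new ArrayList<>();
        for (int step = 0; step < MAX_STEPS; step++) {
            AgentStep decision = planner.next(question, List.copyOf(observations));
            if (decision instanceof FinalAnswer answer) {
                return answer.text(); // the planner judged the context sufficient
            }
            if (!(decision instanceof ToolCall call)) {
                throw new IllegalStateException("Unexpected planner step: " + decision);
            }
            Function<String, String> tool = tools.get(call.tool());
            String result = (tool == null)
                    ? "Unknown tool: " + call.tool()   // reported back so the planner can recover
                    : tool.apply(call.query());
            observations.add(call.tool() + "(\"" + call.query() + "\") -> " + result);
        }
        return "Stopped after " + MAX_STEPS + " steps without a confident answer.";
    }
}
```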

Runnable Project

SupportIQ — All 8 RAG Techniques

Clone and run all 8 RAG techniques applied to one real Spring Boot project. Compare Basic vs Hybrid vs Agentic RAG side by side.
