
Core AI Workflows

16 essential AI design patterns — explained with clear diagrams, real Spring Boot microservice code, and real project scenarios. Understand how each pattern works, when to use it, and what to avoid.

16 AI Patterns · 5 Categories · Spring Boot Code · Real Scenarios · Common Pitfalls

RAG — Retrieval-Augmented Generation

Retrieval · Intermediate · Vector Search · Context Injection · Grounding · pgvector

Fetches relevant documents from a knowledge base before generating an answer — grounding the LLM in real data instead of potentially hallucinated training memory.

Flow Diagram

User Query → Embed Query → Vector Search → Inject Context → LLM Answer → Grounded Output
Real Scenario: 🏢 SupportIQ
AI-powered customer support for a SaaS with 10 years of knowledge
3,200 docs · 12,000 resolved tickets · 890 GitHub issues · all indexed in pgvector
Developer Question

"I'm getting a NullPointerException when the LLM returns an empty response. How do I fix this and make my AI integration production-safe?"

RAG — Retrieval-Augmented Generation — solves the core limitation of LLMs: they only know what they were trained on. Training data has a cutoff date, doesn't include your private documents, and can be factually wrong on niche topics. RAG fixes this by making the LLM "look things up" before it answers.

Think of it like giving a doctor access to a live medical database before every patient consultation. Without RAG, the doctor answers from memory — potentially outdated or wrong. With RAG, the doctor reads the latest reference, then answers — grounded in real, current facts. That is exactly what happens in your Spring Boot app.

Technical Definition

RAG converts the user question into a vector embedding, searches a pre-indexed vector database for semantically similar documents, injects those documents into the LLM prompt as context, and lets the LLM generate an answer grounded in the retrieved data rather than in training memory.
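The embed-and-search half of that definition can be sketched in plain Java. This is a minimal illustration, not SupportIQ's code: the class name ToyVectorSearch, the hashed bag-of-words embed() function, and the in-memory list are all assumptions standing in for a real embedding model and a pgvector index.

```java
import java.util.Comparator;
import java.util.List;

// Toy in-memory vector search. Every name here is hypothetical.
class ToyVectorSearch {
    record Doc(String id, String text, double[] vector) {}

    static final int DIM = 64;

    // Stand-in embedding: hashed bag-of-words.
    // A real system would call an embedding model instead.
    static double[] embed(String text) {
        double[] v = new double[DIM];
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            v[Math.floorMod(token.hashCode(), DIM)] += 1.0;
        }
        return v;
    }

    // Cosine similarity; returns 0 when either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Embed the query, then return the topK most similar docs, best first.
    static List<Doc> search(List<Doc> index, String query, int topK) {
        double[] q = embed(query);
        return index.stream()
                .sorted(Comparator.comparingDouble((Doc d) -> cosine(q, d.vector())).reversed())
                .limit(topK)
                .toList();
    }

    // Indexing step: embed once, store the vector next to the text.
    static Doc doc(String id, String text) {
        return new Doc(id, text, embed(text));
    }
}
```

In a real deployment the sort over the whole list would be replaced by an indexed nearest-neighbour query in the vector database; the shape of the call (query text in, ranked docs out) stays the same.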

Why You Need It
Reduces hallucinations on domain-specific content
Your knowledge base updates without retraining the model
Every answer can be traced back to the retrieved sources
Scales to millions of documents
The LLM stays the same — only your data changes
Where to Use It

Any system where the LLM needs access to private, recent, or domain-specific knowledge — customer support, internal docs Q&A, HR policy chatbots, legal research, medical reference, technical support.

THE 8 RAG TECHNIQUES — From Simple to Most Powerful:
BASIC RAG ⭐⭐

The foundation. Embed the question → vector search → inject top 4 docs → LLM answers. Works for most questions. Limitation: pure semantic search can miss exact technical terms like "NullPointerException".
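The "inject top 4 docs" step amounts to building a prompt string. A minimal sketch, assuming the retriever has already returned docs in ranked order; PromptBuilder, Source, and the instruction wording are hypothetical, not the project's actual prompt.

```java
import java.util.List;

// Builds a grounded prompt from ranked retrieval results. Names are hypothetical.
class PromptBuilder {
    record Source(String id, String text) {}

    static final int MAX_DOCS = 4; // "top 4" as described in the text

    static String build(String question, List<Source> ranked) {
        StringBuilder sb = new StringBuilder();
        sb.append("Answer using only the context below. ")
          .append("If the context is insufficient, say so.\n\nContext:\n");
        int n = Math.min(MAX_DOCS, ranked.size());
        for (int i = 0; i < n; i++) {
            Source s = ranked.get(i);
            // Number each source so the answer can cite it, e.g. [2].
            sb.append('[').append(i + 1).append("] (").append(s.id()).append(") ")
              .append(s.text()).append('\n');
        }
        sb.append("\nQuestion: ").append(question).append('\n');
        return sb.toString();
    }
}
```

Numbering the sources is one way to keep answers traceable: the model can be asked to cite [1] to [4], and each number maps back to a document id.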

HYBRID RAG ⭐⭐⭐

Combines vector search (semantic meaning) + BM25 keyword search (exact terms) + metadata filtering. The ranked lists are merged with Reciprocal Rank Fusion, which combines rank positions rather than raw scores. Best for: technical docs with specific jargon, where you need both semantic meaning AND exact class names.
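Reciprocal Rank Fusion itself is small enough to show in full. In this sketch each document earns 1 / (K + rank) from every list it appears in, and the sums decide the final order. The class name RankFusion and the constant K = 60 (a commonly used default) are assumptions, as is the tie-break by id.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Reciprocal Rank Fusion over any number of ranked id lists (best first).
class RankFusion {
    static final int K = 60; // commonly used smoothing constant; an assumption here

    @SafeVarargs
    static List<String> fuse(List<String>... rankings) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                // Rank is 1-based, so position i contributes 1 / (K + i + 1).
                scores.merge(ranking.get(i), 1.0 / (K + i + 1), Double::sum);
            }
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Double>comparingByKey()))
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

For example, fusing a vector ranking [a, b, c] with a keyword ranking [c, a, d] puts a first (ranks 1 and 2), then c (ranks 3 and 1), then b, then d. Only rank positions are used, so the cosine and BM25 scores never need to be put on a common scale.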

MULTI-QUERY RAG ⭐⭐⭐

Generates 4 differently-phrased versions of the question, searches with all of them in parallel, deduplicates results. Docs found by multiple phrasings score higher. Best for: vague questions where one phrasing misses relevant docs.
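The merge step can be sketched as a hit count per document. This is an illustration under assumptions: the query variants are passed in already generated (an LLM would produce them), the search function is any retriever returning doc ids, and the searches run sequentially here for simplicity rather than in parallel. MultiQueryRetriever is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Runs one search per query variant and ranks docs by how many variants found them.
class MultiQueryRetriever {
    static List<String> retrieve(List<String> variants,
                                 Function<String, List<String>> search) {
        Map<String, Integer> hits = new LinkedHashMap<>();
        for (String variant : variants) {
            // The set guards against one search returning the same id twice.
            for (String id : new LinkedHashSet<>(search.apply(variant))) {
                hits.merge(id, 1, Integer::sum);
            }
        }
        // Stable sort: docs with equal hit counts keep first-seen order.
        return hits.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

Counting hits is the simplest scoring rule consistent with "docs found by multiple phrasings score higher"; a rank-aware fusion could be substituted without changing the interface.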

HyDE RAG (Hypothetical Document Embeddings) ⭐⭐⭐⭐

Asks the LLM to imagine a perfect answer first, then embeds that hypothetical answer and uses it as the search vector. A 4-word question carries little signal to match against; a 200-word imagined answer resembles the documents you want to find, so it tends to land closer to them in embedding space. Best for: short or vague queries.
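The key detail is that the embedding call receives the imagined answer, not the question. A minimal sketch with the LLM, embedder, and vector search injected as plain functions; HydeRetriever, the three function parameters, and the prompt wording are all hypothetical.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// HyDE flow: question -> hypothetical answer -> embedding -> vector search.
class HydeRetriever {
    private final Function<String, String> llm;        // prompt -> completion
    private final Function<String, double[]> embedder; // text -> vector
    private final BiFunction<double[], Integer, List<String>> vectorSearch; // (vector, topK) -> doc ids

    HydeRetriever(Function<String, String> llm,
                  Function<String, double[]> embedder,
                  BiFunction<double[], Integer, List<String>> vectorSearch) {
        this.llm = llm;
        this.embedder = embedder;
        this.vectorSearch = vectorSearch;
    }

    List<String> retrieve(String question, int topK) {
        // Step 1: let the LLM draft an answer; it may be wrong, it is only used for search.
        String hypothetical = llm.apply(
                "Write a short passage that answers this question:\n" + question);
        // Step 2: embed the draft instead of the original question.
        double[] queryVector = embedder.apply(hypothetical);
        // Step 3: search with that vector; real docs then ground the final answer.
        return vectorSearch.apply(queryVector, topK);
    }
}
```

The hypothetical passage is discarded after retrieval, so any errors in it should only affect which documents are found, not what the final answer says.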

PARENT-DOCUMENT RAG ⭐⭐⭐⭐

Indexes small chunks (precise search) but returns the full parent section (rich context). Like using a book index to find a page, then reading the full chapter. Best for: long docs where a 2-sentence snippet lacks the surrounding code examples.
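The chunk-to-parent step is a lookup plus de-duplication. In this sketch the matched chunks come from any small-chunk search, and the parent sections live in a map keyed by parent id; ParentDocumentLookup, Chunk, and the map are illustrative assumptions.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Swaps matched chunks for their full parent sections, keeping rank order.
class ParentDocumentLookup {
    record Chunk(String chunkId, String parentId) {}

    static List<String> expand(List<Chunk> matchedChunks,
                               Map<String, String> parentSections) {
        // Several chunks often share one parent; keep each parent once.
        Set<String> parentIds = new LinkedHashSet<>();
        for (Chunk c : matchedChunks) {
            parentIds.add(c.parentId());
        }
        return parentIds.stream()
                .map(parentSections::get)
                .filter(Objects::nonNull) // skip chunks whose parent is missing
                .toList();
    }
}
```

Because parents are larger than chunks, the number of parents injected usually needs a lower cap than the number of chunks searched.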

CONTEXTUAL COMPRESSION ⭐⭐⭐⭐

After retrieving docs, asks the LLM to extract ONLY the sentences that directly answer the question. Removes noise and can cut the tokens sent to the final prompt by 70%+, at the price of extra extraction calls. Best for: cost reduction and sharper, focused answers.
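One way to sketch the extraction step is a per-document LLM call with a sentinel for "nothing relevant". The class name ContextCompressor, the NO_RELEVANT_CONTENT sentinel, and the prompt wording are assumptions for illustration; the LLM is injected as a plain function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Asks an LLM to keep only the sentences of each doc that answer the question.
class ContextCompressor {
    static final String NOTHING = "NO_RELEVANT_CONTENT"; // hypothetical sentinel

    static List<String> compress(String question, List<String> docs,
                                 Function<String, String> llm) {
        List<String> kept = new ArrayList<>();
        for (String doc : docs) {
            String prompt = "Copy only the sentences from the document that directly "
                    + "answer the question. If none do, reply " + NOTHING + ".\n\n"
                    + "Question: " + question + "\n\nDocument:\n" + doc;
            String extract = llm.apply(prompt).strip();
            // Drop docs the model judged irrelevant, and empty replies.
            if (!extract.isEmpty() && !extract.equals(NOTHING)) {
                kept.add(extract);
            }
        }
        return kept;
    }
}
```

Each retrieved doc costs one extra call in this version, so the saving only pays off when the final generation model is more expensive than the one doing the extraction, or when shorter context measurably improves answers.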

RE-RANKING RAG ⭐⭐⭐⭐⭐

Retrieves 20 candidates (wide net), then uses a cross-encoder model (e.g. Cohere Rerank) to re-score each doc specifically against the question. Vector similarity does not equal usefulness; because the re-ranker reads the question and the doc together, it surfaces what is actually most relevant. Best for: high-stakes answers where quality matters most.
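The two-stage shape can be sketched with the relevance model injected as a function. Here the relevance parameter is a stand-in for a cross-encoder call; Reranker and its signature are hypothetical and do not reflect any vendor's API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Second-stage re-ranking: score each candidate against the question, keep the best.
class Reranker {
    static List<String> rerank(String question, List<String> candidates,
                               ToDoubleBiFunction<String, String> relevance, int keep) {
        record Scored(String doc, double score) {}
        return candidates.stream()
                // One (question, doc) scoring call per candidate.
                .map(d -> new Scored(d, relevance.applyAsDouble(question, d)))
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(keep)
                .map(Scored::doc)
                .toList();
    }
}
```

A typical call would pass the 20 candidates from the first-stage search and keep a handful for the prompt. The scoring cost grows with the candidate count, which is why the expensive model only sees the short list.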

AGENTIC RAG ⭐⭐⭐⭐⭐

The LLM decides what to search, when to search again, and when it has enough to answer. Uses multiple tools: searchDocs, searchCodeExamples, searchGitHubIssues. Loops until confident. Best for: complex multi-part questions requiring several knowledge sources.
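The control loop behind this can be sketched without any LLM library. In this illustration the planner function plays the role of the LLM deciding the next step, the tools map holds search functions such as searchDocs, and maxSteps bounds the loop; AgenticRagLoop, Step, and the transcript format are assumptions, not the project's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Minimal agent loop: plan, call a tool, record the result, repeat until an answer.
class AgenticRagLoop {
    // A planner decision: either call a tool or finish with an answer.
    record Step(String tool, String input, String answer) {
        static Step call(String tool, String input) { return new Step(tool, input, null); }
        static Step finish(String answer) { return new Step(null, null, answer); }
        boolean isFinal() { return answer != null; }
    }

    static String run(String question,
                      Map<String, Function<String, String>> tools,
                      Function<List<String>, Step> planner,
                      int maxSteps) {
        List<String> transcript = new ArrayList<>();
        transcript.add("question: " + question);
        for (int i = 0; i < maxSteps; i++) {
            // The planner sees everything gathered so far and picks the next move.
            Step step = planner.apply(List.copyOf(transcript));
            if (step.isFinal()) {
                return step.answer();
            }
            Function<String, String> tool = tools.get(step.tool());
            String observation = (tool == null)
                    ? "unknown tool: " + step.tool()
                    : tool.apply(step.input());
            transcript.add(step.tool() + "(" + step.input() + ") -> " + observation);
        }
        // Hard stop so a planner that never finishes cannot loop forever.
        return "No confident answer within " + maxSteps + " steps.";
    }
}
```

The step limit matters in practice: "loops until confident" needs an upper bound, otherwise a single question can trigger an unbounded number of searches and LLM calls.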

Runnable Project
SupportIQ — All 8 RAG Techniques
Clone and run all 8 RAG techniques applied to one real Spring Boot project. Compare Basic vs Hybrid vs Agentic RAG side by side.