16 interconnected concepts — LLMs, RAG, Agents, MCP, A2A, Evaluation, Guardrails, and Observability — explained with live data flow animations.
A neural network trained to predict the next token over massive text corpora. Billions of parameters encode language, facts, and reasoning. Everything else builds on top of it.
Adapt a pre-trained foundation model to your specific domain by continuing training on curated task data. Done well, this improves accuracy and can reduce hallucination for domain-specific use cases.
Updates all model weights. Maximum performance but requires significant GPU memory.
100% params
Injects low-rank adapter matrices into attention layers. Roughly 100× fewer trainable params.
r=8..64 rank
LoRA on a 4-bit quantized model. Fine-tune a 65B model on a single 48 GB GPU.
4-bit quant
Reinforcement Learning from Human Feedback. Teaches the model to follow instructions safely.
PPO / DPO
Before documents can be embedded and stored, they must be split into retrievable pieces. Chunking strategy dramatically affects retrieval quality, so choose it deliberately.
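A minimal sketch of one common chunking strategy, fixed-size windows with overlap, is shown below. The function name `chunk_text` and the character-based sizes are assumptions made for illustration; real pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    The overlap keeps a sentence that straddles a boundary visible
    in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

Larger overlap improves the chance that a relevant passage survives intact, at the price of storing and embedding more text.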
Convert text chunks into dense numerical vectors where semantic similarity = geometric closeness. The bridge between human language and machine-searchable space.
Purpose-built databases that store and search dense embedding vectors using ANN (Approximate Nearest Neighbor) algorithms. Orders of magnitude faster than brute-force cosine search.
Hierarchical Navigable Small World graph. Best accuracy/speed tradeoff. Default in most VDBs.
Inverted File Index. Clusters vectors into Voronoi cells. Fast at massive scale (100M+).
Product Quantization. Compresses vectors 8-32× for memory efficiency at the cost of some accuracy.
Brute-force exact search. 100% recall but O(n) — only for small datasets (<100k).
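The flat (brute-force) option above is simple enough to sketch in full, and it doubles as the baseline that ANN indexes approximate. The helper names `cosine_similarity` and `flat_search` are hypothetical, and plain Python lists stand in for real embedding vectors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flat_search(query: list[float], vectors: list[list[float]], k: int = 3):
    """Exact nearest-neighbour search: score every vector, keep the top k.

    Cost is O(n) similarity computations per query, which is why this
    only suits small collections.
    """
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(index, score) for score, index in scored[:k]]


if __name__ == "__main__":
    store = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(flat_search([1.0, 0.0], store, k=2))
```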
Complex questions rarely map to a single vector search. Decompose them into focused sub-queries, rewrite ambiguities, and expand with hypothetical answers to maximize recall.
Break multi-part questions into atomic retrieval tasks, each targeting one concept.
LLM reformulates ambiguous query into clearer, more searchable form before embedding.
Hypothetical Document Embeddings — generate a fake answer, embed it, search on that vector.
Abstract the specific question to a more general version that captures broader context.
Generate N rephrasings of the query, retrieve for each, union the results.
Multi-query + Reciprocal Rank Fusion to merge and re-rank results from all sub-queries.
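The fusion step used by multi-query and RAG-Fusion can be sketched as follows. Each sub-query is assumed to have already produced a ranked list of document IDs; the function name is hypothetical, and `k=60` is a commonly used constant rather than anything the text prescribes.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one.

    A document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by many sub-queries rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    per_query_results = [
        ["a", "b", "c"],  # results for rephrasing 1
        ["b", "a", "d"],  # results for rephrasing 2
        ["b", "c", "a"],  # results for rephrasing 3
    ]
    print(reciprocal_rank_fusion(per_query_results))
```

Because only ranks are used, the same function can merge lists whose raw scores are not comparable, such as dense and sparse results.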
Different retrieval strategies suit different queries. Dense retrieval excels at semantic similarity; sparse at exact keyword matching. Hybrid combines both for maximum coverage.
Embed query + documents into the same vector space. Find nearest neighbors by cosine / dot-product similarity. Best for semantic questions.
similarity = q · d / (|q||d|)
TF-IDF variant that scores keyword overlap. Excels at exact term matching — product names, IDs, proper nouns. Doesn't understand semantics.
BM25(q,d) = Σ IDF(qi) · f(qi,d)·(k1+1) / (f(qi,d) + k1·(1 − b + b·|d|/avgdl))
Combine dense + sparse scores via Reciprocal Rank Fusion or weighted sum. Best of both worlds — semantics + exact match.
RRF(d) = Σ 1/(k + rank_i)
Build a knowledge graph from entities and relations. Traverse graph edges to answer multi-hop questions that span many documents.
entity → relation → entity
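Of the four strategies above, sparse scoring is the easiest to show end to end. The sketch below uses the standard Okapi BM25 weighting, which adds term-frequency saturation (`k1`) and document-length normalization (`b`) to the IDF-weighted sum. The whitespace tokenizer, the smoothed IDF variant, and the default parameter values are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    df: Counter = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF that stays positive for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        "error code E42 in payment service",
        "payment service overview",
        "how to reset a password",
    ]
    print(bm25_scores("E42 payment", corpus))
```

The rare identifier `E42` dominates the score of the first document, which is the exact-match behaviour dense retrieval tends to miss.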
A cross-encoder model re-scores retrieved chunks against the query jointly, capturing fine-grained relevance that the bi-encoder embedding model missed. Precision over recall.
Retrieval-Augmented Generation unifies all previous steps. The user query flows through decomposition → retrieval → reranking → context injection → LLM generation.
An LLM given tools and a Perceive → Plan → Act → Reflect loop. It autonomously decides which tool to call, executes it, observes the result, and iterates toward a goal.
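The loop can be sketched with the model replaced by a scripted stand-in. Everything here is hypothetical: `run_agent`, the action dictionary shape, and `scripted_decide` are illustrative names, and a real agent would call an LLM where `decide` is invoked.

```python
from typing import Callable, Optional

Tool = Callable[[str], str]
History = list[tuple[str, str, str]]  # (tool name, tool input, observation)


def run_agent(goal: str,
              decide: Callable[[str, History], dict],
              tools: dict[str, Tool],
              max_steps: int = 5) -> Optional[str]:
    """Plan, act, observe, and repeat until the policy says it is finished."""
    history: History = []
    for _ in range(max_steps):
        action = decide(goal, history)              # plan the next step
        if action["tool"] == "finish":
            return action["input"]                  # final answer
        tool = tools.get(action["tool"])
        if tool is None:
            observation = f"unknown tool: {action['tool']}"
        else:
            observation = tool(action["input"])     # act
        history.append((action["tool"], action["input"], observation))  # reflect on this next turn
    return None  # step budget exhausted without an answer


def calculator(expression: str) -> str:
    """Toy tool: multiplies two integers written as 'a*b'."""
    left, right = expression.split("*")
    return str(int(left) * int(right))


def scripted_decide(goal: str, history: History) -> dict:
    """Stand-in for the LLM: call the calculator once, then answer."""
    if not history:
        return {"tool": "calculator", "input": "6*7"}
    return {"tool": "finish", "input": f"The answer is {history[-1][2]}"}


if __name__ == "__main__":
    print(run_agent("What is 6*7?", scripted_decide, {"calculator": calculator}))
```

The `max_steps` cap is a design choice worth keeping in any real loop, since an agent that never decides to finish would otherwise run indefinitely.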
Specialized agents collaborate under an orchestrator, parallelizing complex tasks. Each agent owns a domain and communicates structured results through a shared protocol.
Anthropic's open standard giving AI models a universal, typed interface to tools and data sources. One protocol replaces hundreds of bespoke integrations.
Read/write local files
Fetch & scrape web
SQL / NoSQL queries
Repos, PRs, issues
Read/send messages
Events & scheduling
Google's open protocol enabling agents on different frameworks to discover each other, delegate tasks, and exchange typed messages — the HTTP of the multi-agent web.
Research & Planning
Claude / LangChain
Code & Execution
GPT-4 / AutoGen
Agents publish capabilities at /.well-known/agent.json, where peers can discover them automatically.
Structured Task objects with typed inputs, outputs, status, and cancellation support.
Server-Sent Events for long-running tasks — progress updates flow continuously.
OAuth 2.0 and API key support — enterprise-grade access control between agents.
Claude, GPT, Gemini, LangChain, AutoGen — all interoperate via the same protocol.
Text, files, structured JSON, images — all supported as typed task output artifacts.
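To make the idea of a structured task with status, artifacts, and cancellation concrete, here is a small illustrative model. It is not the A2A schema: the class, field, and state names are assumptions chosen for the sketch, and a real implementation should follow the protocol specification.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    COMPLETED = "completed"
    CANCELED = "canceled"


@dataclass
class Task:
    """Hypothetical task record passed between a client and a remote agent."""
    task_id: str
    inputs: dict[str, Any]
    state: TaskState = TaskState.SUBMITTED
    artifacts: list[dict[str, Any]] = field(default_factory=list)

    def start(self) -> None:
        if self.state is not TaskState.SUBMITTED:
            raise ValueError(f"cannot start a task in state {self.state.value}")
        self.state = TaskState.WORKING

    def complete(self, artifact: dict[str, Any]) -> None:
        if self.state is not TaskState.WORKING:
            raise ValueError(f"cannot complete a task in state {self.state.value}")
        self.artifacts.append(artifact)
        self.state = TaskState.COMPLETED

    def cancel(self) -> bool:
        """Cancel unless the task has already reached a terminal state."""
        if self.state in (TaskState.COMPLETED, TaskState.CANCELED):
            return False
        self.state = TaskState.CANCELED
        return True


if __name__ == "__main__":
    task = Task("t1", {"query": "summarize the report"})
    task.start()
    task.complete({"type": "text", "content": "Summary goes here."})
    print(task.state.value, task.artifacts)
```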
Systematic measurement of LLM and RAG system quality. Without rigorous evals, you're flying blind — you can't improve what you can't measure.
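As one concrete example of measurement, retrieval quality can be scored against a small labeled set. The text does not name specific metrics; recall@k and reciprocal rank are picked here as common choices, and the function names are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    retrieved = ["d3", "d1", "d7", "d2"]
    relevant = {"d1", "d2"}
    print(recall_at_k(retrieved, relevant, k=2))
    print(reciprocal_rank(retrieved, relevant))
```

Averaging these values over a fixed query set gives a number that can be tracked across changes to chunking, embedding, or reranking.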
Input and output filters that detect and block harmful, off-topic, or policy-violating content. Guardrails sit as a protective layer around every LLM call.
Detects off-topic requests outside the system's intended domain.
Identifies and redacts SSNs, credit card numbers, email addresses, and phone numbers from inputs and outputs.
Detects attempts to override system prompt or hijack agent behavior.
Content-safety classifiers detect hate speech, violence, self-harm, and CBRN content.
Validates output against retrieved context — flags unsupported claims (hallucination detection).
Industry-specific rules: HIPAA, GDPR, SOX, financial advice disclaimers.
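Of the filters above, PII redaction is the most mechanical, so it makes a reasonable sketch of how a guardrail wraps text going in or out of an LLM call. The patterns below are deliberately simplistic assumptions (US-style SSNs, 16-digit card numbers); production systems typically combine stricter validation with trained detectors.

```python
import re

# Hypothetical pattern table: label -> compiled regex.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "CARD": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def redact_pii(text: str) -> tuple[str, list[str]]:
    """Replace each match with a [LABEL] placeholder.

    Returns the redacted text and the labels that fired, so the
    decision can also be logged.
    """
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


if __name__ == "__main__":
    print(redact_pii("Mail jane.doe@example.com, SSN 123-45-6789."))
```

The same function can run on the user prompt before the model sees it and on the completion before the user sees it.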
Full-stack visibility into your AI system — traces, metrics, logs, and alerts. You can't optimize what you can't see. Observability closes the loop between deployment and improvement.
End-to-end request tracing across retrieval, LLM calls, tool use, and agent steps. OpenTelemetry compatible.
Latency P50/P95/P99, token usage, cost per query, retrieval hit rate, hallucination rate.
Structured logs of prompts, completions, retrieved chunks, tool calls, and guardrail decisions.
Real-time alerts on latency spikes, error rate increases, quality degradation, and cost anomalies.
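The latency figures listed under metrics can be computed from raw per-request timings. This sketch uses the nearest-rank percentile definition, which is one of several in use; the function names are hypothetical and monitoring backends may interpolate differently.

```python
def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(values)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_summary(latencies_ms: list[float]) -> dict[str, float]:
    """P50 / P95 / P99 over a window of request latencies in milliseconds."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}


if __name__ == "__main__":
    window = [120, 80, 200, 95, 1500, 110, 130, 90, 105, 100]
    print(latency_summary(window))
```

Tail percentiles matter because a single slow retrieval or tool call can dominate the user's experience even when the median looks healthy.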