Three frontier topics that define modern LLM systems: what fits in the context window, why models hallucinate, and how inference-time compute became a second lever on capability.
The context window is the single most precious resource in an LLM system. Context engineering is the discipline of deciding what to put in, what to compress, what to cache, and what to leave out — because everything you add costs latency, money, and attention.
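The "what to put in, what to leave out" decision can be sketched as a budgeting problem. The example below is a minimal illustration, not a production method: the `Chunk` type, the `relevance` score, and the 4-characters-per-token estimate are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    relevance: float  # hypothetical score, e.g. from a retriever or reranker


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    # A real system would count with the model's own tokenizer.
    return max(1, len(text) // 4)


def pack_context(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Greedily keep the most relevant chunks that still fit in the budget."""
    selected: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.relevance, reverse=True):
        cost = estimate_tokens(chunk.text)
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
    return selected
```

Greedy selection by relevance is only one possible policy; compressing or caching chunks, as the paragraph above mentions, would be separate steps layered on top of it.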
LLMs have no reliable built-in mechanism for distinguishing what they know from what they are merely generating plausibly, so a correct answer and a fabricated one can read equally fluent. Grounding ties generation to verifiable sources, so claims can be checked against those sources instead of taken on trust.
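One lightweight way to make grounding checkable is to require a citation marker on every sentence and reject sentences that cite nothing or cite a source that was never provided. The sketch below assumes a hypothetical `[doc_id]` citation convention and a `sources` mapping of id to text; it only verifies that citations exist and resolve, not that the cited source actually supports the claim.

```python
import re

# Assumed convention: citations look like [d1], [doc42], and so on.
CITATION = re.compile(r"\[(\w+)\]")


def unsupported_sentences(answer: str, sources: dict[str, str]) -> list[str]:
    """Return sentences with no citation, or with a citation to an unknown source."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        cited = CITATION.findall(sentence)
        if not cited or any(doc_id not in sources for doc_id in cited):
            flagged.append(sentence)
    return flagged
```

Flagged sentences could be dropped, regenerated, or routed to a stronger check that compares the claim against the cited text.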
Fabricated facts that contradict reality — wrong dates, invented names, fictional citations, made-up statistics.
Output contradicts or extends beyond the source documents provided in context. A common failure mode in RAG systems.
Internally inconsistent reasoning: the model contradicts a claim it just made, or its conclusion doesn't follow from its premises.
Tool calls or structured outputs that violate the expected format — invented field names, wrong types, malformed JSON.
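Of the failure types above, format violations are the easiest to catch mechanically, because the expected shape is known in advance. The following is a minimal sketch using only the standard library; the `EXPECTED_FIELDS` schema and the field names in it are hypothetical, and a real system would more likely use a full schema validator.

```python
import json

# Hypothetical schema for one tool: field name -> expected Python type.
EXPECTED_FIELDS = {"city": str, "days": int}


def validate_tool_call(raw: str, expected: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the call passed."""
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["top-level value is not an object"]

    problems = []
    for name in args:
        if name not in expected:
            problems.append(f"unexpected field: {name}")
    for name, typ in expected.items():
        if name not in args:
            problems.append(f"missing field: {name}")
        elif type(args[name]) is not typ:  # exact match, so True is not an int
            problems.append(f"wrong type for {name}: expected {typ.__name__}")
    return problems
```

The returned problem list can be fed back to the model as a repair prompt or used to reject the call outright.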
For years, LLM capability scaled mainly with training compute. Reasoning models added a second axis: they spend more compute at inference time, generating long internal chains of thought, exploring branches, and verifying their own work before producing the final answer.
The older assumption: capability is baked in during pretraining, and inference is a single pass from prompt to answer. More tokens at inference don't help; the model already "knows" or it doesn't.
The newer picture: models trained to reason can substantially improve answer quality by spending more tokens thinking. Capability is now also a knob you turn at inference time.
Prompt the model to reason step-by-step before answering. It needs no extra training, is simple to apply, and is often effective on math and logic tasks; the only cost is the extra output tokens. One of the earliest test-time techniques.
Generate N different reasoning paths at non-zero temperature, then take a majority vote over their final answers. Cost grows linearly with N, in exchange for accuracy gains on hard problems that have a single comparable answer.
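The voting step is simple enough to sketch directly. Here `sample_answer` is a hypothetical callable standing in for one model call at non-zero temperature that returns only the extracted final answer; how that answer is parsed out of the reasoning trace is left out.

```python
from collections import Counter
from typing import Callable


def self_consistency(sample_answer: Callable[[], str], n: int) -> tuple[str, float]:
    """Sample n final answers; return the majority answer and its vote share."""
    answers = [sample_answer() for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n
```

The vote share is a rough agreement signal: a low share suggests the samples disagree and the answer may deserve more samples or a different strategy.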
Explore multiple branches of reasoning, evaluate intermediate states, and prune dead ends. Search through reasoning space instead of committing to one chain.
Models RL-trained to produce extended thinking traces natively. The model decides on its own when to think more, backtrack, verify, or commit to an answer.
Generate N candidate answers, score each with a learned reward model or verifier, return the highest-scoring one. Effective when verification is easier than generation.
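A minimal sketch of this selection loop follows. Both `generate` (one sampled candidate per call) and `score` (a reward model or verifier returning a number) are hypothetical callables; the toy verifier in the usage note simply checks arithmetic, which illustrates the case where checking is cheaper than solving.

```python
from typing import Callable


def best_of_n(
    generate: Callable[[], str],     # hypothetical: one sampled candidate per call
    score: Callable[[str], float],   # hypothetical: reward model or verifier
    n: int,
) -> tuple[str, float]:
    """Sample n candidates and return the highest-scoring one with its score."""
    candidates = [generate() for _ in range(n)]
    scored = [(score(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score
```

For example, if the task is "find x with x * x == 1369", a verifier such as `lambda c: 1.0 if int(c) ** 2 == 1369 else 0.0` can pick the correct candidate without knowing how to derive it.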
Score not just the final answer but each intermediate reasoning step. Allows beam-search-style exploration where the model commits to good steps and abandons bad ones.
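The beam-search-style exploration described above can be sketched as follows. The `propose` and `score_step` callables are hypothetical stand-ins for the model's next-step sampler and a step-level scorer, and summing step scores is an assumption made here for simplicity; other aggregations (such as taking the minimum step score) are equally plausible.

```python
from typing import Callable

Steps = tuple[str, ...]


def step_beam_search(
    propose: Callable[[Steps], list[str]],       # hypothetical: candidate next steps
    score_step: Callable[[Steps, str], float],   # hypothetical: score for one step
    beam_width: int,
    max_steps: int,
) -> Steps:
    """Keep the beam_width best partial chains, extending them one step at a time."""
    beams: list[tuple[float, Steps]] = [(0.0, ())]
    for _ in range(max_steps):
        expanded: list[tuple[float, Steps]] = []
        for total, steps in beams:
            options = propose(steps)
            if not options:
                # No further steps proposed: treat the chain as finished.
                expanded.append((total, steps))
                continue
            for step in options:
                expanded.append((total + score_step(steps, step), steps + (step,)))
        # Commit to the best-scoring chains and abandon the rest.
        beams = sorted(expanded, key=lambda b: b[0], reverse=True)[:beam_width]
    return beams[0][1]
```

With `beam_width=1` this reduces to greedy step selection; wider beams keep weaker-looking chains alive in case a later step redeems them.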
The frontier of production LLM systems — what to put in, what to trust, and how hard to think.