
Meta's new structured prompting technique makes LLMs significantly better at code review — boosting accuracy to 93% in some cases
The technique works out of the box, requiring no model training or special packaging. It is also code-execution free, meaning you don't need to add extra tools to your LLM environment.
Ben Dickson
IndexCache, a new sparse attention optimizer, delivers 1.82x faster inference on long-context AI models
The technique works by detecting when adjacent model layers repeat the same token selections, then caching the result instead of recalculating it.
Ben Dickson
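As described, IndexCache exploits the fact that adjacent layers often pick the same tokens. A minimal sketch of that reuse pattern in plain Python (the function names and the direct-comparison check are illustrative assumptions, not the optimizer's actual implementation):

```python
def topk_indices(scores, k):
    # Sorted positions of the k largest scores; stands in for the
    # expensive sparse-attention token-selection step.
    return sorted(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])

def select_with_index_cache(layer_scores, k):
    """Per-layer sparse token selection with reuse across adjacent layers.
    When a layer's selection matches the cached one from the layer below,
    the cached selection is reused and counted as a hit."""
    cache = None
    hits = 0
    selections = []
    for scores in layer_scores:  # one score vector per layer
        idx = topk_indices(scores, k)
        if idx == cache:
            hits += 1            # adjacent layer repeated the same tokens
            selections.append(cache)
        else:
            cache = idx          # new selection: update the cache
            selections.append(idx)
    return selections, hits
```

In the real optimizer the speedup comes from skipping the selection work entirely on a hit; this toy version only counts hits to show the repetition being detected.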
How xMemory cuts token costs and context bloat in AI agents
New research technique xMemory cuts token usage nearly in half for multi-session AI agents by replacing flat RAG with a four-level semantic hierarchy.
Ben Dickson
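The dek contrasts a flat RAG store with a four-level semantic hierarchy. A toy sketch of hierarchical retrieval under that description (the level contents and the substring-matching rule are illustrative assumptions; the article does not specify xMemory's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    summary: str                       # short abstract of everything below
    children: list = field(default_factory=list)
    raw: str = ""                      # leaf level: the full memory text

def retrieve(node, query, depth=0, max_depth=3):
    """Descend four levels of the hierarchy, expanding only branches whose
    summary matches the query, instead of stuffing every memory into context."""
    if depth == max_depth or not node.children:
        return [node.raw or node.summary]
    hits = []
    for child in node.children:
        if query.lower() in child.summary.lower():
            hits.extend(retrieve(child, query, depth + 1, max_depth))
    return hits or [node.summary]      # no match: fall back to the abstract
```

The token savings in this sketch come from returning one leaf (or a short summary) rather than the whole store, which is the behavior the teaser attributes to the hierarchy.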
Three ways AI is learning to understand the physical world
LLMs can't reason about physics. World models might — and three distinct architectural approaches are competing to fill that gap.
Ben Dickson
Nvidia says it can shrink LLM memory 20x without changing model weights
Nvidia's KVTC compresses LLM memory 20x without model changes, cutting latency 8x for coding assistants and agentic workloads.
Ben Dickson
Google finds that AI agents learn to cooperate when trained against unpredictable opponents
Google finds diverse opponent training beats hardcoded orchestration for getting AI agents to cooperate in enterprise deployments.
Ben Dickson
New KV cache compaction technique cuts LLM memory 50x without accuracy loss
Enterprise AI hits a memory ceiling with long documents and complex tasks. MIT's new Attention Matching compresses the KV cache by 50x without accuracy loss — in seconds, not hours.
Ben Dickson
Microsoft's new AI training method eliminates bloated system prompts without sacrificing model performance
Microsoft's new OPCD framework trains AI models to internalize long system prompts directly into their weights, cutting inference overhead without losing general capability.
Ben Dickson
Researchers baked 3x inference speedups directly into LLM weights — without speculative decoding
This training technique triples LLM inference speed without auxiliary models or infrastructure changes — using just a single special token added to the model's existing architecture.
Ben Dickson
New agent framework matches human-engineered AI systems — and adds zero inference cost to deploy
A new group-evolving agent framework from UC Santa Barbara matches human-engineered AI systems on SWE-bench — and adds zero inference cost to deploy. Here's how it works.
Ben Dickson
Nvidia’s new technique cuts LLM reasoning costs by 8x without losing accuracy
Nvidia researchers developed dynamic memory sparsification (DMS), a technique that compresses the KV cache in large language models by up to 8x while maintaining reasoning accuracy — and it can be retrofitted onto existing models in hours.
Ben Dickson
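Per the dek, DMS compresses by sparsifying the KV cache, i.e. evicting most entries. A toy sketch of that compression step (the fixed importance ranking is my assumption; the actual method learns its eviction policy during the retrofit):

```python
def sparsify_kv_cache(keys, values, importance, keep_ratio=0.125):
    """Keep only the highest-importance KV-cache entries.
    keep_ratio=0.125 keeps 1 in 8 entries, i.e. 8x compression.
    'importance' stands in for the learned eviction signal; this toy
    version simply ranks entries by a given score."""
    k = max(1, int(len(keys) * keep_ratio))
    keep = sorted(sorted(range(len(importance)),
                         key=lambda i: importance[i], reverse=True)[:k])
    return [keys[i] for i in keep], [values[i] for i in keep]
```

Keeping the surviving entries in their original order (the outer `sorted`) preserves token positions, which matters for attention over the compacted cache.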
MIT's new fine-tuning method lets LLMs learn new skills without losing old ones
MIT researchers unveil a new fine-tuning method that lets enterprises consolidate their "model zoos" into a single, continuously learning agent.
Ben Dickson