VentureBeat | Transformative tech coverage that matters

Featured

DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5, and finds Claude Opus exploiting a benchmark loophole

For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same. OpenAI's GPT-5 family, Anthropic's Claude Opus, and Google's Gemini Pro have clustered within a narrow band on Scale AI's SWE-Bench Pro leaderboard, making it nearly impossible for engineering leaders to determine which agent will actually perform best inside their codebases.

Michael Nuñez

May 26, 2026

Why prompt debt, retrieval debt, and evaluation debt are quietly reshaping enterprise AI risk

For the past two decades, technical debt has meant outdated architecture, messy code, and poorly maintained documentation. That definition is no longer sufficient in the AI era, where failure modes are more subtle and often non-linear. AI systems are introducing new layers of technical debt that live across prompts, models, and data dependencies, where they are less visible, harder to measure, and often more dangerous than traditional debt.

Vikram Venkat

May 25, 2026

Capybara with glasses typing on laptop while piloting mecha — Credit: VentureBeat made with OpenAI ChatGPT-Images-2.0

Alibaba's proprietary Qwen3.7-Max can run for 35 hours autonomously and supports external harnesses like Anthropic's Claude Code

On the Apex Math Reasoning benchmark, Qwen3.7-Max scored 44.5, eclipsing Claude Opus-4.6 Max's score of 34.5 and DeepSeek V4-Pro Max's 38.3.

Carl Franzen

May 21, 2026

Vector art of a single neon yellow-green AI eye open — Credit: VentureBeat made with Midjourney

Resolve AI says the AI coding boom is breaking production systems. It wants to fix that.

The centerpiece of the release is a new multi-agent investigation system developed by Resolve AI's in-house research lab. Instead of deploying a single AI agent to diagnose a production failure — analogous to a lone engineer pulling an on-call shift — the platform now dispatches a coordinated team of specialized agents that pursue multiple hypotheses in parallel, independently verify each other's conclusions, and construct complete causal chains from root cause to symptom. The company says the architecture delivers more than a twofold improvement in root cause accuracy on its internal evaluation benchmarks compared to earlier versions of its platform.

Michael Nuñez

May 21, 2026

Vector art of robot agents in society — Credit: VentureBeat made with Midjourney

Kore.ai launches Artemis AI agent platform, takes on Salesforce and ServiceNow

The platform arrives at a moment when every major technology vendor — from Microsoft and Salesforce to Google and ServiceNow — is racing to become the default infrastructure for enterprise AI agents. Kore.ai's answer to that crowded field is a bet on neutrality, a proprietary intermediary language for defining agents, and a philosophy that AI, not human developers, should do most of the heavy lifting.

Michael Nuñez

May 21, 2026

Partner Content

AI didn’t kill brand consistency — it made it mission-critical

Presented by Design.com

VB Staff

May 21, 2026

POV viewing Cohere Command A+ report card in computer lab — Credit: VentureBeat made with OpenAI ChatGPT-Images-2.0

Cohere cracks lossless quantization and native citations with first full Apache 2.0 licensed open model Command A+

Using special tags embedded in the output, the model directly links every factual claim it makes to the specific source document or database row it pulled the information from.

Carl Franzen

May 20, 2026

Vector art of cobalt chip towering servers in burnt o — Credit: VentureBeat made with Midjourney

Cerebras says its chips run a trillion-parameter AI model nearly 7 times faster than GPU clouds

Less than a week after completing the largest tech IPO of 2026, Cerebras Systems is making its most aggressive play yet to dominate the fast-growing AI inference market. On Monday, the Sunnyvale-based chipmaker announced that it is now running Kimi K2.6 — a trillion-parameter open-weight model developed by Beijing-based Moonshot AI — for enterprise customers at nearly 1,000 tokens per second, a speed no GPU-based provider has come close to matching.

Michael Nuñez

May 20, 2026

Doctor treating patient with scribe app — Credit: VentureBeat made with OpenAI ChatGPT-Images-2.0

Corti's new Symphony for Speech-to-Text model beats OpenAI at medical terminology accuracy, highlighting the value of specialized AI

Today, Copenhagen-based healthcare AI company Corti is launching Symphony for Speech-to-Text, a new generation of clinical-grade speech recognition models engineered specifically for real-time dictation, conversational transcription, and batch audio processing. Their accuracy rate is the highest yet recorded for this specific use case.

Carl Franzen

May 20, 2026

Vector art of classic 1990s computer workstation CRT — Credit: VentureBeat made with Midjourney

Google says Gemini 3.5 Flash can slash enterprise AI costs by more than $1 billion a year

On Tuesday, at its annual I/O developer conference, Google unveiled Gemini 3.5 Flash, a new artificial intelligence model that the company says shatters what had become a seemingly iron law of the AI industry: that the smartest models must also be the slowest and most expensive to run.

Michael Nuñez

May 19, 2026

Vector art of blinking cursor morphing into multicolo — Credit: VentureBeat made with Midjourney

Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think

For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list of blue links. On Tuesday, Google will formally retire that paradigm.

Michael Nuñez

May 19, 2026

Vector art of blinking cursor morphing into multicolo — Credit: VentureBeat made with Midjourney

Google’s new AI agent can draft your emails, monitor your inbox and eventually spend your money

Google on Tuesday unveiled Gemini Spark, a personal AI agent designed to work around the clock — drafting emails, assembling documents, monitoring inboxes, and eventually making purchases — even when a user's laptop is closed and their phone is locked.

Michael Nuñez

May 19, 2026