The Voxli blog

Field notes on testing AI agents

Latest Agent Reliability Jun 17, 2026

A better model can make your agent worse

A stronger model with higher scores looks like a free upgrade, until the agent that worked last week starts getting things wrong, quietly. Here is what happened when we ran one agent on two frontier models and changed nothing else.

Voxli

Failure Modes May 26, 2026

Upfront information dump

A customer opens your support agent with this:

Mahey Qadir

May 15, 2026

Mid-conversation tangent

A customer is halfway through a return flow with your agent. They've shared the order number, the item, and the reason for the return. They then pause to ask: "Wait, do you offer…

Voxli

Agent Reliability Apr 27, 2026

The multi-turn failures that prompt evals can't see

Most agent failures we see in pilots don't show up on prompt evals.

Voxli

Apr 21, 2026

The 10-minute test that stops your agent from canceling real orders

Last week a failed tool call caused GPT-5.4-mini to cancel a real order simply because a customer asked a question involving cancellation. Here's a quick test that catches it.

Voxli

Case Study Apr 16, 2026

Expertise.ai teams up with Voxli to solve the "absolute insanity" of their AI sales agent testing workflow

Expertise.ai is a known disruptor in the AI space, building AI sales agents that guide prospects through personalized flows. Here's how Voxli untangled their testing workflow.

Mahey Qadir

AI Agent Testing Apr 14, 2026

The Failed Tool Call When Simulating a Customer Conversation Across Three LLMs

Recently, to assess AI agent performance with tool calls, we ran the same multi-turn conversation across the three tiers of OpenAI's GPT-5.4: standard, mini, and nano.

Mahey Qadir

AI Agent Testing Apr 2, 2026

Testing for Speculation using Voxli

In our last post we covered the risks of agent speculation. Today we look at how to set up Voxli to catch that speculation using a feature called Hallucination detection.

Mahey Qadir

AI Agents Mar 27, 2026

The Risks of Agent Speculation

It's no surprise that hallucinations are a common, well-known failure mode in agentic AI testing. The agent starts to overpromise, fabricates answers, and even claims that it…

Voxli