SLM-default, LLM-fallback pattern with Agent Framework and Azure AI Foundry

Fri, 05 Dec 2025 08:00:00 +0000

When building AI workflows, we often face a choice: do we use a massive, expensive cloud model for everything (to get the best reasoning capabilities), or do we cut costs with a smaller local model (and risk hallucinations)? In this post, we’ll explore a “best of both worlds” architecture, as described in the recent survey “Small Language Models for Agentic Systems” (Sharma & Mehta, 2025).

We call this the “SLM-default, LLM-fallback” pattern. The premise is simple: route every query first to a fast, private, on-device Small Language Model (SLM). Only if that model cannot confidently answer the query do we escalate the request to a paid cloud model (LLM).
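The routing decision itself can be sketched in a few lines. This is a minimal, framework-agnostic illustration, not the Agent Framework or Azure AI Foundry API. Everything in it is an assumption made for the example: the hypothetical `call_slm` and `call_llm` callables stand in for the local and cloud models, `ModelReply` is a made-up result type, and "confidence" is approximated as the geometric-mean token probability (exp of the mean token log-probability), compared against an arbitrary 0.8 threshold.

```python
import math
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ModelReply:
    """Hypothetical SLM result: answer text plus per-token log-probabilities."""
    text: str
    token_logprobs: list[float] = field(default_factory=list)


@dataclass
class RoutedAnswer:
    text: str
    source: str            # "slm" or "llm": which model produced the answer
    slm_confidence: float  # confidence score the SLM reply received


def confidence(reply: ModelReply) -> float:
    """Assumed confidence signal: geometric-mean token probability (0.0 if empty)."""
    if not reply.token_logprobs:
        return 0.0
    return math.exp(sum(reply.token_logprobs) / len(reply.token_logprobs))


def route(query: str,
          call_slm: Callable[[str], ModelReply],
          call_llm: Callable[[str], str],
          threshold: float = 0.8) -> RoutedAnswer:
    """SLM-default, LLM-fallback: always ask the SLM first, escalate if unsure."""
    slm_reply = call_slm(query)
    score = confidence(slm_reply)
    if score >= threshold:
        # The local model is confident enough: no cloud call, no cost.
        return RoutedAnswer(slm_reply.text, "slm", score)
    # Low confidence: pay for the cloud model.
    return RoutedAnswer(call_llm(query), "llm", score)


# Usage with stand-in models (both hypothetical).
def fake_slm(query: str) -> ModelReply:
    if "capital of France" in query:
        return ModelReply("Paris", [math.log(0.95)])
    return ModelReply("Not sure", [math.log(0.3), math.log(0.4)])


def fake_llm(query: str) -> str:
    return "Detailed cloud answer"


print(route("What is the capital of France?", fake_slm, fake_llm).source)  # slm
print(route("Explain quantum tunnelling", fake_slm, fake_llm).source)      # llm
```

Passing the models in as plain callables keeps the escalation policy separate from any particular SDK, so the same `route` logic could wrap a local runtime on one side and a hosted model on the other. Log-probabilities are only one possible confidence signal; a real setup might instead ask the SLM to self-assess or use a dedicated verifier.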

Mlx on Strathweb. A free flowing tech monologue.
