<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Architecture on Strathweb. A free flowing tech monologue.</title>
    <link>https://www.strathweb.com/categories/architecture/</link>
    <description>Recent content in Architecture on Strathweb. A free flowing tech monologue.</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Fri, 05 Dec 2025 08:00:00 +0000</lastBuildDate><atom:link href="https://www.strathweb.com/categories/architecture/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>SLM-default, LLM-fallback pattern with Agent Framework and Azure AI Foundry</title>
      <link>https://www.strathweb.com/2025/12/slm-default-llm-fallback-pattern-with-agent-framework-and-azure-ai-foundry/</link>
      <pubDate>Fri, 05 Dec 2025 08:00:00 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/12/slm-default-llm-fallback-pattern-with-agent-framework-and-azure-ai-foundry/</guid>
      <description>&lt;p&gt;When building AI workflows, we often face a choice: do we use a massive, expensive cloud model for everything (to ensure the best reasoning capabilities), or do we cut costs with a smaller local model (and risk hallucinations)? In this post, we&amp;rsquo;ll explore a &amp;ldquo;best of both worlds&amp;rdquo; architecture, as described in the recent survey &amp;ldquo;Small Language Models for Agentic Systems&amp;rdquo; (&lt;a href=&#34;https://arxiv.org/abs/2510.03847&#34;&gt;Sharma &amp;amp; Mehta, 2025&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;We call this the &amp;ldquo;SLM-default, LLM-fallback&amp;rdquo; pattern. The premise is simple: route all queries to a fast, private, on-device Small Language Model (SLM) first. Only if that model cannot answer the query confidently do we escalate the request to a paid cloud Large Language Model (LLM).&lt;/p&gt;</description>
    </item>
    
  </channel>
</rss>
