<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Mlx on Strathweb. A free flowing tech monologue.</title>
    <link>https://www.strathweb.com/categories/mlx/</link>
    <description>Recent content in Mlx on Strathweb. A free flowing tech monologue.</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 21 May 2026 07:06:14 +0000</lastBuildDate>
    <atom:link href="https://www.strathweb.com/categories/mlx/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Chain of Agents: Collaboration between local and remote language models with Agent Framework</title>
      <link>https://www.strathweb.com/2026/05/chain-of-agents-collaboration-between-local-and-remote-languague-models-with-agent-framework/</link>
      <pubDate>Thu, 21 May 2026 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2026/05/chain-of-agents-collaboration-between-local-and-remote-languague-models-with-agent-framework/</guid>
      <description>&lt;p&gt;This post continues a series on hybrid architectures that combine local Small Language Models (SLMs) with cloud Large Language Models (LLMs). We have already looked at the &lt;a href=&#34;https://www.strathweb.com/2025/10/llm-and-slm-collaboration-using-the-minions-pattern&#34;&gt;Minions pattern&lt;/a&gt;, which offloads bulk text extraction to a local model to reduce cloud API costs, and the &lt;a href=&#34;https://www.strathweb.com/2025/12/slm-default-llm-fallback-pattern-with-agent-framework-and-azure-ai-foundry&#34;&gt;SLM-default, LLM-fallback pattern&lt;/a&gt;, where a local model handles the majority of queries and only escalates to the cloud when confidence is low. Today we look at a third pattern: Chain of Agents (CoA), introduced in the paper &lt;a href=&#34;https://arxiv.org/abs/2406.02818&#34;&gt;&amp;ldquo;Chain of Agents: Large Language Models Collaborating on Long-Context Tasks&amp;rdquo;&lt;/a&gt; by Wang et al. (2024). Where the previous patterns were primarily motivated by cost, CoA is motivated by a different challenge - what happens when your document is simply too long to fit into any model&amp;rsquo;s context window?&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Introducing the MLX Integration Library for Agent Framework</title>
      <link>https://www.strathweb.com/2025/12/introducing-mlx-integration-library-for-agent-framework/</link>
      <pubDate>Thu, 11 Dec 2025 08:00:00 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/12/introducing-mlx-integration-library-for-agent-framework/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve recently been working on setting up a bunch of &lt;a href=&#34;https://github.com/microsoft/agent-framework&#34;&gt;Agent Framework&lt;/a&gt; samples that showcase the cooperation between cloud agents (backed by LLMs in the cloud) and local agents (running on your own machine). Since I primarily work on a Mac, the natural choice for me was to use &lt;a href=&#34;https://www.strathweb.com/categories/mlx/&#34;&gt;MLX&lt;/a&gt; as the local model runner, which required a fair amount of tedious bootstrapping. The obvious next step was to create a library that makes it easy to integrate MLX models into Agent Framework applications, since there wasn&amp;rsquo;t one available yet.&lt;/p&gt;
&lt;p&gt;Today, I&amp;rsquo;m excited to announce the release of the MLX Integration Library for Agent Framework! This library simplifies the process of integrating MLX models into your Agent Framework applications, allowing you to leverage local Mac AI capabilities seamlessly alongside cloud-based agents.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>SLM-default, LLM-fallback pattern with Agent Framework and Azure AI Foundry</title>
      <link>https://www.strathweb.com/2025/12/slm-default-llm-fallback-pattern-with-agent-framework-and-azure-ai-foundry/</link>
      <pubDate>Fri, 05 Dec 2025 08:00:00 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/12/slm-default-llm-fallback-pattern-with-agent-framework-and-azure-ai-foundry/</guid>
      <description>&lt;p&gt;When building AI workflows, we often face a choice: do we use a massive, expensive cloud model for everything (to ensure the best reasoning capabilities), or do we cut costs with a smaller local model (and risk hallucinations)? In this post, we&amp;rsquo;ll explore a &amp;ldquo;best of both worlds&amp;rdquo; architecture, as described in the recent survey &amp;ldquo;Small Language Models for Agentic Systems&amp;rdquo; (&lt;a href=&#34;https://arxiv.org/abs/2510.03847&#34;&gt;Sharma &amp;amp; Mehta, 2025&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;We call this the &amp;ldquo;SLM-default, LLM-fallback&amp;rdquo; pattern. The premise is simple: route all queries to a fast, private, on-device Small Language Model (SLM) first. Only if that model cannot confidently answer the query do we escalate the request to a paid cloud model (LLM).&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>LLM and SLM collaboration using the Minions pattern (with Phi-4-mini and Azure OpenAI)</title>
      <link>https://www.strathweb.com/2025/10/llm-and-slm-collaboration-using-the-minions-pattern/</link>
      <pubDate>Fri, 24 Oct 2025 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/10/llm-and-slm-collaboration-using-the-minions-pattern/</guid>
      <description>&lt;p&gt;In this post, we&amp;rsquo;ll explore a novel approach to optimizing AI workflows by strategically combining large language models (LLMs) with small language models (SLMs) using the &amp;ldquo;Minions pattern.&amp;rdquo; This technique, described in the research paper &lt;a href=&#34;https://arxiv.org/abs/2502.15964&#34;&gt;&amp;ldquo;Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models&amp;rdquo;&lt;/a&gt; by Narayan et al., addresses one of the most pressing challenges in AI application development - the cost of processing large amounts of data with expensive, cloud-based language models. If you&amp;rsquo;ve ever built an AI system that needs to analyze extensive documents or datasets, you&amp;rsquo;ve probably felt the frustration of watching your API costs skyrocket as you process more and more content.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Running Phi models on iOS with Apple MLX Framework</title>
      <link>https://www.strathweb.com/2025/03/running-phi-models-on-ios-with-apple-mlx-framework/</link>
      <pubDate>Mon, 10 Mar 2025 08:30:12 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/03/running-phi-models-on-ios-with-apple-mlx-framework/</guid>
      <description>&lt;p&gt;As I have mentioned in a few earlier posts, I have been working on the &lt;a href=&#34;https://www.strathweb.com/2024/07/announcing-strathweb-phi-engine-a-cross-platform-library-for-running-phi-3-anywhere/&#34;&gt;Strathweb Phi Engine&lt;/a&gt;, a cross-platform library for running Phi model inference via a simple, high-level API, from a number of high-level languages: C#, Swift, Kotlin and Python. This of course includes the capability of running Phi models on iOS devices, and the sample repo contains a &lt;a href=&#34;https://github.com/filipw/strathweb-phi-engine/tree/main/samples/ios/phi.engine.sample&#34;&gt;SwiftUI demo application&lt;/a&gt; that shows how to do this.&lt;/p&gt;
&lt;p&gt;Today I want to show an alternative way of running Phi models on iOS devices, using Apple&amp;rsquo;s &lt;a href=&#34;https://opensource.apple.com/projects/mlx/&#34;&gt;MLX framework&lt;/a&gt;. I previously &lt;a href=&#34;https://www.strathweb.com/2025/01/fine-tuning-phi-models-with-mlx&#34;&gt;blogged&lt;/a&gt; about fine-tuning Phi models with MLX, so that post is a good read if you want to learn more about the MLX framework and how to use it.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Fine tuning Phi models with MLX</title>
      <link>https://www.strathweb.com/2025/01/fine-tuning-phi-models-with-mlx/</link>
      <pubDate>Fri, 17 Jan 2025 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/01/fine-tuning-phi-models-with-mlx/</guid>
      <description>&lt;p&gt;Recently, I have dedicated quite a lot of space &lt;a href=&#34;https://www.strathweb.com/categories/phi/&#34;&gt;on this blog&lt;/a&gt; to the topic of running Phi locally with the &lt;a href=&#34;https://github.com/filipw/strathweb-phi-engine&#34;&gt;Strathweb Phi Engine&lt;/a&gt;. This time, I want to focus on a different aspect of adopting small language models like Phi: fine-tuning them. We are going to do this with Apple&amp;rsquo;s &lt;a href=&#34;https://opensource.apple.com/projects/mlx/&#34;&gt;MLX&lt;/a&gt; library, which offers excellent performance for ML-related tasks on Apple Silicon.&lt;/p&gt;
&lt;p&gt;We are going to do &lt;a href=&#34;https://huggingface.co/docs/peft/main/en/conceptual_guides/lora&#34;&gt;LoRA&lt;/a&gt; fine-tuning of a Phi model, and then invoke it using the Strathweb Phi Engine.&lt;/p&gt;</description>
    </item>
    
  </channel>
</rss>
