<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Ai on Strathweb. A free flowing tech monologue.</title>
    <link>https://www.strathweb.com/categories/ai/</link>
    <description>Recent content in Ai on Strathweb. A free flowing tech monologue.</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 21 May 2026 07:06:14 +0000</lastBuildDate><atom:link href="https://www.strathweb.com/categories/ai/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Chain of Agents: Collaboration between local and remote language models with Agent Framework</title>
      <link>https://www.strathweb.com/2026/05/chain-of-agents-collaboration-between-local-and-remote-languague-models-with-agent-framework/</link>
      <pubDate>Thu, 21 May 2026 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2026/05/chain-of-agents-collaboration-between-local-and-remote-languague-models-with-agent-framework/</guid>
      <description>&lt;p&gt;This post continues a series on hybrid architectures that combine local Small Language Models (SLMs) with cloud Large Language Models (LLMs). We have already looked at the &lt;a href=&#34;https://www.strathweb.com/2025/10/llm-and-slm-collaboration-using-the-minions-pattern&#34;&gt;Minions pattern&lt;/a&gt;, which offloads bulk text extraction to a local model to reduce cloud API costs, and the &lt;a href=&#34;https://www.strathweb.com/2025/12/slm-default-llm-fallback-pattern-with-agent-framework-and-azure-ai-foundry&#34;&gt;SLM-default, LLM-fallback pattern&lt;/a&gt;, where a local model handles the majority of queries and only escalates to the cloud when confidence is low. Today we look at a third pattern: Chain of Agents (CoA), introduced in the paper &lt;a href=&#34;https://arxiv.org/abs/2406.02818&#34;&gt;&amp;ldquo;Chain of Agents: Large Language Models Collaborating on Long-Context Tasks&amp;rdquo;&lt;/a&gt; by Wang et al. (2024). Where the previous patterns were primarily motivated by cost, CoA is motivated by a different challenge - what happens when your document is simply too long to fit into any model&amp;rsquo;s context window?&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Introducing AgentGuard - declarative guardrails for .NET AI agents</title>
      <link>https://www.strathweb.com/2026/03/introducing-agentguard-declarative-guardrails-for-dotnet-ai-agents/</link>
      <pubDate>Tue, 24 Mar 2026 07:00:00 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2026/03/introducing-agentguard-declarative-guardrails-for-dotnet-ai-agents/</guid>
      <description>&lt;p&gt;As AI agents become more common in .NET applications, the question of how to keep them safe and well-behaved keeps coming up. Prompt injection, PII leakage, topic drift, tool call abuse - these are all problems that every team building with agents ends up having to deal with, often by hand-rolling ad-hoc checks. Python developers have had libraries like &lt;a href=&#34;https://github.com/NVIDIA/NeMo-Guardrails&#34;&gt;NeMo Guardrails&lt;/a&gt; and &lt;a href=&#34;https://github.com/guardrails-ai/guardrails&#34;&gt;Guardrails AI&lt;/a&gt; to help with this for a while now, but the .NET side has been largely left to fend for itself.&lt;/p&gt;
&lt;p&gt;Today I would like to introduce &lt;a href=&#34;https://filipw.github.io/AgentGuard&#34;&gt;AgentGuard&lt;/a&gt;, a library I have been working on to fill that gap - composable, declarative guardrails and safety controls for .NET AI agents.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Fine-tuning Phi-4 with Azure ML</title>
      <link>https://www.strathweb.com/2026/02/fine-tuning-phi-4-with-azure-ml/</link>
      <pubDate>Mon, 23 Feb 2026 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2026/02/fine-tuning-phi-4-with-azure-ml/</guid>
      <description>&lt;p&gt;Recently, I dedicated quite a lot of room &lt;a href=&#34;https://www.strathweb.com/categories/phi&#34;&gt;on this blog&lt;/a&gt; to the topic of running Phi locally. This time, I want to focus on a different aspect of adopting small language models like Phi - fine-tuning them. I already covered &lt;a href=&#34;https://www.strathweb.com/2025/01/fine-tuning-phi-models-with-mlx&#34;&gt;local fine-tuning in the past&lt;/a&gt;, so today we are going to do this with &lt;a href=&#34;https://learn.microsoft.com/en-us/azure/machine-learning/overview-what-is-azure-machine-learning?view=azureml-api-2&#34;&gt;Azure Machine Learning (Azure ML)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Azure ML is a comprehensive cloud service for accelerating and managing the machine learning project lifecycle. While local fine-tuning is great, moving to Azure ML makes a lot of sense when you need to scale, and/or when you want to experience the Nvidia GPUs without investing in hardware.&lt;/p&gt;
&lt;p&gt;We are going to do &lt;a href=&#34;https://arxiv.org/abs/2106.09685&#34;&gt;LoRA&lt;/a&gt; fine-tuning of a Phi-4 model, and then deploy it to a managed batch endpoint for inference.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Introducing the MLX Integration Library for Agent Framework</title>
      <link>https://www.strathweb.com/2025/12/introducing-mlx-integration-library-for-agent-framework/</link>
      <pubDate>Thu, 11 Dec 2025 08:00:00 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/12/introducing-mlx-integration-library-for-agent-framework/</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve recently been working on setting up a bunch of &lt;a href=&#34;https://github.com/microsoft/agent-framework&#34;&gt;Agent Framework&lt;/a&gt; samples, which would showcase the cooperation between cloud agents (backed by LLMs in the cloud) and local agents (running on your own machine). Since I primarily work on a Mac, the natural choice for me was to use &lt;a href=&#34;https://www.strathweb.com/categories/mlx/&#34;&gt;MLX&lt;/a&gt; as the local model runner, which required a bit of bootstrapping - and felt quite tedious. So, the natural next step was to create a library that would make it easy to integrate MLX models into Agent Framework applications, since there wasn&amp;rsquo;t one available yet.&lt;/p&gt;
&lt;p&gt;Today, I&amp;rsquo;m excited to announce the release of the MLX Integration Library for Agent Framework! This library simplifies the process of integrating MLX models into your Agent Framework applications, allowing you to leverage local Mac AI capabilities seamlessly alongside cloud-based agents.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>SLM-default, LLM-fallback pattern with Agent Framework and Azure AI Foundry</title>
      <link>https://www.strathweb.com/2025/12/slm-default-llm-fallback-pattern-with-agent-framework-and-azure-ai-foundry/</link>
      <pubDate>Fri, 05 Dec 2025 08:00:00 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/12/slm-default-llm-fallback-pattern-with-agent-framework-and-azure-ai-foundry/</guid>
      <description>&lt;p&gt;When building AI workflows, we often face a choice: do we use a massive, expensive cloud model for everything (to ensure best reasoning capabilities), or do we cut costs with a smaller local model (and risk hallucinations)? In this post, we&amp;rsquo;ll explore a &amp;ldquo;best of both worlds&amp;rdquo; architecture, as described in the recent survey &amp;ldquo;Small Language Models for Agentic Systems&amp;rdquo; &lt;a href=&#34;https://arxiv.org/abs/2510.03847&#34;&gt;Sharma &amp;amp; Mehta, 2025&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We call this the &amp;ldquo;SLM-default, LLM-fallback&amp;rdquo; pattern. The premise is simple: route all queries to a fast, private, on-device Small Language Model (SLM) first. Only if that model cannot confidently answer the query, do we escalate the request to a paid cloud model (LLM).&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>LLM and SLM collaboration using the Minions pattern (with Phi-4-mini and Azure OpenAI)</title>
      <link>https://www.strathweb.com/2025/10/llm-and-slm-collaboration-using-the-minions-pattern/</link>
      <pubDate>Fri, 24 Oct 2025 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/10/llm-and-slm-collaboration-using-the-minions-pattern/</guid>
      <description>&lt;p&gt;In this post, we&amp;rsquo;ll explore a novel approach to optimizing AI workflows by strategically combining large language models (LLMs) with small language models (SLMs) using the &amp;ldquo;Minions pattern.&amp;rdquo; This technique, described in the research paper &lt;a href=&#34;https://arxiv.org/abs/2502.15964&#34;&gt;&amp;ldquo;Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models&amp;rdquo;&lt;/a&gt; by Narayan et al., addresses one of the most pressing challenges in AI application development - the cost of processing large amounts of data with expensive, cloud-based language models. If you&amp;rsquo;ve ever built an AI system that needs to analyze extensive documents or datasets, you&amp;rsquo;ve probably felt the frustration of watching your API costs skyrocket as you process more and more content.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>RAG Agent with HyPE Pattern using Semantic Kernel</title>
      <link>https://www.strathweb.com/2025/07/rag-agent-with-hype-pattern-using-semantic-kernel/</link>
      <pubDate>Mon, 14 Jul 2025 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/07/rag-agent-with-hype-pattern-using-semantic-kernel/</guid>
      <description>&lt;p&gt;In this post we will explore a novel approach to Retrieval-Augmented Generation (RAG) called &lt;a href=&#34;https://ssrn.com/abstract=5139335&#34;&gt;HyPE (Hypothetical Prompt Embeddings)&lt;/a&gt;, which I came across in a preprint paper recently. This technique tries to address one of the fundamental challenges in RAG systems: the semantic mismatch between user queries and document content. If you&amp;rsquo;ve ever built a RAG system, you&amp;rsquo;ve probably felt the frustration when your carefully crafted vector search returns seemingly irrelevant results. At least for me, it was always tremendously annoying when a simple question like &amp;ldquo;What is quantum entanglement?&amp;rdquo; wouldn&amp;rsquo;t reliably match a document section that clearly explains quantum entanglement.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>AI Agents with OpenAPI Tools - Part 2: Azure AI Foundry</title>
      <link>https://www.strathweb.com/2025/06/ai-agents-with-openapi-tools-part-2-azure-ai-foundry/</link>
      <pubDate>Fri, 27 Jun 2025 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/06/ai-agents-with-openapi-tools-part-2-azure-ai-foundry/</guid>
      <description>&lt;p&gt;In the &lt;a href=&#34;https://www.strathweb.com/2025/06/ai-agents-with-openapi-tools-part-1-semantic-kernel&#34;&gt;previous part&lt;/a&gt; of this series, we explored how to attach OpenAPI-based tools to a Semantic Kernel AI agent. In this part, we will look at another SDK for building AI Agents, Azure AI Foundry SDK, to create an agent that can also interact with OpenAPI-based tools.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>AI Agents with OpenAPI Tools - Part 1: Semantic Kernel</title>
      <link>https://www.strathweb.com/2025/06/ai-agents-with-openapi-tools-part-1-semantic-kernel/</link>
      <pubDate>Mon, 23 Jun 2025 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/06/ai-agents-with-openapi-tools-part-1-semantic-kernel/</guid>
      <description>&lt;p&gt;Today we will kick off a short series on building AI agents which have access to OpenAPI tools. In this first part, we will focus on the Semantic Kernel, and in the second part, we will look at Azure AI Foundry.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Exploring Microsoft Foundry Local</title>
      <link>https://www.strathweb.com/2025/05/exploring-microsoft-foundry-local/</link>
      <pubDate>Wed, 28 May 2025 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/05/exploring-microsoft-foundry-local/</guid>
      <description>&lt;p&gt;Last week at Build, Microsoft released a public preview of &lt;a href=&#34;https://github.com/microsoft/Foundry-Local&#34;&gt;Foundry Local&lt;/a&gt; - a new tool for running AI models locally on your machine, with a focus on privacy and security. It is a spiritual sibling to &lt;a href=&#34;https://azure.microsoft.com/en-us/products/ai-foundry&#34;&gt;Azure AI Foundry&lt;/a&gt;, which is a managed Azure cloud service for building and running AI applications.&lt;/p&gt;
&lt;p&gt;However, using Foundry Local is independent of Azure, and it can run models locally without any cloud dependencies. It is currently available for Windows x64, Windows ARM and macOS ARM.&lt;/p&gt;
&lt;p&gt;I was involved in the private testing of Foundry Local already prior to last week&amp;rsquo;s announcement, and so I have been using it for a while now. In this post, I will share my experience with it.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Using Phi Silica in Windows App SDK on a Copilot Plus PC</title>
      <link>https://www.strathweb.com/2025/04/using-phi-silica-in-windows-app-sdk-on-copilot-plus-pc/</link>
      <pubDate>Fri, 25 Apr 2025 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/04/using-phi-silica-in-windows-app-sdk-on-copilot-plus-pc/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://blogs.windows.com/windowsexperience/2024/12/06/phi-silica-small-but-mighty-on-device-slm/&#34;&gt;Last year&lt;/a&gt;, Microsoft announced the Copilot Plus PC, a new class of devices that are designed to run AI workloads locally. The flagship device of the line is of course the &lt;a href=&#34;https://www.microsoft.com/en-us/surface/devices/surface-pro-11th-edition&#34;&gt;Surface Pro 11&lt;/a&gt;, which is powered by the Qualcomm Snapdragon X Elite ARM processor. Unfortunately, since the launch, the AI capabilities have been more than underwhelming, as few applications and workloads are able to take advantage of the integrated NPU hardware.&lt;/p&gt;
&lt;p&gt;One of the milestones in this direction is the &lt;a href=&#34;https://www.microsoft.com/en/windows/business/devices/copilot-plus-pcs&#34;&gt;Phi Silica&lt;/a&gt; model, which is a small but powerful ONNX-Runtime-based on-device SLM (Small Language Model) that is designed to run on the Copilot Plus PC &lt;a href=&#34;https://learn.microsoft.com/en-us/windows/ai/npu-devices/&#34;&gt;NPU&lt;/a&gt;, and that is built into the Windows Copilot Runtime. This removes a lot of the friction that developers have when trying to run models on-device, as they can now simply use the Windows App SDK to access the NPU and invoke the model just like any other system API.&lt;/p&gt;
&lt;p&gt;Today we will have a look at how to use the Phi Silica model in a Windows App SDK application.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Using o-series Reasoning Models in PromptFlow</title>
      <link>https://www.strathweb.com/2025/03/using-o-series-reasoning-models-in-promptflow/</link>
      <pubDate>Mon, 24 Mar 2025 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/03/using-o-series-reasoning-models-in-promptflow/</guid>
      <description>&lt;p&gt;If you have tried to use the OpenAI o-series reasoning models, such as &lt;a href=&#34;https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/reasoning?tabs=python-secure&#34;&gt;o1 or o3&lt;/a&gt;, with &lt;a href=&#34;https://microsoft.github.io/promptflow/&#34;&gt;PromptFlow&lt;/a&gt; recently, you certainly ran into a nasty surprise. While PromptFlow supports a wide range of models and providers, the o-series models are not among them. This is of course quite a shame, especially if you&amp;rsquo;d like to benchmark or evaluate your flows against those models.&lt;/p&gt;
&lt;p&gt;In this short post, we will look at a workaround.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Running Phi models on iOS with Apple MLX Framework</title>
      <link>https://www.strathweb.com/2025/03/running-phi-models-on-ios-with-apple-mlx-framework/</link>
      <pubDate>Mon, 10 Mar 2025 08:30:12 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/03/running-phi-models-on-ios-with-apple-mlx-framework/</guid>
      <description>&lt;p&gt;As I previously blogged a few times, I have been working on the &lt;a href=&#34;https://www.strathweb.com/2024/07/announcing-strathweb-phi-engine-a-cross-platform-library-for-running-phi-3-anywhere/&#34;&gt;Strathweb Phi Engine&lt;/a&gt;, a cross-platform library for running Phi model inference via a simple, high-level API, from a number of high-level languages: C#, Swift, Kotlin and Python. This of course includes the capability of running Phi models on iOS devices, and the sample repo contains a &lt;a href=&#34;https://github.com/filipw/strathweb-phi-engine/tree/main/samples/ios/phi.engine.sample&#34;&gt;demo SwiftUI application&lt;/a&gt; that demonstrates how to do this.&lt;/p&gt;
&lt;p&gt;Today I wanted to show an alternative way of running Phi models on iOS devices, using Apple&amp;rsquo;s &lt;a href=&#34;https://opensource.apple.com/projects/mlx/&#34;&gt;MLX framework&lt;/a&gt;. I previously &lt;a href=&#34;https://www.strathweb.com/2025/01/fine-tuning-phi-models-with-mlx&#34;&gt;blogged&lt;/a&gt; about fine-tuning Phi models on iOS using MLX, so that post is a good read if you want to learn more about the MLX framework and how to use it.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Strathweb Phi Engine - now with Phi-4 support</title>
      <link>https://www.strathweb.com/2025/02/strathweb-phi-engine-now-with-phi-4-support/</link>
      <pubDate>Mon, 24 Feb 2025 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/02/strathweb-phi-engine-now-with-phi-4-support/</guid>
      <description>&lt;p&gt;Last summer, I launched &lt;a href=&#34;https://www.strathweb.com/2024/07/announcing-strathweb-phi-engine-a-cross-platform-library-for-running-phi-3-anywhere&#34;&gt;Strathweb Phi Engine&lt;/a&gt; — a cross-platform library for running Phi model inference via a simple, high-level API, from a number of high-level languages: C#, Swift, Kotlin and Python.&lt;/p&gt;
&lt;p&gt;Today I am happy to announce support for Phi-4, the latest model in the Phi family, which Microsoft AI &lt;a href=&#34;https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090&#34;&gt;released&lt;/a&gt; in December 2024.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Fine tuning Phi models with MLX</title>
      <link>https://www.strathweb.com/2025/01/fine-tuning-phi-models-with-mlx/</link>
      <pubDate>Fri, 17 Jan 2025 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2025/01/fine-tuning-phi-models-with-mlx/</guid>
      <description>&lt;p&gt;Recently, I dedicated quite a lot of room &lt;a href=&#34;https://www.strathweb.com/categories/phi/&#34;&gt;on this blog&lt;/a&gt; to the topic of running Phi locally with the &lt;a href=&#34;https://github.com/filipw/strathweb-phi-engine&#34;&gt;Strathweb Phi Engine&lt;/a&gt;. This time, I want to focus on a different aspect of adopting small language models like Phi - fine-tuning them. We are going to do this with Apple&amp;rsquo;s &lt;a href=&#34;https://opensource.apple.com/projects/mlx/&#34;&gt;MLX&lt;/a&gt; library, which offers excellent performance for ML-related tasks on Apple Silicon.&lt;/p&gt;
&lt;p&gt;We are going to do &lt;a href=&#34;https://huggingface.co/docs/peft/main/en/conceptual_guides/lora&#34;&gt;LoRA&lt;/a&gt; fine tuning of a Phi model, and then invoke it using Strathweb Phi Engine.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Running Phi Inference in .NET Applications with Strathweb Phi Engine</title>
      <link>https://www.strathweb.com/2024/12/running-phi-inference-in-net-applications-with-strathweb-phi-engine/</link>
      <pubDate>Fri, 20 Dec 2024 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2024/12/running-phi-inference-in-net-applications-with-strathweb-phi-engine/</guid>
      <description>&lt;p&gt;Local AI inference has become increasingly important for developers seeking to build robust, privacy-preserving applications. In this deep dive, I&amp;rsquo;ll show you how to leverage the &lt;a href=&#34;https://www.strathweb.com/2024/07/announcing-strathweb-phi-engine-a-cross-platform-library-for-running-phi-3-anywhere&#34;&gt;Strathweb Phi Engine&lt;/a&gt; multi-platform library to run Microsoft&amp;rsquo;s Phi-family models directly in your .NET applications, exploring both basic integration patterns and advanced features that make Phi inference more accessible than ever.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Simplifying the AI workflow: Access different types of model deployments with Azure AI Inference</title>
      <link>https://www.strathweb.com/2024/11/simplifying-the-ai-workflow-access-different-types-of-model-deployments-with-azure-ai-inference/</link>
      <pubDate>Fri, 22 Nov 2024 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2024/11/simplifying-the-ai-workflow-access-different-types-of-model-deployments-with-azure-ai-inference/</guid>
      <description>&lt;p&gt;In this post, we will explore the flexibility behind Azure AI Inference, a new &lt;a href=&#34;https://learn.microsoft.com/en-us/python/api/overview/azure/ai-inference-readme?view=azure-python-preview&#34;&gt;library&lt;/a&gt; from Azure, which allows us to run inference against a wide range of AI model deployments - both in Azure and, as we will see in this notebook, in other places as well.&lt;/p&gt;
&lt;p&gt;It is available for Python and for .NET - in this post, we will focus on the Python version.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Strathweb Phi Engine - now with Safe Tensors support</title>
      <link>https://www.strathweb.com/2024/11/strathweb-phi-engine-now-with-safe-tensors-support/</link>
      <pubDate>Fri, 15 Nov 2024 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2024/11/strathweb-phi-engine-now-with-safe-tensors-support/</guid>
      <description>&lt;p&gt;This summer, I announced the &lt;a href=&#34;https://www.strathweb.com/2024/07/announcing-strathweb-phi-engine-a-cross-platform-library-for-running-phi-3-anywhere&#34;&gt;Strathweb Phi Engine&lt;/a&gt; — a cross-platform library for running Phi inference anywhere. Up until now, the library only supported models in the quantized GGUF format. Today, I&amp;rsquo;m excited to share that the library now also supports the Safe Tensor model format.&lt;/p&gt;
&lt;p&gt;This enhancement significantly expands the scope of use cases and interoperability for the Strathweb Phi Engine. With Safe Tensor support, you can now load and execute models in a format that is not only performant but also prioritizes security and memory safety. Notably, all the Phi models published by Microsoft use the Safe Tensor format by default.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>How GPT-4o-mini can be simultaneously 20x cheaper and 2x more expensive than GPT-4o</title>
      <link>https://www.strathweb.com/2024/10/how-gpt-4o-mini-can-be-simultaneously-20x-cheaper-and-2x-more-expensive-than-gpt-4o/</link>
      <pubDate>Fri, 25 Oct 2024 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2024/10/how-gpt-4o-mini-can-be-simultaneously-20x-cheaper-and-2x-more-expensive-than-gpt-4o/</guid>
      <description>&lt;p&gt;GPT-4o-mini is the small, cost-effective version of the GPT-4o model. It is a great default choice for developers who want a very capable and fast model, but don&amp;rsquo;t need the full power of the GPT-4o model. However, there are some important things to keep in mind when using GPT-4o-mini, especially when it comes to pricing - some of which is rather contradictory!&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Speech-based retrieval augmented generation (RAG) with GPT-4o Realtime API</title>
      <link>https://www.strathweb.com/2024/10/speech-based-retrieval-augmented-generation-with-gpt-realtime-api/</link>
      <pubDate>Mon, 14 Oct 2024 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2024/10/speech-based-retrieval-augmented-generation-with-gpt-realtime-api/</guid>
      <description>&lt;p&gt;On October 1st, &lt;a href=&#34;https://learn.microsoft.com/en-us/azure/ai-services/openai/realtime-audio-quickstart?pivots=programming-language-javascript&#34;&gt;OpenAI&lt;/a&gt; and &lt;a href=&#34;https://openai.com/index/introducing-the-realtime-api/&#34;&gt;Microsoft&lt;/a&gt; (Azure OpenAI) announced the availability of the GPT-4o Realtime API for speech and audio. It is a new, innovative way of interacting with the GPT-4o model family that provides a &amp;ldquo;speech in, speech out&amp;rdquo; conversational interface. Unlike traditional text-based APIs, the Realtime API allows sending the audio input directly to the model, and receiving the audio output back. This is a significant improvement over the existing approaches to voice-enabled assistants, which required converting the audio to text first, and then converting the text back to audio. The Realtime API is currently in preview, and the SDKs for various languages have mixed levels of support for it, but it is already possible to build exciting new applications with it.&lt;/p&gt;
&lt;p&gt;The low-latency speech-based interface also poses some challenges to established AI architectural patterns, such as &lt;a href=&#34;https://www.strathweb.com/2023/11/using-your-own-data-with-gpt-models-in-azure-openai-part-1/&#34;&gt;Retrieval-Augmented Generation (RAG)&lt;/a&gt; - and today we will tackle just that, and have a look at a small sample realtime-voice RAG app in .NET.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Using Local Phi-3 Models in AutoGen with Strathweb Phi Engine</title>
      <link>https://www.strathweb.com/2024/09/using-local-phi-3-models-in-autogen-with-strathweb-phi-engine/</link>
      <pubDate>Fri, 06 Sep 2024 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2024/09/using-local-phi-3-models-in-autogen-with-strathweb-phi-engine/</guid>
      <description>&lt;p&gt;I recently announced &lt;a href=&#34;https://www.strathweb.com/2024/07/announcing-strathweb-phi-engine-a-cross-platform-library-for-running-phi-3-anywhere&#34;&gt;Strathweb Phi Engine&lt;/a&gt;, a cross-platform library/toolset for conveniently running Phi-3 (almost) anywhere. Today I would like to show how to integrate a local Phi-3 model, orchestrated by Strathweb Phi Engine, into an agentic workflow built with &lt;a href=&#34;https://github.com/microsoft/autogen&#34;&gt;AutoGen&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Announcing Strathweb Phi Engine - a cross-platform library for running Phi-3 anywhere</title>
      <link>https://www.strathweb.com/2024/07/announcing-strathweb-phi-engine-a-cross-platform-library-for-running-phi-3-anywhere/</link>
      <pubDate>Thu, 25 Jul 2024 04:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2024/07/announcing-strathweb-phi-engine-a-cross-platform-library-for-running-phi-3-anywhere/</guid>
      <description>&lt;p&gt;I &lt;a href=&#34;https://www.strathweb.com/2024/05/running-microsoft-phi-3-model-in-an-ios-app-with-rust&#34;&gt;recently&lt;/a&gt; wrote a blog post about using Rust to run the Phi-3 model on iOS. The post received an overwhelmingly positive response, and I got a lot of questions about running Phi-3 using a similar approach on other platforms, such as Android, Windows, macOS or Linux. Today, I&amp;rsquo;m excited to announce the project I have been working on recently - Strathweb Phi Engine, a cross-platform library for running Phi-3 (almost) anywhere.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Running Microsoft&#39;s Phi-3 Model in an iOS app with Rust</title>
      <link>https://www.strathweb.com/2024/05/running-microsoft-phi-3-model-in-an-ios-app-with-rust/</link>
      <pubDate>Thu, 09 May 2024 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2024/05/running-microsoft-phi-3-model-in-an-ios-app-with-rust/</guid>
      <description>&lt;p&gt;Last month, &lt;a href=&#34;https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/&#34;&gt;Microsoft released&lt;/a&gt; the exciting new minimal AI model, Phi-3 mini. It&amp;rsquo;s a 3.8B model that can outperform many other larger models, while still being small enough to run on a phone. In this post, we&amp;rsquo;ll explore how to run the Phi-3 model inside a SwiftUI iOS application using the minimalist ML framework for Rust, called &lt;a href=&#34;https://github.com/huggingface/candle&#34;&gt;candle&lt;/a&gt;, and built by the nice folks at HuggingFace.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Tool Calling with Azure OpenAI - Part 2: Using the tools directly via the SDK</title>
      <link>https://www.strathweb.com/2024/04/function-calling-with-azure-openai-part-2/</link>
      <pubDate>Fri, 19 Apr 2024 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2024/04/function-calling-with-azure-openai-part-2/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://www.strathweb.com/2024/04/function-calling-with-azure-openai-part-1&#34;&gt;Last time around&lt;/a&gt;, we discussed how Large Language Models can select the appropriate tool and its required parameters out of freely flowing conversation text. We also introduced the formal concept of those tools, which are structurally described using an &lt;a href=&#34;https://swagger.io/docs/specification/data-models/&#34;&gt;OpenAPI schema&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In this part 2 of the series, we are going to build two different .NET command line assistant applications, both taking advantage of the tool calling integration. We will orchestrate everything by hand - that is, we will only use the Azure OpenAI Service API directly (or rather using the .NET SDK for Azure OpenAI) - without any additional AI frameworks.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Tool Calling with Azure OpenAI - Part 1: The Basics</title>
      <link>https://www.strathweb.com/2024/04/function-calling-with-azure-openai-part-1/</link>
      <pubDate>Thu, 04 Apr 2024 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2024/04/function-calling-with-azure-openai-part-1/</guid>
      <description>&lt;p&gt;One of the fantastic capabilities of Large Language Models is their ability to choose (based on a predefined set of tool definitions) the appropriate tool and its required parameters out of freely flowing conversation text. With that, they can act as facilitators of workflow orchestration, where they instruct applications to invoke specific tools with a specific set of arguments.&lt;/p&gt;
&lt;p&gt;OpenAI announced the built-in capability called &lt;a href=&#34;https://openai.com/blog/function-calling-and-other-api-updates&#34;&gt;function calling&lt;/a&gt; in the summer of last year, and by now it is an integral part of working with and building applications on top of the GPT models. The functionality was later renamed in the &lt;a href=&#34;https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#example-response-2&#34;&gt;API&lt;/a&gt; to &amp;ldquo;tools&amp;rdquo;, to better express their broad scope and nature.&lt;/p&gt;
&lt;p&gt;Today I am starting a new multi-post Azure OpenAI blog series focusing specifically on the tool capabilities. We will build a client application with .NET, and explore tool integration from different angles - using the &lt;a href=&#34;https://www.nuget.org/packages/Azure.AI.OpenAI/&#34;&gt;Azure OpenAI .NET SDK&lt;/a&gt; directly, using the &lt;a href=&#34;https://www.nuget.org/packages/Azure.AI.OpenAI.Assistants/&#34;&gt;Assistants SDK&lt;/a&gt; and finally leveraging various orchestration frameworks such as &lt;a href=&#34;https://github.com/microsoft/semantic-kernel&#34;&gt;Semantic Kernel&lt;/a&gt; and &lt;a href=&#34;https://github.com/microsoft/autogen&#34;&gt;AutoGen&lt;/a&gt;. In today&amp;rsquo;s part one, we are going to introduce the basic concepts behind tool calling.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Combining Azure OpenAI with Azure AI Speech</title>
      <link>https://www.strathweb.com/2024/03/combining-azure-openai-with-azure-ai-speech/</link>
      <pubDate>Fri, 08 Mar 2024 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2024/03/combining-azure-openai-with-azure-ai-speech/</guid>
      <description>&lt;p&gt;In my &lt;a href=&#34;https://www.strathweb.com/categories/openai/&#34;&gt;recent posts&lt;/a&gt;, I&amp;rsquo;ve been exploring various facets of the &lt;a href=&#34;https://azure.microsoft.com/en-us/products/ai-services/openai-service&#34;&gt;Azure OpenAI Service&lt;/a&gt;, discussing how it can power up our applications with AI. Today, I&amp;rsquo;m taking a slightly different angle - I want to dive into how we can enhance our projects further by integrating Azure OpenAI Service with &lt;a href=&#34;https://azure.microsoft.com/en-us/products/ai-services/ai-speech&#34;&gt;Azure AI Speech&lt;/a&gt;. Let&amp;rsquo;s explore what this integration means and how it could lead to exciting, AI-powered applications.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Using your own data with GPT models in Azure OpenAI - Part 4: Adding vector search</title>
      <link>https://www.strathweb.com/2024/02/using-your-own-data-with-gpt-models-in-azure-openai-part-4/</link>
      <pubDate>Fri, 23 Feb 2024 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2024/02/using-your-own-data-with-gpt-models-in-azure-openai-part-4/</guid>
      <description>&lt;p&gt;For our Retrieval-Augmented-Generation (RAG) application, we set up AI Search in &lt;a href=&#34;https://www.strathweb.com/2023/11/using-your-own-data-with-gpt-models-in-azure-openai-part-1&#34;&gt;part 1&lt;/a&gt;; so far, however, we have only used it with basic keyword search.&lt;/p&gt;
&lt;p&gt;In this part 4 of the series about bringing your own data to Azure OpenAI Service, we will go ahead and integrate vector search, as a more sophisticated way of performing the search across the Azure AI Search index within our RAG-pattern system.&lt;/p&gt;
&lt;p&gt;I already covered &lt;a href=&#34;https://www.strathweb.com/2023/09/using-embeddings-model-with-azure-openai&#34;&gt;vectorization and embeddings&lt;/a&gt; using the OpenAI embedding model on this blog, and we will be relying on the same principles here. I recommend reading through that article before continuing if you are not yet familiar with the concept of embeddings.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Using your own data with GPT models in Azure OpenAI - Part 3: Calling Azure OpenAI Service via .NET SDK</title>
      <link>https://www.strathweb.com/2023/12/using-your-own-data-with-gpt-models-in-azure-openai-part-3/</link>
      <pubDate>Mon, 18 Dec 2023 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2023/12/using-your-own-data-with-gpt-models-in-azure-openai-part-3/</guid>
      <description>&lt;p&gt;In the &lt;a href=&#34;https://www.strathweb.com/2023/11/using-your-own-data-with-gpt-models-in-azure-openai-part-2&#34;&gt;last post&lt;/a&gt; of this series we set up a demo .NET client application that was able to call and utilize a GPT model hosted in Azure OpenAI Service, which in turn was integrated with our own custom data via Azure AI Search. We did this using the bare bones REST API - and in part three, it&amp;rsquo;s time to shift gears and explore how to accomplish a similar task using the .NET SDK, which offers a more streamlined and less ceremonious approach than calling the HTTP endpoints directly.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Using your own data with GPT models in Azure OpenAI - Part 2: Calling Azure OpenAI Service via REST API</title>
      <link>https://www.strathweb.com/2023/11/using-your-own-data-with-gpt-models-in-azure-openai-part-2/</link>
      <pubDate>Fri, 24 Nov 2023 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2023/11/using-your-own-data-with-gpt-models-in-azure-openai-part-2/</guid>
      <description>&lt;p&gt;In the &lt;a href=&#34;https://www.strathweb.com/2023/11/using-your-own-data-with-gpt-models-in-azure-openai-part-1&#34;&gt;previous part&lt;/a&gt; of this series, we successfully set up &lt;a href=&#34;https://azure.microsoft.com/en-us/products/ai-services/cognitive-search&#34;&gt;Azure AI Search&lt;/a&gt;, to have it ready for integration with &lt;a href=&#34;https://azure.microsoft.com/en-us/products/ai-services/openai-service&#34;&gt;Azure OpenAI Service&lt;/a&gt;. The ultimate goal is to take advantage of the retrieval-augmented-generation pattern and enhance our interactions with the GPT model with our own custom data.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s continue building this today.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Using your own data with GPT models in Azure OpenAI - Part 1: Setting up Azure AI Search</title>
      <link>https://www.strathweb.com/2023/11/using-your-own-data-with-gpt-models-in-azure-openai-part-1/</link>
      <pubDate>Fri, 10 Nov 2023 07:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2023/11/using-your-own-data-with-gpt-models-in-azure-openai-part-1/</guid>
      <description>&lt;p&gt;There is no question that the emergence of generative AI is going to significantly alter various aspects of our daily lives. At the same time, most of the large language models (LLMs) are designed as general-purpose black boxes and their utility is initially confined to the data they were trained on. However, it is possible to extend their functionality and reasoning to any custom data set, be it private or public, even without the massive effort that would be needed to retrain or even fine-tune them.&lt;/p&gt;
&lt;p&gt;We are going to start exploring that concept today with a multi-part post series on &amp;ldquo;bringing your own data&amp;rdquo; to Azure OpenAI. In part one today, we will set up the necessary Azure resources and prepare the stage for a client application integration, which will follow in parts two and further.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Using embeddings model with Azure OpenAI Service</title>
      <link>https://www.strathweb.com/2023/09/using-embeddings-model-with-azure-openai/</link>
      <pubDate>Wed, 13 Sep 2023 07:00:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2023/09/using-embeddings-model-with-azure-openai/</guid>
      <description>&lt;p&gt;I recently &lt;a href=&#34;https://www.strathweb.com/2023/04/building-gpt-powered-applications-with-azure-openai-service/&#34;&gt;blogged&lt;/a&gt; about building GPT-powered applications with &lt;a href=&#34;https://azure.microsoft.com/en-us/products/ai-services/openai-service&#34;&gt;Azure OpenAI Service&lt;/a&gt;. In that post, we looked at using the &lt;em&gt;text-davinci-003&lt;/em&gt; model to provide classification capabilities for natural text - more specifically, we categorized and rated scientific papers based on the interest area (note that the recommended model for this task is now &lt;em&gt;gpt-35-turbo&lt;/em&gt;).&lt;/p&gt;
&lt;p&gt;In today&amp;rsquo;s post we are going to continue exploring Azure OpenAI Service, this time looking at the embeddings model, &lt;em&gt;text-embedding-ada-002&lt;/em&gt;.&lt;/p&gt;</description>
    </item>
    
    <item>
      <title>Building GPT powered applications with Azure OpenAI Service</title>
      <link>https://www.strathweb.com/2023/04/building-gpt-powered-applications-with-azure-openai-service/</link>
      <pubDate>Wed, 26 Apr 2023 10:06:14 +0000</pubDate>
      
      <guid>https://www.strathweb.com/2023/04/building-gpt-powered-applications-with-azure-openai-service/</guid>
      <description>In this post we will have a look at how we can utilize Azure OpenAI Service to build applications using various OpenAI models. At a high level, Azure OpenAI allows accessing GPT-4, GPT-3, Codex and Embeddings models within the security boundary of Azure, while ensuring data privacy and residency and conforming to other common enterprise requirements such as private networking.
In other words, it addresses one of the biggest worries of integrating AI services into your own applications - the data is never shared with OpenAI.</description>
    </item>
    
  </channel>
</rss>
