Ai on Strathweb. A free flowing tech monologue.

Using Phi Silica in Windows App SDK on a Copilot Plus PC

Fri, 25 Apr 2025 07:06:14 +0000

Last year, Microsoft announced the Copilot Plus PC, a new class of devices that are designed to run AI workloads locally. The flagship device of the line is of course the Surface Pro 11, which is powered by the Qualcomm Snapdragon X Elite ARM processor. Unfortunately, since the launch, the AI capabilities have been more than underwhelming, as few applications and workloads are able to take advantage of the integrated NPU hardware.

One of the milestones in this direction is the Phi Silica model, which is a small but powerful ONNX-Runtime-based on-device SLM (Small Language Model) that is designed to run on the Copilot Plus PC NPU, and that is built into the Windows Copilot Runtime. This removes a lot of the friction that developers face when trying to run models on-device, as they can now simply use the Windows App SDK to access the NPU and invoke the model just like any other system API.

Today we will have a look at how to use the Phi Silica model in a Windows App SDK application.

Using o-series Reasoning Models in PromptFlow

Mon, 24 Mar 2025 07:06:14 +0000

If you have tried to use the OpenAI o-series reasoning models, such as o1 or o3, with PromptFlow recently, you certainly ran into a nasty surprise. While PromptFlow supports a wide range of models and providers, the o-series models are not among them. This is of course quite a shame, especially if you’d like to benchmark or evaluate your flows against those models.

In this short post, we will look at a workaround.

Running Phi models on iOS with Apple MLX Framework

Mon, 10 Mar 2025 08:30:12 +0000

As I previously blogged a few times, I have been working on the Strathweb Phi Engine, a cross-platform library for running Phi model inference via a simple, high-level API, from a number of high-level languages: C#, Swift, Kotlin and Python. This of course includes the capability of running Phi models on iOS devices, and the sample repo contains a demo SwiftUI application that demonstrates how to do this.

Today I wanted to show an alternative way of running Phi models on iOS devices, using Apple’s MLX framework. I previously blogged about fine-tuning Phi models on iOS using MLX, so that post is a good read if you want to learn more about the MLX framework and how to use it.

Strathweb Phi Engine - now with Phi-4 support

Mon, 24 Feb 2025 07:06:14 +0000

Last summer, I launched Strathweb Phi Engine — a cross-platform library for running Phi model inference via a simple, high-level API, from a number of high-level languages: C#, Swift, Kotlin and Python.

Today I am happy to announce support for Phi-4, the latest model in the Phi family, which Microsoft AI released in December 2024.

Fine tuning Phi models with MLX

Fri, 17 Jan 2025 07:06:14 +0000

Recently, I dedicated quite a lot of room on this blog to the topic of running Phi locally with the Strathweb Phi Engine. This time, I want to focus on a different aspect of adopting small language models like Phi - fine-tuning them. We are going to do this with Apple’s MLX library, which offers excellent performance for ML-related tasks on Apple Silicon.

We are going to do LoRA fine tuning of a Phi model, and then invoke it using Strathweb Phi Engine.
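
As a rough illustration of what LoRA fine tuning means, here is a minimal sketch of the arithmetic only. It does not use MLX or the Strathweb Phi Engine; the helper names (`matmul`, `lora_merge`, `trainable_parameters`) and the toy matrices are made up for this example. The idea is that the original weight matrix W stays frozen, and only two small matrices A and B of rank r are trained, with the effective weight being W + (alpha / r) * B * A.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lora_merge(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank (rows of A)."""
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def trainable_parameters(d, k, rank):
    """Compare the parameter count of a full d x k update with a rank-r adapter."""
    return {"full": d * k, "lora": rank * (d + k)}


# Toy example: a frozen 2x2 weight matrix and a rank-1 adapter (illustrative values).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # shape 2 x 1
A = [[0.5, 0.0]]     # shape 1 x 2
merged = lora_merge(W, A, B, alpha=2.0)
counts = trainable_parameters(1000, 1000, rank=8)
```

For a 1000 x 1000 layer, the rank-8 adapter trains 16,000 values instead of 1,000,000, which is why this approach is practical on consumer hardware.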

Running Phi Inference in .NET Applications with Strathweb Phi Engine

Fri, 20 Dec 2024 07:06:14 +0000

Local AI inference has become increasingly important for developers seeking to build robust, privacy-preserving applications. In this deep dive, I’ll show you how to leverage the Strathweb Phi Engine multi-platform library to run Microsoft’s Phi-family models directly in your .NET applications, exploring both basic integration patterns and advanced features that make Phi inference more accessible than ever.

Simplifying the AI workflow: Access different types of model deployments with Azure AI Inference

Fri, 22 Nov 2024 07:06:14 +0000

In this post, we will explore the flexibility behind Azure AI Inference, a new library from Azure, which allows us to run inference against a wide range of AI model deployments - both in Azure and, as we will see in this notebook, in other places as well.

It is available for Python and for .NET - in this post, we will focus on the Python version.

Strathweb Phi Engine - now with Safe Tensors support

Fri, 15 Nov 2024 07:06:14 +0000

This summer, I announced the Strathweb Phi Engine — a cross-platform library for running Phi inference anywhere. Up until now, the library only supported models in the quantized GGUF format. Today, I’m excited to share that the library now also supports the Safe Tensor model format.

This enhancement significantly expands the scope of use cases and interoperability for the Strathweb Phi Engine. With Safe Tensor support, you can now load and execute models in a format that is not only performant but also prioritizes security and memory safety. Notably, all the Phi models published by Microsoft use the Safe Tensor format by default.

How GPT-4o-mini can be simultaneously 20x cheaper and 2x more expensive than GPT-4o

Fri, 25 Oct 2024 07:06:14 +0000

GPT-4o-mini is the small, cost-effective version of the GPT-4o model. It is a great default choice for developers who want a very capable and fast model, but don’t need the full power of the GPT-4o model. However, there are some important things to keep in mind when using GPT-4o-mini, especially when it comes to pricing - some of which is rather contradictory!

Speech-based retrieval augmented generation (RAG) with GPT-4o Realtime API

Mon, 14 Oct 2024 07:06:14 +0000

On October 1st, OpenAI and Microsoft (Azure OpenAI) announced the availability of the GPT-4o Realtime API for speech and audio. It is a new, innovative way of interacting with the GPT-4o model family that provides a “speech in, speech out” conversational interface. Contrary to traditional text-based APIs, the Realtime API allows sending the audio input directly to the model, and receiving the audio output back. This is a significant improvement over the existing approaches to voice-enabled assistants, which required converting the audio to text first, and then converting the text back to audio. The Realtime API is currently in preview, and the SDKs for various languages have mixed levels of support for it, but it is already possible to build exciting new applications with it.

The low-latency speech-based interface also poses some challenges to established AI architectural patterns, such as Retrieval-Augmented Generation (RAG) - and today we will tackle just that, and have a look at a small sample realtime-voice RAG app in .NET.

Using Local Phi-3 Models in AutoGen with Strathweb Phi Engine

Fri, 06 Sep 2024 07:06:14 +0000

I recently announced Strathweb Phi Engine, a cross-platform library/toolset for conveniently running Phi-3 (almost) anywhere. Today I would like to show how to integrate a local Phi-3 model, orchestrated by Strathweb Phi Engine, into an agentic workflow built with AutoGen.

Announcing Strathweb Phi Engine - a cross-platform library for running Phi-3 anywhere

Thu, 25 Jul 2024 04:06:14 +0000

I recently wrote a blog post about using Rust to run the Phi-3 model on iOS. The post received an overwhelmingly positive response, and I got a lot of questions about running Phi-3 using a similar approach on other platforms, such as Android, Windows, macOS or Linux. Today, I’m excited to announce the project I have been working on recently - Strathweb Phi Engine, a cross-platform library for running Phi-3 (almost) anywhere.

Running Microsoft's Phi-3 Model in an iOS app with Rust

Thu, 09 May 2024 07:06:14 +0000

Last month, Microsoft released the exciting new minimal AI model, Phi-3 mini. It’s a 3.8B model that can outperform many other larger models, while still being small enough to run on a phone. In this post, we’ll explore how to run the Phi-3 model inside a SwiftUI iOS application using the minimalist ML framework for Rust, called candle, and built by the nice folks at HuggingFace.

Tool Calling with Azure OpenAI - Part 2: Using the tools directly via the SDK

Fri, 19 Apr 2024 07:06:14 +0000

Last time around, we discussed how Large Language Models can select the appropriate tool and its required parameters out of freely flowing conversation text. We also introduced the formal concept of those tools, which are structurally described using an OpenAPI schema.

In this part 2 of the series, we are going to build two different .NET command line assistant applications, both taking advantage of the tool calling integration. We will orchestrate everything by hand - that is, we will only use the Azure OpenAI Service API directly (or rather using the .NET SDK for Azure OpenAI) - without any additional AI frameworks.

Tool Calling with Azure OpenAI - Part 1: The Basics

Thu, 04 Apr 2024 07:06:14 +0000

One of the fantastic capabilities of the Large Language Models is their ability to choose (based on a predefined set of tool definitions) the appropriate tool and its required parameters out of freely flowing conversation text. With that, they can act as facilitators of workflow orchestration, where they instruct applications to invoke specific tools, with a specific set of arguments.

OpenAI announced the built-in capability called function calling in the summer of last year, and by now it is an integral part of working with and building applications on top of the GPT models. The functionality was later renamed in the API to “tools”, to better express their broad scope and nature.

Today I am starting a new multi-post Azure OpenAI blog series focusing specifically on the tool capabilities. We will build a client application with .NET, and explore tool integration from different angles - using the Azure OpenAI .NET SDK directly, using the Assistants SDK and finally leveraging various orchestration frameworks such as Semantic Kernel and AutoGen. In today’s part one, we are going to introduce the basic concepts behind tool calling.
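
To make the idea of tool definitions a bit more concrete, here is a minimal sketch in Python rather than .NET, and without any network calls. The definition follows the general shape of the chat completions "tools" parameter (a function name, a description, and a schema for the parameters). The `get_weather` tool, the `REGISTRY` and the `dispatch` helper are hypothetical names invented for this example, and the model output is simulated as a plain JSON string.

```python
import json


# Hypothetical local tool; the name, parameters and return value are illustrative only.
def get_weather(city: str, unit: str = "celsius") -> str:
    return f"Weather in {city}: 21 degrees {unit}"


# Tool definition handed to the model, so it knows which tools exist
# and which arguments each of them expects.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

# Maps tool names from the definitions to the local functions that implement them.
REGISTRY = {"get_weather": get_weather}


def dispatch(tool_name: str, arguments_json: str) -> str:
    """Invoke the local function the model selected, with the arguments it produced."""
    if tool_name not in REGISTRY:
        raise ValueError(f"Unknown tool: {tool_name}")
    arguments = json.loads(arguments_json)
    return REGISTRY[tool_name](**arguments)


# Simulated model response: the model picked a tool and supplied JSON arguments.
result = dispatch("get_weather", '{"city": "Zurich"}')
```

The model never executes anything itself. It only returns the tool name and the arguments, and the application decides whether and how to run the matching function.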

Combining Azure OpenAI with Azure AI Speech

Fri, 08 Mar 2024 07:06:14 +0000

In my recent posts, I’ve been exploring various facets of the Azure OpenAI Service, discussing how it can power up our applications with AI. Today, I’m taking a slightly different angle - I want to dive into how we can enhance our projects further by integrating Azure OpenAI Service with Azure AI Speech. Let’s explore what this integration means and how it could lead to exciting, AI-powered applications.

Using your own data with GPT models in Azure OpenAI - Part 4: Adding vector search

Fri, 23 Feb 2024 07:06:14 +0000

For our Retrieval-Augmented-Generation (RAG) application, we set up AI Search in part 1; however, so far we have only used it with basic keyword search.

In this part 4 of the series about bringing your own data to Azure OpenAI Service, we will go ahead and integrate vector search, as a more sophisticated way of performing the search across the Azure AI Search index within our RAG-pattern system.

I already covered vectorization and embeddings using the OpenAI embedding model on this blog, and we will be relying on the same principles here. I recommend reading through that article before continuing if you are not yet familiar with the concept of embeddings.
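
As a quick refresher on the principle, vector search boils down to comparing a query embedding with stored document embeddings and ranking by similarity. Below is a minimal sketch using cosine similarity in plain Python. It is not the Azure AI Search API; the `INDEX` contents, the document ids and the `top_k` helper are made up for this example, and the toy three-dimensional vectors stand in for real embeddings, which have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the k (document id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical index of document embeddings (toy values).
INDEX = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.7, 0.7, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}

results = top_k([1.0, 0.1, 0.0], INDEX)
```

Here `doc-a` ranks first and `doc-b` second, because their vectors point in nearly the same direction as the query, while `doc-c` is orthogonal to it and scores zero.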

Using your own data with GPT models in Azure OpenAI - Part 3: Calling Azure OpenAI Service via .NET SDK

Mon, 18 Dec 2023 07:06:14 +0000

In the last post of this series we set up a demo .NET client application that was able to call and utilize a GPT model hosted in Azure OpenAI Service, which in turn was integrated with our own custom data via Azure AI Search. We did this using the bare bones REST API - and in part three, it’s time to shift gears and explore how to accomplish a similar task using the .NET SDK, which offers a more streamlined and less ceremonious approach than calling the HTTP endpoints directly.

Using your own data with GPT models in Azure OpenAI - Part 2: Calling Azure OpenAI Service via REST API

Fri, 24 Nov 2023 07:06:14 +0000

In the previous part of this series, we successfully set up Azure AI Search to have it ready for integration with Azure OpenAI Service. The ultimate goal is to take advantage of the retrieval-augmented-generation pattern, and enhance our interactions with the GPT model with our own custom data.

Let’s continue building this today.

Using your own data with GPT models in Azure OpenAI - Part 1: Setting up Azure AI Search

Fri, 10 Nov 2023 07:06:14 +0000

There is no question that the emergence of generative AI is going to significantly alter various aspects of our daily lives. At the same time, most of the large language models (LLMs) are designed as general-purpose black boxes and their utility is initially confined to the data they were trained on. However, it is possible to extend their functionality and reasoning to any custom data set, be it private or public, even without the massive effort that would be needed to retrain or even fine-tune them.

We are going to start exploring that concept today with a multi-part post series on “bringing your own data” to Azure OpenAI. In part one today, we will set up the necessary Azure resources and prepare the stage for a client application integration, which will follow in parts two and further.

Using embeddings model with Azure OpenAI Service

Wed, 13 Sep 2023 07:00:14 +0000

I recently blogged about building GPT-powered applications with Azure OpenAI Service. In that post, we looked at using the text-davinci-003 model to provide classification capabilities for natural text - more specifically, we categorized and rated scientific papers based on the interest area (note that the recommended model for this task is now gpt-35-turbo).

In today’s post we are going to continue exploring Azure OpenAI Service, this time looking at the embeddings model, text-embedding-ada-002.

Building GPT powered applications with Azure OpenAI Service

Wed, 26 Apr 2023 10:06:14 +0000

In this post we will have a look at how we can utilize Azure OpenAI Service to build applications using various OpenAI models. At a high level, Azure OpenAI allows accessing GPT-4, GPT-3, Codex and Embeddings models within the security boundary of Azure, while ensuring data privacy and residency and conforming to other common enterprise requirements such as private networking. In other words, it addresses one of the biggest worries of integrating AI services into your own applications - the data is never shared with OpenAI.