The curious case of ASP.NET Core integration test deadlock

One of the common approaches to testing ASP.NET Core applications is to use the integration testing support available via the Microsoft.AspNetCore.TestHost package. Arguably the most common scenario is integration testing of MVC applications via the Microsoft.AspNetCore.Mvc.Testing package, which provides a set of MVC-specific helpers on top of the test host.

In this post I want to share a curious case of deadlocking integration tests in an ASP.NET Core 3.1 application.

Integration testing

The functionality offered by the test host package is not new. A feature like this has been part of the ASP.NET application landscape since the ASP.NET Web API days. In fact, I blogged about this topic back in 2012, when running the entire server-client pipeline in memory was a real novelty.

In short, the feature allows you to start your application server using the regular application bootstrapping path (your full host configuration and entire Startup class, including the DI container and all middleware components), but instead of launching an HTTP listener on a specific port, it initializes the application pipeline in memory only. An HttpClient, via its ClientHandler, can then be plugged into the test host, enabling you to communicate with your app using the typical HTTP abstractions, as if you were calling a real live server.

The result is that you can perform end-to-end integration tests for your application without having to launch a real HTTP listener process for your app. This is great functionality, and it has become a staple of ASP.NET and, more recently, ASP.NET Core application development. The official documentation page covers how to get started quite exhaustively, should you need more information.

Deadlocking tests

We use this type of integration testing a lot in our work, and at some point we noticed that at a certain load level, when there were a lot of complicated tests, we started getting hangs (deadlocks) in the tests. This would never happen on our powerful dev machines, only during CI runs, which in our case use Docker containers with very limited resources.

The code we used was quite complicated but it didn't do anything particularly outrageous that jumped out as the potential culprit. The simplified variant of the code was as follows:
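A minimal sketch of a test of this shape might look like the one below. It assumes the application's own `Startup` class; the `ValuesApiTests` class name and the `/api/values` route are hypothetical and used only for illustration.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc.Testing;
using Xunit;

// `Startup` is the application's startup class; "/api/values" is a hypothetical route.
public class ValuesApiTests : IClassFixture<WebApplicationFactory<Startup>>
{
    private readonly WebApplicationFactory<Startup> _factory;

    public ValuesApiTests(WebApplicationFactory<Startup> factory)
    {
        _factory = factory;
    }

    [Fact]
    public async Task Get_ReturnsSuccess()
    {
        // CreateClient() builds and starts the in-memory host on first use.
        HttpClient client = _factory.CreateClient();

        HttpResponseMessage response = await client.GetAsync("/api/values");

        Assert.Equal(HttpStatusCode.OK, response.StatusCode);
    }
}
```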

None of this deviates from how you are expected to use the test host, as described in the documentation I linked to earlier. The code uses WebApplicationFactory, which is part of Microsoft.AspNetCore.Mvc.Testing. That package wraps the test host, and all the core logic of the integration testing pipeline lives in the Microsoft.AspNetCore.TestHost.TestServer class.

So what happened?

Historically, of course, the reflex reaction when dealing with deadlocks in ASP.NET has been to think: well, we must have a sync-over-async somewhere. There must be some code that blocks on an async operation, and that is most likely the culprit.

However, ASP.NET Core has no synchronization context anymore! In other words, when an asynchronous operation resumes in ASP.NET Core, a thread from the thread pool executes the continuation. This is all “context-free” and there is nothing to deadlock on. So this should not be the problem anymore, right? That would be true if we were running a regular ASP.NET Core application. In this case, however, we were running integration tests, more specifically xUnit.net tests. As it turns out, xUnit.net uses a custom synchronization context, and it does so for two reasons: to enforce its parallelization limits, and to support async void test methods.

This led us (Lukasz did some great detective work on this) to suspect that we might actually have a sync-over-async somewhere after all. And lo and behold, it was found, not in our code but in the framework itself. The TestServer constructor explicitly blocks on an async operation:
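For orientation, the relevant part of that constructor looks roughly like the excerpt below. This is an abridged paraphrase of the 3.1 sources rather than a verbatim copy (argument checks and other members are omitted), so it is worth checking the TestServer source for the version you are running.

```csharp
// Abridged paraphrase of Microsoft.AspNetCore.TestHost.TestServer (3.1), not verbatim.
public TestServer(IWebHostBuilder builder, IFeatureCollection featureCollection)
    : this(featureCollection)
{
    var host = builder.UseServer(this).Build();

    // Sync-over-async: the calling thread blocks here until StartAsync() completes.
    host.StartAsync().GetAwaiter().GetResult();

    _hostInstance = host;
}
```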

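So without doing anything wrong in your own codebase, if the machine running a test has a single CPU (a low-resource CI agent, for example, or a small resource-limited Docker container), you will run into an xUnit test deadlock. By default xUnit.net caps the number of parallel threads at the number of logical CPUs, so on such a machine its synchronization context has a single thread to work with; once that thread blocks waiting for the host to start, nothing is left to run the continuation that would complete the start. Remarkably, this sync-over-async in TestServer still hasn't been fixed or addressed as of ASP.NET Core 6.0 Preview 3. In our case, WebApplicationFactory ends up going down the sync path as soon as the client is created.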
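The mechanism can be reproduced outside of ASP.NET Core and xUnit.net with a small console program. The sketch below is only an illustration: `SingleThreadContext` is a hypothetical stand-in for a synchronization context limited to one thread (it is not xUnit's actual implementation), and `StartAsync` is a stand-in for the host startup.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical context that runs every posted callback on one dedicated thread.
public sealed class SingleThreadContext : SynchronizationContext
{
    private readonly BlockingCollection<(SendOrPostCallback Callback, object State)> _queue =
        new BlockingCollection<(SendOrPostCallback Callback, object State)>();

    public SingleThreadContext()
    {
        var worker = new Thread(Run) { IsBackground = true };
        worker.Start();
    }

    public override void Post(SendOrPostCallback d, object state) => _queue.Add((d, state));

    private void Run()
    {
        // Make this context current so that awaits on this thread capture it.
        SetSynchronizationContext(this);
        foreach (var (callback, state) in _queue.GetConsumingEnumerable())
        {
            callback(state);
        }
    }
}

public static class Program
{
    // Stand-in for host.StartAsync(); completing it requires running a continuation.
    private static async Task StartAsync() => await Task.Delay(100);

    // Runs the blocking call on a fresh single-thread context and waits up to 2 seconds.
    private static bool CompletesInTime(Action blockingCall)
    {
        var context = new SingleThreadContext();
        var done = new ManualResetEventSlim();
        context.Post(_ => { blockingCall(); done.Set(); }, null);
        return done.Wait(TimeSpan.FromSeconds(2));
    }

    public static void Main()
    {
        // Blocks the only context thread; the continuation is posted back to it and never runs.
        bool direct = CompletesInTime(() => StartAsync().GetAwaiter().GetResult());

        // Task.Run starts the operation on the thread pool, where no context is captured.
        bool offloaded = CompletesInTime(() => Task.Run(() => StartAsync()).GetAwaiter().GetResult());

        string directResult = direct ? "completed" : "deadlocked";
        string offloadedResult = offloaded ? "completed" : "deadlocked";
        Console.WriteLine("blocking directly: " + directResult);
        Console.WriteLine("via Task.Run: " + offloadedResult);
    }
}
```

The first call reports a deadlock because the continuation after the await is queued to the one thread that is busy blocking. The second completes because the awaited operation never sees the custom context.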

On a slightly more powerful machine, the code can produce a non-deterministic deadlock, depending on the number of tests executing concurrently, since xUnit parallelizes test collections by default.

There are several ways out of it. One is to set the maximum number of parallel threads to unlimited. This can be done in xunit.runner.json:
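As a sketch, a minimal configuration file with this setting could look as follows. In xUnit.net v2 a value of `-1` for `maxParallelThreads` means unlimited, and the file has to be copied to the test output directory for the runner to pick it up.

```json
{
  "maxParallelThreads": -1
}
```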

or as an attribute:
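A minimal sketch of the attribute form, assuming xUnit.net v2, where `CollectionBehaviorAttribute` exposes a `MaxParallelThreads` property:

```csharp
using Xunit;

// Assembly-level attribute; declare it once in any source file of the test project.
// -1 lifts the limit on the number of parallel threads.
[assembly: CollectionBehavior(MaxParallelThreads = -1)]
```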

A different solution, and the one we opted for, requires subclassing WebApplicationFactory and simply moves the host initialization outside of the xUnit synchronization context, which allows the task and its continuations to run on any other thread:
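A sketch of what such a subclass might look like is shown below. It assumes the application uses the generic host, so that `CreateHost` is the method the factory calls to build and start the host. The `CustomWebApplicationFactory` name is hypothetical.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc.Testing;
using Microsoft.Extensions.Hosting;

// Hypothetical name; `Startup` is the application's startup class.
public class CustomWebApplicationFactory : WebApplicationFactory<Startup>
{
    protected override IHost CreateHost(IHostBuilder builder)
    {
        IHost host = builder.Build();

        // Task.Run invokes StartAsync on a thread pool thread, where no xUnit
        // synchronization context is current, so its continuations are not
        // posted back to the (blocked) test thread.
        Task.Run(() => host.StartAsync()).GetAwaiter().GetResult();

        return host;
    }
}
```

Tests would then use `IClassFixture<CustomWebApplicationFactory>` in place of the stock factory.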

There is still a sync-over-async here, but the xUnit synchronization context is no longer involved. Ideally, of course, this would be fixed in the ASP.NET Core test host itself.