As you may already know, it’s possible to list all the managed threads in a .NET memory dump using the !threads command:

The “ID” column gives you the managed thread id, which is the same value that you could retrieve from the code by calling thread.ManagedThreadId. Let’s say we are interested in the thread with the managed id “16”.

You can switch the active thread by retrieving the value of the “DBG” column (so for the thread with managed id 16, that would be “20”) and giving it to the ~ command:

When switching threads, WinDbg shows the…

Whenever you need to analyze complex structures in a .NET memory dump, the WinDbg scripting API quickly shows its limits. In those cases, you can instead use the ClrMD library, which gives you everything you need to inspect the memory dump from C# code.

Not everything is perfect, however: sometimes the ClrMD syntax does not feel “natural” enough. To take one concrete example, for an investigation I had to retrieve the URLs of the pending HTTP requests in a memory dump. To do so, I needed to:

One thing that has bothered me quite a bit with PerfView is how it groups all unresolved frames under the same “?!?” name. I understand that it’s a way to reduce noise, but when trying to reduce the CPU usage of an application it can be unsettling.

Take the following example:

Here, “?!?” is presented as the top offender, accounting for more than 7% of the total CPU usage. But I’m missing critical information to know whether I should consider it a bottleneck or not. If there’s a single unresolved function accounting for 7% of the total CPU then I…

The crash

This one started when trying to understand why an integration test was failing, only on Linux with ARM64.

As I had no ARM64 dev environment available, I first tried adding more and more traces and letting the test run in the CI, without much success.

Eventually, I realized this was leading nowhere, and took the time to set up an ARM64 VM to investigate further. …

This story begins when one of our integration tests started getting stuck on a PR that seemingly touched unrelated code. This is a nice excuse to cover some concepts I haven’t touched in my previous articles, such as downloading the .NET symbols on Linux.

Preliminary inspection

The failure was occurring in a Linux test. After a while, I managed to reproduce the issue locally in a docker container. Usually the first step would be to attach a debugger, but I didn’t want to spend the time to find the commands to inject the debugger in the container. …

This is the second part of an investigation where I tried to understand why an application was randomly crashing with an AccessViolationException.

If you haven’t read it, you can find part 1 of the investigation here.

As a reminder, here is what we uncovered so far:

  • The server runs Orchard, with the Datadog .NET tracer, and crashes about once or twice per day
  • The crash dump indicated an access violation in method clr!ObjectNative::IsLockHeld, itself called by Orchard.OutputCache.Filters.OutputCacheFilter.Dispose
  • In WinDbg, the !syncblk command failed with an error

Part 2 starts when, as I ran out of easy things to try, I…

This is a two-part article. Part two is available here.


To monitor the stability of the Datadog .NET tracer, we have a reliability environment where we continuously run mainstream applications such as Orchard. This story starts when, while preparing a release, we discovered that the latest version of our tracer was crashing the app with the message:

Application: w3wp.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an internal error in the .NET Runtime at IP 00007FFB62CB0A8D (00007FFB625B0000) with exit code 80131506.

A look at the reliability monitor showed that the app was crashing once or twice…

In this series of articles, we’re retracing how I debugged an InvalidProgramException, caused by a bug in the Datadog profiler, from a memory dump sent by a customer.

In the previous part, we’ve located the bad generated IL, stored in an internal CLR structure. The third part is going to be about understanding what’s wrong with the IL, and finding the root cause.

Identifying the error

There are a lot of things that can be wrong in some IL code, so I wasn’t short…

In this series of articles, we’re retracing how I debugged an InvalidProgramException, caused by a bug in the Datadog profiler, from a memory dump sent by a customer.

Let’s start with a quick reminder. The profiler works by rewriting the IL of interesting methods to inject instrumentation code. The InvalidProgramException is thrown by the JIT when trying to compile the IL emitted by the profiler, which must be somehow invalid. The first part was about identifying in what method the exception…

Datadog automated instrumentation for .NET works by rewriting the IL of interesting methods to emit traces that are then sent to the back-end. This is a complex piece of logic, written using the profiler API, and riddled with corner cases. And as always with complex code, bugs are bound to happen, and those can be very difficult to diagnose.

As it turns out, we had customer reports of applications throwing InvalidProgramException when using our instrumentation. This exception is thrown when the JIT encounters invalid IL code, most likely emitted by our profiler. The symptoms were always the same: upon starting, the…

Kevin Gosse

Software developer passionate about .NET, performance, and debugging
