The crash

This one started when trying to understand why an integration test was failing, only on Linux with ARM64.

As I had no ARM64 dev environment available, I first tried adding more and more traces and let the test run in the CI, without much success.

Eventually, I realized this was leading nowhere, and took the time to setup an ARM64 VM to investigate further. …


This story begins when one of our integrations tests started got stuck on one PR that seemingly impacted unrelated code. This is a nice excuse to cover some concepts I haven’t touched in my previous articles, such as downloading the .NET symbols on Linux.

Preliminary inspection

The failure was occurring in a Linux test. After a while, I managed to reproduce the issue locally in a docker container. Usually the first step would be to attach a debugger, but I didn’t want to spend the time to find the commands to inject the debugger in the container. …


This is the second part of an investigation where I tried to understand why an application was randomly crashing with an AccessViolationException.

If you haven’t read it, you can find part 1 of the investigation here.

As a reminder, here is what we uncovered so far:

  • The server runs Orchard, with the Datadog .NET tracer, and crashes about once or twice per day
  • The crash dump indicated an access violation in method clr!ObjectNative::IsLockHeld, itself called by Orchard.OutputCache.Filters.OutputCacheFilter.Dispose
  • In WinDbg, the !syncblk command failed with an error

Part 2 starts when, as I ran out of easy things to try, I…


This is a two parts article. Part two is available here.

Symptoms

To monitor the stability of the Datadog .NET tracer, we have a reliability environment where we continuously run mainstream applications such as Orchard. This story starts when, while preparing a release, we discovered that the latest version of our tracer was crashing the app with the message:

Application: w3wp.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an internal error in the .NET Runtime at IP 00007FFB62CB0A8D (00007FFB625B0000) with exit code 80131506.

A look at the reliability monitor showed that the app was crashing once or twice…


In this series of article, we’re retracing how I debugged an InvalidProgramException, caused by a bug in the Datadog profiler, from a memory dump sent by a customer.

In the previous part, we’ve located the bad generated IL, stored in an internal CLR structure. The third part is going to be about understanding what’s wrong with the IL, and finding the root cause.

Identifying the error

There are a lot of things that can be wrong in some IL code, so I wasn’t short…


In this series of article, we’re retracing how I debugged an InvalidProgramException, caused by a bug in the Datadog profiler, from a memory dump sent by a customer.

Let’s start with a quick reminder. The profiler works by rewriting the IL of interesting methods to inject instrumentation code. The InvalidProgramException is thrown by the JIT when trying to compile the IL emitted by the profiler, which must be somehow invalid. The first part was about identifying in what method the exception…


Datadog automated instrumentation for .NET works by rewriting the IL of interesting methods to emit traces that are then sent to the back-end. This is a complex piece of logic, written using the profiler API, and ridden with corner-cases. And as always with complex code, bugs are bound to happen, and those can be very difficult to diagnose.

As it turns out, we had customer reports of applications throwing InvalidProgramException when using our instrumentation. This exception is thrown when the JIT encounters invalid IL code, most likely emitted by our profiler. The symptoms were always the same: upon starting, the…


The Datadog .NET tracer uses the HTTP protocol to communicate with the agent (usually installed on the same machine) and send traces every second. We originally used HttpClient to handle the communication. However, we ran into some dependencies errors on .NET Framework (it’s a bit complicated and outside of the scope of the article, but it has something to do with us instrumenting the System.Net.HttpClient assembly while at the same time having a dependency on it), and so we decided to switch to HttpWebRequest. …


.NET core startup hooks is a feature I really like, and I had a lot of fun with it in the past. Still, I had yet to find a legitimate use for them, and the opportunity finally came a few days ago.

What are startup hooks?

Let’s start by a quick catch-up, for those who don’t know what startup hooks are. The feature was introduced with .net core 2.2, and allows to execute any arbitrary code in a .net process before the Main entry point has a chance to run. This is done by declaring a DOTNET_STARTUP_HOOKS environment variable, pointing to the assembly you…


… and find an unexpected bug in the process

We recently run into performance issues, and many elements seemed to point towards the GC verbose events being enabled. To test this, I needed a way to check on live servers whether the events were activated.

Note: the whole article is about .net core on Linux. While the first part (implementation) can be transposed to any OS, I’m not sure the second part could be done on Windows without installing additional tools.

Checking the implementation

The first step is to understand how the runtime decides whether those events are enabled or not. …

Kevin Gosse

Software developer passionate about .NET, performance, and debugging

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store