[C#] Using GC.KeepAlive in async methods
Can you figure out what’s wrong with this code?
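Here is a sketch of the code in question, reconstructed from the description given later in this article, so details may differ from the original issue. The delegate type MyDelegate and the library name "mynativelib" are placeholders invented for illustration. It should compile as-is, but it needs a real native library to actually run.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

// Hypothetical callback signature; the real one depends on the native API.
public delegate void MyDelegate();

public static class NativeMethods
{
    // "mynativelib" is a placeholder name, not a real library.
    [DllImport("mynativelib")]
    public static extern void MyNativeMethod(MyDelegate callback);
}

public static class OriginalExample
{
    public static async Task DoSomethingAsync()
    {
        var taskCompletionSource = new TaskCompletionSource<bool>();

        // The native code is expected to invoke this callback once the
        // asynchronous operation has completed.
        var myDelegate = new MyDelegate(() => taskCompletionSource.SetResult(true));

        NativeMethods.MyNativeMethod(myDelegate);

        await taskCompletionSource.Task;

        // Intended to keep the delegate alive until the native operation is done.
        GC.KeepAlive(myDelegate);
    }
}
```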
It turns out that, when running in an actual application, there is a chance that myDelegate will get collected despite the call to GC.KeepAlive.
This code was posted in an issue in the dotnet/runtime repository, and I followed it closely because I really couldn’t tell what the error was. If you can’t either, then stick around and let’s figure out what’s happening.
GC.KeepAlive
First, let’s recall what GC.KeepAlive is and why it’s needed here.
As you probably already know, the GC is responsible for freeing memory when objects are not used anymore. It does so by tracking references: if there are no reachable references to a given object, then it can be cleaned away.
The GC is smart enough to know the exact point in a method where an object stops being referenced. For instance:
As soon as DoStuffWithObj is called, and assuming that method doesn’t store a reference to obj somewhere, the object may be garbage collected.
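A minimal sketch of the situation described above. DoStuffWithObj and obj come from the text; DoOtherStuff is a hypothetical placeholder for whatever the method does next.

```csharp
using System;

public static class LifetimeExample
{
    public static void Main()
    {
        var obj = new object();

        DoStuffWithObj(obj); // last use of obj in this method

        // Nothing in Main reads obj after this point. In an optimized build the
        // GC is free to reclaim the object even though the variable is still in scope.
        DoOtherStuff();
    }

    private static void DoStuffWithObj(object obj)
    {
        Console.WriteLine($"Working with {obj}");
    }

    // Hypothetical placeholder for the rest of the method's work.
    private static void DoOtherStuff()
    {
        Console.WriteLine("Doing other stuff");
    }
}
```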
Note: If you want to test this behavior, make sure to run your code in Release mode, without a debugger. In Debug mode, the lifetime of the instances is extended until the end of the scope, to make debugging easier.
Normally, this is not something developers have to worry about. But things get trickier when you start calling native methods. Managed methods are decorated with special metadata that allows the GC to track the lifetime of the local variables. There is no such thing in native methods, so by default the GC considers that the native code does not hold any reference to managed objects (assuming the opposite could lead to memory leaks). From an object lifetime standpoint, calling a native method is effectively the same as calling a method with a weak reference:
So what happens if the garbage collector decides to run while the native method is executing?
As we can see here, the GC might collect the object while the native method is running, which would probably cause an access violation if the native code then tries to use it.
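The experiment can be sketched as follows. FakeNativeMethod is a hypothetical managed stand-in for a native call: it only receives a weak reference and forces a collection while it is "running". In a Release build without a debugger attached, this would typically be expected to report that the object did not survive, though the exact outcome depends on the runtime and JIT.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class WeakReferenceDemo
{
    public static void Main()
    {
        Console.WriteLine($"Is alive: {Run()}");
    }

    // Ask the JIT for fully optimized code on the very first call.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static bool Run()
    {
        var obj = new object();

        // Only a weak reference crosses the call boundary, which mimics the fact
        // that the GC does not see references held by native code.
        return FakeNativeMethod(new WeakReference(obj));
    }

    // Hypothetical stand-in for a native method: it forces a collection and
    // reports whether the object survived.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static bool FakeNativeMethod(WeakReference reference)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        return reference.IsAlive;
    }
}
```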
Let’s go on a little tangent and talk about why I needed to add [MethodImpl(MethodImplOptions.AggressiveOptimization)] to this example.
Since around .NET Core 3, the tiered JIT is enabled by default. Because of it, the first 30 invocations of a method use a less optimized version of the code. Incidentally, that less optimized version is not as aggressive at tracking the lifetime of objects, and the lifetime of obj ends up being extended until the end of the method, ruining the example. [MethodImpl(MethodImplOptions.AggressiveOptimization)] forces the tiered JIT to emit an optimized version of the method during the first invocation.
Here’s what happens if I remove it and run the example multiple times:
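A sketch of how this could be observed: the same weak-reference experiment without AggressiveOptimization, called in a loop. The iteration count and the sleep are arbitrary choices meant to give the runtime time to swap in the optimized code in the background. The reported value would be expected to flip at some point after the first 30 calls, but the exact moment is up to the runtime.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public static class TieredJitDemo
{
    public static void Main()
    {
        for (var i = 1; i <= 200; i++)
        {
            Console.WriteLine($"Call {i}: is alive: {Run()}");

            // Leave some idle time so the optimized version can be compiled
            // and installed while the loop is still running.
            Thread.Sleep(10);
        }
    }

    // No AggressiveOptimization here: the method starts with the less optimized
    // code. NoInlining keeps it a separate method once Main gets optimized.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool Run()
    {
        var obj = new object();
        return FakeNativeMethod(new WeakReference(obj));
    }

    // Hypothetical stand-in for a native method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static bool FakeNativeMethod(WeakReference reference)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        return reference.IsAlive;
    }
}
```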
Neat, huh? The tiered JIT has been around for a few years, but my mind is still blown by the fact that a method can suddenly change its behavior during the execution of a program.
So, when calling a native method and giving it a reference to a managed object, the GC might collect that object at any point in time. How do we prevent that? There are multiple ways, and one of them is GC.KeepAlive.
The most counter-intuitive fact about GC.KeepAlive is that you don’t put it before the place where you need it but after:
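A minimal sketch of that placement, reusing a hypothetical FakeNativeMethod that simulates the native call with a weak reference and a forced collection:

```csharp
using System;
using System.Runtime.CompilerServices;

public static class KeepAliveDemo
{
    public static void Main()
    {
        Console.WriteLine($"Is alive: {Run()}");
    }

    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static bool Run()
    {
        var obj = new object();

        var isAlive = FakeNativeMethod(new WeakReference(obj));

        // Placed after the call: obj is still used here, so the GC has to
        // consider it live for the whole duration of FakeNativeMethod.
        GC.KeepAlive(obj);

        return isAlive;
    }

    // Hypothetical stand-in for a native method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static bool FakeNativeMethod(WeakReference reference)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        return reference.IsAlive;
    }
}
```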
This is because GC.KeepAlive does not actually do anything. You can think of it as syntactic sugar for inserting a reference to an object. We could achieve the exact same result with an empty method:
But we have to add the [MethodImpl(MethodImplOptions.NoInlining)] attribute to prevent the JIT from optimizing away the empty method, which would remove our fake reference. At least with GC.KeepAlive you don’t have to worry about this kind of stuff.
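Such a hand-rolled replacement might look like the sketch below. MyKeepAlive and FakeNativeMethod are hypothetical names chosen for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class CustomKeepAliveDemo
{
    public static void Main()
    {
        Console.WriteLine($"Is alive: {Run()}");
    }

    // Does nothing. NoInlining makes sure the call, and therefore the use of
    // the argument, is not optimized away by the JIT.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void MyKeepAlive(object obj)
    {
    }

    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static bool Run()
    {
        var obj = new object();

        var isAlive = FakeNativeMethod(new WeakReference(obj));

        // Passing obj as an argument counts as a use, which extends its lifetime.
        MyKeepAlive(obj);

        return isAlive;
    }

    // Hypothetical stand-in for a native method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static bool FakeNativeMethod(WeakReference reference)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        return reference.IsAlive;
    }
}
```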
GC.KeepAlive in async methods
Now that we know what GC.KeepAlive is and how to use it, let’s go back to the code that prompted this article.
The author of this code is trying to track the completion of an asynchronous native method. To do so, they create a delegate that will complete a TaskCompletionSource, then give that delegate to the native code. Presumably, the native code will call that delegate at the end of the asynchronous operation, thus completing the TCS.
Because the GC can’t track the lifetime of myDelegate through the invocation of NativeMethods.MyNativeMethod, the author of this code added a call to GC.KeepAlive(myDelegate) at the end of the method. Yet, the application crashed with an access violation. What’s going on?
Let’s start by making sure we can reproduce the issue. Just like before, we’re going to use a weak reference to simulate the behavior of the native call:
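A sketch of such a reproduction (a reconstruction, not the original listing). FakeNativeMethod and the probe parameter are hypothetical: the fake native method only receives a weak reference to the delegate, forces a collection on a background thread, reports whether the delegate survived, and invokes it if it did. The task returned by DoSomethingAsync is deliberately discarded, on the assumption that a rooted reference to it could keep the state machine, and therefore the delegate, reachable.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncKeepAliveRepro
{
    public static async Task Main()
    {
        Console.WriteLine($"Is alive: {await RunAsync()}");
    }

    public static async Task<bool> RunAsync()
    {
        var probe = new TaskCompletionSource<bool>();

        // Not stored on purpose: holding the returned task could mask the problem.
        _ = DoSomethingAsync(probe);

        return await probe.Task;
    }

    private static async Task DoSomethingAsync(TaskCompletionSource<bool> probe)
    {
        var taskCompletionSource = new TaskCompletionSource<bool>();
        Action myDelegate = () => taskCompletionSource.SetResult(true);

        FakeNativeMethod(new WeakReference<Action>(myDelegate), probe);

        await taskCompletionSource.Task;

        GC.KeepAlive(myDelegate);
    }

    // Hypothetical stand-in for the asynchronous native method.
    private static void FakeNativeMethod(WeakReference<Action> callback, TaskCompletionSource<bool> probe)
    {
        Task.Run(() =>
        {
            // Give DoSomethingAsync time to reach its await and return.
            Thread.Sleep(200);

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            var isAlive = callback.TryGetTarget(out var target);
            probe.SetResult(isAlive);

            // Complete the operation only if the delegate is still there.
            target?.Invoke();
        });
    }
}
```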
And indeed, this code displays Is alive: False, meaning that the delegate got collected despite the call to GC.KeepAlive.
As hinted by the title of the article, the issue is that we’re using GC.KeepAlive inside of an async method. If we rewrite the code to wait synchronously instead…
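A sketch of the synchronous variant, with the same hypothetical FakeNativeMethod stand-in (here it returns its observation as a task):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SyncKeepAliveDemo
{
    public static void Main()
    {
        Console.WriteLine($"Is alive: {DoSomething()}");
    }

    public static bool DoSomething()
    {
        var taskCompletionSource = new TaskCompletionSource<bool>();
        Action myDelegate = () => taskCompletionSource.SetResult(true);

        var probe = FakeNativeMethod(new WeakReference<Action>(myDelegate));

        // Block the calling thread instead of awaiting.
        taskCompletionSource.Task.Wait();

        // The frame of DoSomething stays on the stack during the wait, so this
        // call keeps myDelegate reachable for the whole operation.
        GC.KeepAlive(myDelegate);

        return probe.Result;
    }

    // Hypothetical stand-in for the asynchronous native method.
    private static Task<bool> FakeNativeMethod(WeakReference<Action> callback)
    {
        return Task.Run(() =>
        {
            Thread.Sleep(200);

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            var isAlive = callback.TryGetTarget(out var target);
            target?.Invoke();
            return isAlive;
        });
    }
}
```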
… then the code displays Is alive: True as expected.
To explain this, we must dig a bit into how async methods work. At compilation time they are rewritten into a state machine. Our previous example would look like this (simplified):
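The following is a hand-written approximation, not actual compiler output: the real generated code relies on IAsyncStateMachine and AsyncTaskMethodBuilder, which are replaced here by a plain class and a TaskCompletionSource to keep things readable. FakeNativeMethods is a hypothetical managed stand-in so that the sketch can run.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Managed stand-in for the native API. Unlike real native code, Task.Run
// holds a strong reference to the callback.
public static class FakeNativeMethods
{
    public static void MyNativeMethod(Action callback) => Task.Run(callback);
}

public sealed class DoSomethingAsyncStateMachine
{
    private readonly TaskCompletionSource<bool> _methodResult = new TaskCompletionSource<bool>();
    private int _state;

    // Locals that must survive across the await are hoisted into fields.
    private Action _myDelegate;
    private TaskAwaiter<bool> _awaiter;

    public Task Task => _methodResult.Task;

    public void MoveNext()
    {
        if (_state == 0)
        {
            // First part of the method, up to the await.
            var taskCompletionSource = new TaskCompletionSource<bool>();
            _myDelegate = () => taskCompletionSource.SetResult(true);

            FakeNativeMethods.MyNativeMethod(_myDelegate);

            _awaiter = taskCompletionSource.Task.GetAwaiter();
            _state = 1;

            if (!_awaiter.IsCompleted)
            {
                // Register MoveNext as the continuation of the awaited task,
                // then return to the caller.
                _awaiter.OnCompleted(MoveNext);
                return;
            }
        }

        // Second part of the method, executed once the awaited task has completed.
        _awaiter.GetResult();
        GC.KeepAlive(_myDelegate);
        _methodResult.SetResult(true);
    }
}

public static class StateMachineExample
{
    public static Task DoSomethingAsync()
    {
        var stateMachine = new DoSomethingAsyncStateMachine();
        stateMachine.MoveNext();
        return stateMachine.Task;
    }
}
```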
Suddenly, it becomes apparent that GC.KeepAlive is not going to work as intended. It is on a different branch than the call to MyNativeMethod, and it’s just going to reference an object that the state machine is already referencing anyway. But then why isn’t our state machine keeping myDelegate alive?
To understand that, let’s walk the reference chain. myDelegate is referenced by the async state machine. The async state machine is referenced by the task continuation (implicitly created by await taskCompletionSource.Task). The task continuation is referenced by the TaskCompletionSource. And the TaskCompletionSource is referenced by… nobody once the method exits.
In other words, when calling a synchronous method, the caller is responsible for continuing the workflow once the callee returns. When calling an asynchronous method, the responsibility is inverted: the caller returns, and the callee must call the continuation once the asynchronous operation is completed. But since the callee is a native method, it is not capable of keeping the managed reference alive.
The fix here is to allocate a GC handle to keep the reference alive:
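A sketch of that fix, using GCHandle.Alloc from System.Runtime.InteropServices and the same hypothetical weak-reference harness as the earlier experiments. Freeing the handle in a finally block is one possible choice; note that if the operation never completes, the handle is never freed.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

public static class GCHandleFixDemo
{
    public static async Task Main()
    {
        Console.WriteLine($"Is alive: {await RunAsync()}");
    }

    public static async Task<bool> RunAsync()
    {
        var probe = new TaskCompletionSource<bool>();
        _ = DoSomethingAsync(probe);
        return await probe.Task;
    }

    private static async Task DoSomethingAsync(TaskCompletionSource<bool> probe)
    {
        var taskCompletionSource = new TaskCompletionSource<bool>();
        Action myDelegate = () => taskCompletionSource.SetResult(true);

        // The handle acts as a root for the delegate until Free() is called.
        var handle = GCHandle.Alloc(myDelegate);

        try
        {
            FakeNativeMethod(new WeakReference<Action>(myDelegate), probe);
            await taskCompletionSource.Task;
        }
        finally
        {
            handle.Free();
        }
    }

    // Hypothetical stand-in for the asynchronous native method.
    private static void FakeNativeMethod(WeakReference<Action> callback, TaskCompletionSource<bool> probe)
    {
        Task.Run(() =>
        {
            Thread.Sleep(200);

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            var isAlive = callback.TryGetTarget(out var target);
            probe.SetResult(isAlive);
            target?.Invoke();
        });
    }
}
```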
You can think of the GC handle as a special reference that will always stay alive until Free() is called. The GC handle is going to keep the delegate alive, which references the TaskCompletionSource, which references the task continuation, which references the async state machine. And so the object graph is preserved until the end of the asynchronous operation.
Is this a bug?
Honestly, I’m conflicted. The behavior makes perfect sense once you look at what’s happening under the hood. But even with a good knowledge of how async works in C#, I’m fairly sure that I would have made the same mistake if I had to write this code. In fact, it’s only after Stephen Toub commented in the GitHub issue that I realized why the code was failing. On the other hand, adding special support for GC.KeepAlive in async methods would require a significant amount of work for something that is rarely used, so it’s probably not worth the trouble. Maybe adding a warning when GC.KeepAlive is used in an async method could be a good middle ground.