Here I'm going to discuss the mechanisms and concepts relating to async iterators in C# - with the hope of both demystifying them a bit, and also showing how we can use some of the more advanced (but slightly hidden) features. I'm going to give some illustrations of what happens under the hood, but note: these are illustrations, not the literal generated expansion - this is deliberate, to help show what is conceptually happening, so if I ignore some subtle implementation detail: that's not accidental. As always, if you want to see the actual code, tools like https://sharplab.io/ are awesome (just change the "Results" view to "C#" and paste the code you're interested in onto the left).
Iterators in the sync world
Before we discuss async iterators, let's start by recapping iterators. Many folks may already be familiar with all of this, but hey: it helps to set the scene. More importantly, it is useful to allow us to compare and contrast later when we look at how async
changes things. So: we know that we can write a foreach loop (over a sequence) of the form:
foreach (var item in SomeSource(42)) { Console.WriteLine(item); }
and for each item that SomeSource
returns, we'll get a line in the console. SomeSource
could
be returning a fully buffered set of data (like a List<string>
):
    IEnumerable<string> SomeSource(int x)
    {
        var list = new List<string>();
        for (int i = 0; i < 5; i++)
            list.Add($"result from SomeSource, x={x}, result {i}");
        return list;
    }
but a problem here is that this requires SomeSource
to run to completion
before we get even the first result, which could take a lot of time and memory - and is just generally restrictive. Often, when we're trying to represent a sequence
, it may be unbounded, or at least: open-ended - for example, we could be pulling data from a remote work queue, where a: we only want to be holding one pending item at a time, and b: it may not have
a logical "end". It turns out that C#'s definition of a "sequence" (for the purposes of foreach
) is fine with this. Instead of returning
a list, we can write an iterator block
:
    IEnumerable<string> SomeSource(int x)
    {
        for (int i = 0; i < 5; i++)
            yield return $"result from SomeSource, x={x}, result {i}";
    }
This works similarly
, but there are some fundamental differences - most noticeably: we don't ever have a buffer - we just make one element available at a time. To understand how this can work, it is useful to take another look at our foreach
; the compiler interprets foreach
as something like
the following:
    using (var iter = SomeSource(42).GetEnumerator())
    {
        while (iter.MoveNext())
        {
            var item = iter.Current;
            Console.WriteLine(item);
        }
    }
We have to be a little
loose in our phrasing here, because foreach
isn't actually tied to IEnumerable<T>
- it is duck-typed against an API shape instead; the using
may or may not be there, for example. But fundamentally, the compiler calls GetEnumerator()
on the expression passed to foreach
, then creates a while
loop checking MoveNext()
(which defines "is there more data?" and advances the mechanism in the success case), then accesses the Current
property (which exposes the element we advanced to). As an aside, historically (prior to C# 5) the compiler used to scope item
outside
of the while
loop, which might sound innocent, but it was the source of absolutely no end
of confusion, code errors, and questions on Stack Overflow (think "captured variables").
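To make the "duck-typed against an API shape" point concrete, here is a minimal sketch. The Countdown and Cursor types are hypothetical names invented for this illustration; neither implements any interface, yet foreach accepts them because the shape (GetEnumerator(), MoveNext(), Current) is there. Since Cursor has no Dispose(), there is also no using in the expansion.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration: it implements no interfaces at all.
public sealed class Countdown
{
    private readonly int start;
    public Countdown(int start) => this.start = start;

    // foreach only needs an accessible GetEnumerator() method...
    public Cursor GetEnumerator() => new Cursor(start);

    // ...whose result offers MoveNext() and Current.
    public struct Cursor
    {
        private int next;
        private int current;

        public Cursor(int start)
        {
            next = start;
            current = 0;
        }

        public int Current => current;

        public bool MoveNext()
        {
            if (next <= 0) return false;
            current = next--;
            return true;
        }
    }
}

public static class DuckTypingDemo
{
    public static List<int> Collect(int start)
    {
        var results = new List<int>();
        // no IEnumerable<T> involved, and nothing to dispose
        foreach (var item in new Countdown(start))
        {
            results.Add(item);
        }
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Collect(3))); // prints "3, 2, 1"
    }
}
```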
So; hopefully you can see in the above how the consumer
can access an unbounded forwards-only sequence via this MoveNext()
/ Current
approach; but how does that get implemented
? Iterator blocks (anything involving the yield
keyword) are actually incredibly
complex, so I'm going to take a lot of liberties here, but what is going on is similar
to:
    IEnumerable<string> SomeSource(int x) => new GeneratedEnumerable(x);

    class GeneratedEnumerable : IEnumerable<string>
    {
        private int x;
        public GeneratedEnumerable(int x) => this.x = x;

        public IEnumerator<string> GetEnumerator() => new GeneratedEnumerator(x);

        // non-generic fallback
        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    }

    class GeneratedEnumerator : IEnumerator<string>
    {
        private int x, i;
        public GeneratedEnumerator(int x) => this.x = x;

        public string Current { get; private set; }

        // non-generic fallback
        object IEnumerator.Current => Current;

        // if we had "finally" code, it would go here
        public void Dispose() { }

        // our "advance" logic
        public bool MoveNext()
        {
            if (i < 5)
            {
                Current = $"result from SomeSource, x={x}, result {i}";
                i++;
                return true;
            }
            else
            {
                return false;
            }
        }

        // this API is essentially deprecated and never used
        void IEnumerator.Reset() => throw new NotSupportedException();
    }
Let's tear this apart:
- firstly, we need some object to represent IEnumerable<T>, but we also need to understand that IEnumerable<T> and IEnumerator<T> (as returned from GetEnumerator()) are different APIs; in the generated version there is a lot of overlap and they can share an instance, but to help discuss it, I've kept the two concepts separate.
- when we call SomeSource, we create our GeneratedEnumerable which stores the state (x) that was passed to SomeSource, and exposes the required IEnumerable<T> API
- later (and it could be much later), when the caller iterates (foreach) the data, GetEnumerator() is invoked, which calls into our GeneratedEnumerator to act as the cursor over the data
- our MoveNext() logic implements the same for loop conceptually, but one step per call to MoveNext(); if there is more data, Current is assigned with the thing we would have passed to yield return
- note that there is also a yield break C# keyword, which terminates iteration; this would essentially be return false in the generated expansion
- note that there are some nuanced differences in my hand-written version that the C# compiler needs to deal with; for example, what happens if I change x in my enumerator code (MoveNext()), and then later iterate the data a second time - what is the value of x? emphasis: I don't care about this nuance for this discussion!
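The "later (and it could be much later)" point, and the comment about where "finally" code ends up, can both be observed directly. This is a small sketch with hypothetical names (Numbers, Run, and the log list are mine): the iterator body does not start until the first MoveNext(), and leaving the loop early disposes the enumerator, which runs the finally block.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    // Hypothetical iterator that records what it does in a shared log.
    public static IEnumerable<int> Numbers(List<string> log)
    {
        log.Add("started");
        try
        {
            for (int i = 0; i < 3; i++)
            {
                log.Add($"yielding {i}");
                yield return i;
            }
        }
        finally
        {
            // reached via Dispose() when the consumer stops early
            log.Add("finally");
        }
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        var sequence = Numbers(log);   // none of the iterator body has run yet
        log.Add("created");
        foreach (var item in sequence)
        {
            log.Add($"got {item}");
            if (item == 1) break;      // leaving early disposes the enumerator
        }
        log.Add("done");
        return log;
    }

    public static void Main() => Console.WriteLine(string.Join(" | ", Run()));
}
```

Note that "created" is logged before "started": calling Numbers only builds the state object, exactly like new GeneratedEnumerable(x) in the hand-written version.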
Hopefully this gives enough of a flavor to understand foreach
and iterators ( yield
) - now let's get onto the more interesting bit: async
.
Why do we need async iterators?
The above works great in a synchronous world, but a lot
of .NET work is now favoring async
/ await
, in particular to improve server scalability. The big problem in the above code is the bool MoveNext()
. This is explicitly synchronous
. If the thing it is doing takes some time, we'll be blocking a thread, and blocking a thread is increasingly anathema to us. In the context of our earlier "remote work queue" example, there might not be anything there for seconds, minutes, hours. We really don't want to block threads for that kind of time! The closest we can do without async iterators is to fetch the data asynchronously, but buffered - for example:
async Task<List<string>> SomeSource(int x) {...}
But this is not the same semantics
- and is getting back into buffering. Assuming we don't want to fetch everything in one go, to get around this we'd eventually end up implementing some kind of "async batch loop" monstrosity that effectively re-implements foreach
using manual ugly code, negating the reasons that foreach
even exists. To address this, C# and the BCL have recently added support for async iterators, yay! The new APIs (which are available down to net461 and netstandard20 via NuGet
) are:
    public interface IAsyncEnumerable<out T>
    {
        IAsyncEnumerator<T> GetAsyncEnumerator(CancellationToken cancellationToken = default);
    }

    public interface IAsyncEnumerator<out T> : IAsyncDisposable
    {
        T Current { get; }
        ValueTask<bool> MoveNextAsync();
    }

    public interface IAsyncDisposable
    {
        ValueTask DisposeAsync();
    }
Let's look at our example again, this time: with added async; we'll look at the consumer
first (the code doing the foreach
), so for now, let's imagine that we have:
IAsyncEnumerable<string> SomeSourceAsync(int x) => throw new NotImplementedException();
and focus on the loop; C# now has the await foreach
concept, so we can do:
await foreach (var item in SomeSourceAsync(42)) { Console.WriteLine(item); }
and the compiler interprets this as something similar to:
    await using (var iter = SomeSourceAsync(42).GetAsyncEnumerator())
    {
        while (await iter.MoveNextAsync())
        {
            var item = iter.Current;
            Console.WriteLine(item);
        }
    }
(note that await using
is similar to using
, but DisposeAsync()
is called and awaited, instead of Dispose()
- even cleanup code can be asynchronous!)
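As a quick illustration of await using on its own, here is a minimal sketch. AsyncResource is a hypothetical type made up for this example, and Task.Yield() merely stands in for real asynchronous cleanup (flushing a buffer, say).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical resource whose cleanup is asynchronous.
public sealed class AsyncResource : IAsyncDisposable
{
    private readonly List<string> log;
    public AsyncResource(List<string> log) => this.log = log;

    public async ValueTask DisposeAsync()
    {
        await Task.Yield(); // stand-in for asynchronous cleanup work
        log.Add("disposed");
    }
}

public static class AwaitUsingDemo
{
    public static async Task<List<string>> RunAsync()
    {
        var log = new List<string>();
        await using (var resource = new AsyncResource(log))
        {
            log.Add("working");
        } // DisposeAsync() is called and awaited here
        log.Add("after");
        return log;
    }

    public static async Task Main()
    {
        // prints "working, disposed, after"
        Console.WriteLine(string.Join(", ", await RunAsync()));
    }
}
```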
The key point here is that this is actually pretty similar to our sync version, just with added await
. Ultimately, however, the moment we add await
the entire body is ripped apart by the compiler and rewritten as an asynchronous state machine. That isn't the topic of this article, so I'm not even going to try
and cover how await
is implemented behind the scenes. For today "a miracle happens" will suffice for that. The observant might also be wondering "wait, but what about cancellation?" - don't worry, we'll get there!
So what about our enumerator? Along with await foreach
, we can also
now write async iterators with yield
; for example, we could do:
    async IAsyncEnumerable<string> SomeSourceAsync(int x)
    {
        for (int i = 0; i < 5; i++)
        {
            await Task.Delay(100); // simulate async something
            yield return $"result from SomeSource, x={x}, result {i}";
        }
    }
In real code, we could now be consuming data from a remote source asynchronously, and we have a very
effective mechanism for expressing open sequences of asynchronous
data. In particular, remember that the await iter.MoveNextAsync()
might complete synchronously
, so if data is
available immediately, there is no context switch. We can imagine, for example, an iterator block that requests data from a remote server in pages
, and yield return
each record of the data in the current page (making it available immediately), only doing an await
when it needs to fetch the next page.
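The paged scenario might look something like the following sketch. FetchPageAsync is a hypothetical stand-in for a remote call (here it just fabricates three pages after a short delay), and the "empty page means the end" convention is an assumption of this example rather than anything a real server is obliged to do.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PagingDemo
{
    // Hypothetical stand-in for a remote call: returns one page of records,
    // or an empty page once the (fake) data is exhausted.
    private static async Task<List<string>> FetchPageAsync(int page, int pageSize)
    {
        await Task.Delay(10); // simulate network latency
        var records = new List<string>();
        if (page < 3)
        {
            for (int i = 0; i < pageSize; i++)
                records.Add($"page {page}, record {i}");
        }
        return records;
    }

    public static async IAsyncEnumerable<string> ReadAllAsync(int pageSize)
    {
        for (int page = 0; ; page++)
        {
            // the only await: one asynchronous hop per page
            var records = await FetchPageAsync(page, pageSize);
            if (records.Count == 0) yield break;

            // records within the current page are handed out without awaiting
            foreach (var record in records)
                yield return record;
        }
    }

    public static async Task<int> CountAsync(int pageSize)
    {
        int count = 0;
        await foreach (var record in ReadAllAsync(pageSize))
            count++;
        return count;
    }

    public static async Task Main()
    {
        Console.WriteLine(await CountAsync(4)); // prints 12 (3 pages of 4)
    }
}
```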
Behind the scenes, the compiler generates types to implement the IAsyncEnumerable<T>
and IAsyncEnumerator<T>
pieces, but this time they are even more obtuse
, owing to the async
/ await
restructuring. I do not
intend to try and cover those here - it is my hope instead that we wave a hand and say "you know that expansion we wrote by hand earlier? like that, but with more async". However, there is a very important topic that we have
overlooked, and that we should cover: cancellation.
But what about cancellation?
Most async APIs support cancellation via a CancellationToken
, and this is no exception; look back up to IAsyncEnumerable<T>
and you'll see that it can be passed into the GetAsyncEnumerator()
method. But if we're not writing the loop by hand, how do we do this? This is achieved via WithCancellation
, similar to how ConfigureAwait
can be used to configure await
- and indeed, there's even a ConfigureAwait
we can use too! For example, we could do (showing both config options in action here):
    await foreach (var item in SomeSourceAsync(42)
        .WithCancellation(cancellationToken).ConfigureAwait(false))
    {
        Console.WriteLine(item);
    }
which would be semantically equivalent to:
    var iter = SomeSourceAsync(42).GetAsyncEnumerator(cancellationToken);
    await using (iter.ConfigureAwait(false))
    {
        while (await iter.MoveNextAsync().ConfigureAwait(false))
        {
            var item = iter.Current;
            Console.WriteLine(item);
        }
    }
(I've had to split the iter
local out to illustrate that the ConfigureAwait
applies to the DisposeAsync()
too - via await iter.DisposeAsync().ConfigureAwait(false)
in a finally
)
So; now we can pass a CancellationToken
into
our iterator... but - how can we use it? That's where things get even more
fun! The naive
way to do this would be to think along the lines of "I can't take a CancellationToken
until GetAsyncEnumerator
is called, so... perhaps I can create a type to hold the state until I get to that point, and create an iterator block on the GetAsyncEnumerator
method" - something like:
    // this is unnecessary; do not copy this!
    IAsyncEnumerable<string> SomeSourceAsync(int x) => new SomeSourceEnumerable(x);

    class SomeSourceEnumerable : IAsyncEnumerable<string>
    {
        private int x;
        public SomeSourceEnumerable(int x) => this.x = x;

        public async IAsyncEnumerator<string> GetAsyncEnumerator(
            CancellationToken cancellationToken = default)
        {
            for (int i = 0; i < 5; i++)
            {
                await Task.Delay(100, cancellationToken); // simulate async something
                yield return $"result from SomeSource, x={x}, result {i}";
            }
        }
    }
The above works
. If a CancellationToken
is passed in via WithCancellation
, our iterator will be cancelled at the correct time - including during the Task.Delay
; we could also check IsCancellationRequested
or call ThrowIfCancellationRequested()
at any point in our iterator block, and all the right things would happen. But; we're making life hard for ourselves - the compiler can do this for us
, via [EnumeratorCancellation]
. We could also
just have:
    async IAsyncEnumerable<string> SomeSourceAsync(int x,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        for (int i = 0; i < 5; i++)
        {
            await Task.Delay(100, cancellationToken); // simulate async something
            yield return $"result from SomeSource, x={x}, result {i}";
        }
    }
This works similarly
to our approach above - our cancellationToken
parameter makes the token from GetAsyncEnumerator()
(via WithCancellation
) available to our iterator block, and we haven't had to create any dummy types. There is one slight nuance, though... we've changed the signature
of SomeSourceAsync
by adding a parameter. The code we had above still compiles
because the parameter is optional. But this prompts the question: what happens if I passed one in
? For example, what are the differences between:
    // option A - no cancellation
    await foreach (var item in SomeSourceAsync(42))

    // option B - cancellation via WithCancellation
    await foreach (var item in SomeSourceAsync(42).WithCancellation(cancellationToken))

    // option C - cancellation via SomeSourceAsync
    await foreach (var item in SomeSourceAsync(42, cancellationToken))

    // option D - cancellation via both
    await foreach (var item in SomeSourceAsync(42, cancellationToken).WithCancellation(cancellationToken))

    // option E - cancellation via both with different tokens
    await foreach (var item in SomeSourceAsync(42, tokenA).WithCancellation(tokenB))
The answer is that the right thing happens
: it doesn't matter which API you use - if a cancellation token is provided, it will be respected. If you pass two different
tokens, then when either
token is cancelled, it will be considered cancelled. What happens is that the original token passed via the parameter
is stored as a field on the generated enumerable type, and when GetAsyncEnumerator
is called, the parameter to GetAsyncEnumerator
and the field are inspected. If they are both genuine but different cancellable tokens, CancellationTokenSource.CreateLinkedTokenSource
is used to create a combined token (you can think of CreateLinkedTokenSource
as the cancellation version of Task.WhenAny
); otherwise, if either
is genuine and cancellable, it is used. The result is that when you write an async cancellable iterator, you don't need to worry too much about whether the caller used the API directly vs indirectly.
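The decision described above can be sketched by hand. To be clear, this is my illustration of the logic as described in the text, not the compiler's literal output: Combine and its parameter names are hypothetical, and the generated code differs in detail.

```csharp
using System;
using System.Threading;

public static class TokenCombiningDemo
{
    // Hypothetical helper mirroring the decision described in the text.
    // "linked" is non-null only when a linked source had to be created;
    // the caller is then responsible for disposing it.
    public static CancellationToken Combine(
        CancellationToken fromParameter,
        CancellationToken fromGetAsyncEnumerator,
        out CancellationTokenSource linked)
    {
        linked = null;

        // only one (or neither) token is cancellable: just use the other one
        if (!fromParameter.CanBeCanceled) return fromGetAsyncEnumerator;
        if (!fromGetAsyncEnumerator.CanBeCanceled) return fromParameter;

        // same token supplied via both routes (option D): no need to link
        if (fromGetAsyncEnumerator.Equals(fromParameter)) return fromParameter;

        // two different cancellable tokens (option E): cancel when either fires
        linked = CancellationTokenSource.CreateLinkedTokenSource(
            fromParameter, fromGetAsyncEnumerator);
        return linked.Token;
    }

    public static bool CancelsWhenEitherFires()
    {
        using var a = new CancellationTokenSource();
        using var b = new CancellationTokenSource();
        var combined = Combine(a.Token, b.Token, out var linked);
        using (linked)
        {
            bool before = combined.IsCancellationRequested;
            b.Cancel(); // cancel just one of the two sources
            return !before && combined.IsCancellationRequested;
        }
    }

    public static void Main() => Console.WriteLine(CancelsWhenEitherFires()); // prints True
}
```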
You might be more concerned by the fact that we've changed the signature, however; in that case, a neat trick is to use two methods - one without the token that is for consumers, and one with the token for the actual implementation:
    public IAsyncEnumerable<string> SomeSourceAsync(int x)
        => SomeSourceImplAsync(x);

    private async IAsyncEnumerable<string> SomeSourceImplAsync(int x,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        for (int i = 0; i < 5; i++)
        {
            await Task.Delay(100, cancellationToken); // simulate async something
            yield return $"result from SomeSource, x={x}, result {i}";
        }
    }
This would seem an ideal candidate for a "local function", but unfortunately, at the time of writing, parameters on local functions are not allowed to be decorated with attributes. It is my hope that the language / compiler folks take pity on us, and allow us to do (in the future) something more like:
    public IAsyncEnumerable<string> SomeSourceAsync(int x)
    {
        return Impl();

        // this does not compile today
        async IAsyncEnumerable<string> Impl(
            [EnumeratorCancellation] CancellationToken cancellationToken = default)
        {
            for (int i = 0; i < 5; i++)
            {
                await Task.Delay(100, cancellationToken); // simulate async something
                yield return $"result from SomeSource, x={x}, result {i}";
            }
        }
    }
or the equivalent using static
local functions, which is usually
my preference to avoid any surprises in how capture works. The good news is that this works in the preview language versions, but that is not a guarantee that it will "land".
Summary
So; that's how you can implement and use async iterators in C# now. We've looked at both the consumer and producer versions of iterators, for both synchronous and asynchronous code paths, and looked at various ways of accessing cancellation of asynchronous iterators. There is a lot going on here, but: hopefully it is useful and meaningful.