This is the second part of a two-part series. For the first post, see Iterator blocks and the "yield return" statement - #1.
IEnumerator
Before we jump into the nasty world of state machines and wotnot, let's start by re-familiarizing ourselves with theIEnumerator
interface:
public interface IEnumerator { bool MoveNext(); object Current { get; } void Reset(); }The important things here are
MoveNext()
and Current
. The MoveNext()
method moves to the next item in the set and returns a boolean indicating if there are any more items left in the set, and the Current
property allows us to get the current item. This is all we need to support the foreach loop - it simply calls MoveNext()
until it returns false, and uses the Current
property for get the value for the loop variable.
IEnumerable and Iterator Blocks
Consider the following code from the previous post:public static IEnumerable Workdays() { yield return "Monday"; yield return "Tuesday"; yield return "Wednesday"; yield return "Thursday"; yield return "Friday"; } static void Main() { foreach (string day in Workdays()) { Console.WriteLine(day); } }The
foreach
statement only operates over IEnumerable
objects - so clearly the Workdays() method must be returning an IEnumerable
instance (as its method signature suggests). So how how does it do that? We need to peek inside the compiled code with Reflector to find out the answer.
Inside the compiled code
Using Reflector, we are able to decompile the generated IL and get an approximate view of what the C# to create it would be. So, navigating to our type, we see that the Workdays() method is actually compiled into:public static IEnumerable Workdays() { return new <Workdays>d__0(-2); }What the... ? That is not the code we wrote. And what is this strangely named type,
"<Workdays>d__0"
?
State machines
To answer those questions, we need to take a step backwards, and look at something computer scientists call a state machine.A state machine is a structure in which execution has a defined state, and from that state, it has a discrete list of subsequent states that it may transition into. A simple example would be to consider the road network to be a state machine - and the position of your car on the roads to be the state. If your car pulls up to a T-junction, you may turn either left or right - you may not jump somewhere two miles away onto a different road. Therefore the list of states that you can transition to is limited to a finite set, left and right. A state machine in computing works on the same principle - the program that is being executed has a definite state, and the state machine represents the current state, and the list of possible states that it can transition to next.
So how does this relate to our simple
yield return
example? Lets have another look in Reflector at the generated code, and specifically that strange <Workdays>d__0
type.
Looking closer
The class it generates is quite large and scary looking - I have stripped down to the relevant methods:private sealed class <Workdays>d__0 : IEnumerable<object>, IEnumerable, IEnumerator<object>, IEnumerator, IDisposable { private int <>1__state; private object <>2__current; public <Workdays>d__0(int <>1__state) { this.<>1__state = <>1__state; } private bool MoveNext() { switch (this.<>1__state) { case 0: this.<>1__state = -1; this.<>2__current = "Monday"; this.<>1__state = 1; return true; case 1: this.<>1__state = -1; this.<>2__current = "Tuesday"; this.<>1__state = 2; return true; case 2: this.<>1__state = -1; this.<>2__current = "Wednesday"; this.<>1__state = 3; return true; case 3: this.<>1__state = -1; this.<>2__current = "Thursday"; this.<>1__state = 4; return true; case 4: this.<>1__state = -1; this.<>2__current = "Friday"; this.<>1__state = 5; return true; case 5: this.<>1__state = -1; break; } return false; } object IEnumerator.Current { get { return this.<>2__current; } } }The first thing we notice from this is that all the identifier names are mangled and that makes the code hard to follow. So, I've cleaned up the source further with meaningful identifier names:
private sealed class WorkdaysEnumerator : IEnumerable<object>, IEnumerable, IEnumerator<object>, IEnumerator, IDisposable { private int m_State; private object m_Current; public WorkdaysEnumerator(int initialState) { this.m_State = initialState; } private bool MoveNext() { switch (this.m_State) { case 0: this.m_State = -1; this.m_Current = "Monday"; this.m_State = 1; return true; case 1: this.m_State = -1; this.m_Current = "Tuesday"; this.m_State = 2; return true; case 2: this.m_State = -1; this.m_Current = "Wednesday"; this.m_State = 3; return true; case 3: this.m_State = -1; this.m_Current = "Thursday"; this.m_State = 4; return true; case 4: this.m_State = -1; this.m_Current = "Friday"; this.m_State = 5; return true; case 5: this.m_State = -1; break; } return false; } object IEnumerator.Current { get { return this.m_Current; } } }That's a bit clearer - we can see the all important
MoveNext()
and Current
members. So how does it work?
Stepping into the State Machine
The code above is a simple implementation of a state machine. The current state is an integer,m_State
in the cleaned up code, and it holds a value indicating what state the machine is currently in - in this example, it has five valid states, one per workday. The switch
statement is the guts of the machine, that reads the state value and determines what do do next. Let's pick apart one branch of the switch statement:
case 3: this.m_State = -1; this.m_Current = "Thursday"; this.m_State = 4; return true;This branch is only executed when the state is equal to 3, which is set when it leaves the "Wednesday" branch. It sets the
Current
property to be "Thursday", and increases the state variable by one, making it ready for the next call to MoveNext()
(which would set Current
to "Friday").
Let's also have a closer look at the sequence of calls that are generated when a programmer writes a
foreach
loop over this iterator.
foreach (string day in Workdays()) { Console.WriteLine(day); }This code actually compiles into something that looks like the below:
string day; IEnumerable enumerator = new WorkdayEnumerator(0); while (enumerator.MoveNext()) { day = enumerator.Current; Console.Writeline(day); }I've simplified the above a touch for clarity. Let's run through the sequence of calls that this loop would make and explain what happens to the state machine.
Step | Value of m_State (before) | Value of m_State (after) | Value of Current | MoveNext() returned |
1 | 0 | 1 | "Monday" | true |
2 | 1 | 2 | "Tuesday" | true |
3 | 2 | 3 | "Wednesday" | true |
4 | 3 | 4 | "Thursday" | true |
5 | 4 | 5 | "Friday" | true |
6 | 5 | -1 | "Friday" | false |
As you can see, when it hits the last valid state ("Friday"), it still makes one more call to
MoveNext()
- but since it has set the m_State
value to 5, which is not a valid state, MoveNext()
returns false and the loop exits.
Wrapping up and further reading
yield return
allows us to write simple code to generate lists from complex structures without having to implement our own state machines. It also has some nice uses that are not immediately apparent - since an iterator can return an infinite series of values it can be put to use in many different contexts. Hopefully from this series of posts you have learned how to use yield return, and understand some of the complexity of the code that is generated behind the scenes. Go forth and iterate.
For more information, you might want to read the following.
0 comments:
Post a Comment