I am currently looking for a contract in the London area -

If you're hiring a .NET contractor in or around London, look no further!

Iterator blocks and the "yield return" statement - #2


This is the second part of a two-part series. For the first post, see Iterator blocks and the "yield return" statement - #1.

IEnumerator

Before we jump into the nasty world of state machines and wotnot, let's start by re-familiarizing ourselves with the IEnumerator interface:
public interface IEnumerator
{
    bool MoveNext();
    object Current { get; }
    void Reset();
}
The important things here are MoveNext() and Current. The MoveNext() method moves to the next item in the set and returns a boolean indicating if there are any more items left in the set, and the Current property allows us to get the current item. This is all we need to support the foreach loop - it simply calls MoveNext() until it returns false, and uses the Current property for get the value for the loop variable.

IEnumerable and Iterator Blocks

Consider the following code from the previous post:
public static IEnumerable Workdays()
{
 yield return "Monday";
 yield return "Tuesday";
 yield return "Wednesday";
 yield return "Thursday";
 yield return "Friday";
}

static void Main()
{
 foreach (string day in Workdays())
 {
  Console.WriteLine(day);
 }
}
The foreach statement only operates over IEnumerable objects - so clearly the Workdays() method must be returning an IEnumerable instance (as its method signature suggests). So how how does it do that? We need to peek inside the compiled code with Reflector to find out the answer.

Inside the compiled code

Using Reflector, we are able to decompile the generated IL and get an approximate view of what the C# to create it would be. So, navigating to our type, we see that the Workdays() method is actually compiled into:
public static IEnumerable Workdays()
{
    return new <Workdays>d__0(-2);
}
What the... ? That is not the code we wrote. And what is this strangely named type, "<Workdays>d__0"?

State machines

To answer those questions, we need to take a step backwards, and look at something computer scientists call a state machine.

A state machine is a structure in which execution has a defined state, and from that state, it has a discrete list of subsequent states that it may transition into. A simple example would be to consider the road network to be a state machine - and the position of your car on the roads to be the state. If your car pulls up to a T-junction, you may turn either left or right - you may not jump somewhere two miles away onto a different road. Therefore the list of states that you can transition to is limited to a finite set, left and right. A state machine in computing works on the same principle - the program that is being executed has a definite state, and the state machine represents the current state, and the list of possible states that it can transition to next.

So how does this relate to our simple yield return example? Lets have another look in Reflector at the generated code, and specifically that strange <Workdays>d__0 type.

Looking closer

The class it generates is quite large and scary looking - I have stripped down to the relevant methods:
private sealed class <Workdays>d__0 : IEnumerable<object>, IEnumerable, IEnumerator<object>, IEnumerator, IDisposable
{
    private int <>1__state;
    private object <>2__current;

    public <Workdays>d__0(int <>1__state)
    {
        this.<>1__state = <>1__state;
    }

    private bool MoveNext()
    {
        switch (this.<>1__state)
        {
            case 0:
                this.<>1__state = -1;
                this.<>2__current = "Monday";
                this.<>1__state = 1;
                return true;

            case 1:
                this.<>1__state = -1;
                this.<>2__current = "Tuesday";
                this.<>1__state = 2;
                return true;

            case 2:
                this.<>1__state = -1;
                this.<>2__current = "Wednesday";
                this.<>1__state = 3;
                return true;

            case 3:
                this.<>1__state = -1;
                this.<>2__current = "Thursday";
                this.<>1__state = 4;
                return true;

            case 4:
                this.<>1__state = -1;
                this.<>2__current = "Friday";
                this.<>1__state = 5;
                return true;

            case 5:
                this.<>1__state = -1;
                break;
        }
        return false;
    }

    object IEnumerator.Current
    {
        get
        {
            return this.<>2__current;
        }
    }
}
The first thing we notice from this is that all the identifier names are mangled and that makes the code hard to follow. So, I've cleaned up the source further with meaningful identifier names:
private sealed class WorkdaysEnumerator : IEnumerable<object>, IEnumerable, IEnumerator<object>, IEnumerator, IDisposable
{
    private int m_State;
    private object m_Current;

    public WorkdaysEnumerator(int initialState)
    {
        this.m_State = initialState;
    }

    private bool MoveNext()
    {
        switch (this.m_State)
        {
            case 0:
                this.m_State = -1;
                this.m_Current = "Monday";
                this.m_State = 1;
                return true;

            case 1:
                this.m_State = -1;
                this.m_Current = "Tuesday";
                this.m_State = 2;
                return true;

            case 2:
                this.m_State = -1;
                this.m_Current = "Wednesday";
                this.m_State = 3;
                return true;

            case 3:
                this.m_State = -1;
                this.m_Current = "Thursday";
                this.m_State = 4;
                return true;

            case 4:
                this.m_State = -1;
                this.m_Current = "Friday";
                this.m_State = 5;
                return true;

            case 5:
                this.m_State = -1;
                break;
        }
        return false;
    }

    object IEnumerator.Current
    {
        get
        {
            return this.m_Current;
        }
    }
}
That's a bit clearer - we can see the all important MoveNext() and Current members. So how does it work?

Stepping into the State Machine

The code above is a simple implementation of a state machine. The current state is an integer, m_State in the cleaned up code, and it holds a value indicating what state the machine is currently in - in this example, it has five valid states, one per workday. The switch statement is the guts of the machine, that reads the state value and determines what do do next. Let's pick apart one branch of the switch statement:
case 3:
    this.m_State = -1;
    this.m_Current = "Thursday";
    this.m_State = 4;
    return true;
This branch is only executed when the state is equal to 3, which is set when it leaves the "Wednesday" branch. It sets the Current property to be "Thursday", and increases the state variable by one, making it ready for the next call to MoveNext() (which would set Current to "Friday").

Let's also have a closer look at the sequence of calls that are generated when a programmer writes a foreach loop over this iterator.
foreach (string day in Workdays())
{
 Console.WriteLine(day);
} 
This code actually compiles into something that looks like the below:
string day;
IEnumerable enumerator = new WorkdayEnumerator(0);

while (enumerator.MoveNext())
{
 day = enumerator.Current;
 Console.Writeline(day);
}
I've simplified the above a touch for clarity. Let's run through the sequence of calls that this loop would make and explain what happens to the state machine.

Step Value of m_State (before) Value of m_State (after) Value of Current MoveNext() returned
1 0 1 "Monday" true
2 1 2 "Tuesday" true
3 2 3 "Wednesday" true
4 3 4 "Thursday" true
5 4 5 "Friday" true
6 5 -1 "Friday" false


As you can see, when it hits the last valid state ("Friday"), it still makes one more call to MoveNext() - but since it has set the m_State value to 5, which is not a valid state, MoveNext() returns false and the loop exits.

Wrapping up and further reading

yield return allows us to write simple code to generate lists from complex structures without having to implement our own state machines. It also has some nice uses that are not immediately apparent - since an iterator can return an infinite series of values it can be put to use in many different contexts. Hopefully from this series of posts you have learned how to use yield return, and understand some of the complexity of the code that is generated behind the scenes. Go forth and iterate.

For more information, you might want to read the following.

Iterator blocks and the "yield return" statement - #1


This is the first part of a two-part series. For the second post, see Iterator blocks and the "yield return" statement - #2.

C# 2.0 introduced a wonderful and underused piece of syntax - the "yield return" statement. This is the first of two posts - in this one I will explain what yield return is and how to use it, and in the second of I'll explain how it actually works internally.

Iterators

An iterator in C# is a coding construct that allows you to loop over a set of values using the foreach loop, without having to implement the entire IEnumerator interface. If you've ever rolled your own collection classes that needed you to implement IEnumerator, you might know what pain it can sometimes be. In particular, it is difficult to implement on non-linear data structures, such as trees, without some nasty hacking.

C# 2.0 allows us to avoid implementing the full IEnumerator interface, by writing a special iterator method, using yield return to provide values back to the caller. Such a method is called an iterator block - a method that returns either IEnumerable or IEnumerable<> - which can be either an instance or a static method.

Yield return

The yield return statement is used to return a single value from the iterator block. A simple example is below (taken from the MSDN documentation)
public static IEnumerable Power(int number, int exponent)
{
    int counter = 0;
    int result = 1;
    while (counter++ < exponent)
    {
        result = result * number;
        yield return result;
    }
}
In this example, the yield return statement is executed once per loop, returning the next value in the power set for the given parameters. It doesn't need to be used in a loop however - the following is perfectly valid:
public static IEnumerable Workdays()
{
 yield return "Monday";
 yield return "Tuesday";
 yield return "Wednesday";
 yield return "Thursday";
 yield return "Friday";
}

static void Main()
{
 foreach (string day in Workdays())
 {
  Console.WriteLine(day);
 }
}


Limitations

You can't nest yield return statements inside a try.. catch block, it can't be unsafe code, and can't appear in an anonymous method or lambda expression. Also, the iterator block cannot take ref or out parameters. This is all due to the way yield return is implemented internally, which I will be covering in the next post of this series.