C# Coding Solutions—What Does Yield Keyword Generate

C# Coding Solutions

© 2006 Christian Gross

What Does the Yield Keyword Really Generate?

The yield keyword was added in C# 2.0 to simplify the implementation of enumeration in custom classes. Before the yield keyword, supporting enumeration in a class meant implementing the IEnumerable and IEnumerator interfaces by hand. That was a pain, yet we did it so that we could take advantage of the foreach looping mechanism, which makes it easy to iterate a collection.

The yield keyword simplifies the implementation of iterable collections, but it also allows us to move beyond collections and into result sets. Using the yield keyword, we can present a calculated sequence as if it were a collection. Let me give an example. Let’s say that I am calculating the square roots of all numbers. Because there are infinitely many numbers, a precalculated result would require an infinitely large array.

Assuming for the moment that we do create an infinite array, let’s look at how those numbers would be generated without using the yield keyword. There would be a piece of code that would call the algorithm to generate the sequence of numbers. The sequence of numbers would be added to an array, which is returned to the calling code when the algorithm has completed. Yet we are calculating an infinite sequence of numbers, meaning that the algorithm will never end and the array will never be complete.

Of course, in reality, algorithms do end, and arrays do become complete. But the example illustrates that without yield, a collection must be generated in full before it can be iterated. You first allocate the space for an array and then fill the array, which is less efficient. The yield keyword is more efficient because it allows a calculation to generate numbers on the fly, making it appear as if there were a collection of precalculated numbers.
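The square-root example can be sketched with yield as follows. This is only an illustration: the class name SquareRoots and the method name All are made up here, and the generic IEnumerable<double> interface is my choice rather than something the example above prescribes.

```csharp
using System;
using System.Collections.Generic;

public static class SquareRoots {
    // Hypothetical helper: yields the square roots of 1, 2, 3, ...
    // Each value is calculated only when the caller asks for the next item,
    // so no array is ever allocated. (The sequence is endless in practice;
    // the long counter would overflow only after about 9.2e18 items.)
    public static IEnumerable<double> All() {
        for (long n = 1; ; n++) {
            yield return Math.Sqrt(n);
        }
    }

    public static void Main() {
        int count = 0;
        foreach (double root in All()) {
            Console.WriteLine(root);
            if (++count == 5) {
                break; // the caller, not the algorithm, decides when to stop
            }
        }
    }
}
```

The calling code stops after five items, and the algorithm never needs to know in advance how many values will be requested.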

Consider the following example, which is an iterable collection of one item:

using System.Collections;

// An iterable collection that yields exactly one item.
public class ExampleIterator : IEnumerable {
    public IEnumerator GetEnumerator() {
        yield return 1;
    }
}

The class ExampleIterator implements the IEnumerable interface, which requires the GetEnumerator method to be implemented. The GetEnumerator method returns an IEnumerator instance. In the implementation of GetEnumerator, the value 1 is returned rather than an IEnumerator interface instance. This is odd, because how can a value type be returned when a reference type is expected? The magic is the yield keyword, which provides the missing code in the form of generated IL.

The yield keyword instructs the compiler to generate a very large chunk of IL code. Using ILDASM.exe it is possible to reverse engineer what the compiler generated; Figure 4-1 shows an outline of the generated code.


Figure 4-1. Generated IL code structure for the yield keyword


In Figure 4-1 the class ExampleIterator has an embedded class called <GetEnumerator>d__0. The naming of the embedded class is peculiar; it seems to indicate that the actual class name is d__0 and the <GetEnumerator> references a .NET Generics type. This is not the case, and the <GetEnumerator> identifier is indeed part of the class identifier. If you had tried using such an identifier in C# or VB.NET, there would have been a compiler error.

The oddly named class ensures that a programmer who uses the yield keyword will never define a class that conflicts with the generated class, and it does the heavy lifting of implementing the IEnumerator interface. Additionally, the class <GetEnumerator>d__0 carries the CompilerGenerated attribute and is sealed, making it impossible to subclass the type in code. The yield keyword does not introduce a new feature in the .NET runtime, but generates all of the plumbing necessary to implement iterable sets.

The generated class contains the logic that was coded in the implementation of GetEnumerator, and the compiler replaces the body of GetEnumerator with the following:

public IEnumerator GetEnumerator() {
    ExampleIterator.<GetEnumerator>d__0 d__1 =
        new ExampleIterator.<GetEnumerator>d__0(0);
    d__1.<>4__this = this;
    return d__1;
}

The replaced code illustrates that when an IEnumerator instance is requested, an instance of the class generated by the C# compiler is returned. The logic (yield return 1) is moved to the IEnumerator.MoveNext method, which is used to iterate the generated sequence of numbers. You may wonder how the magic code converts the yield into a sequence of numbers. The answer is that it uses a state engine to mimic a collection of numbers.

To see how the statement yield return 1 is converted into something that foreach can use, look at the implementation of the generated MoveNext. The generated method <GetEnumerator>d__0.MoveNext is implemented as follows (the generated code has been converted from IL into C# using Lutz Roeder’s .NET Reflector):

private bool MoveNext() {
    switch (this.<>1__state) {
    case 0:
        this.<>1__state = -1;
        this.<>2__current = 1;
        this.<>1__state = 1;
        return true;
 
    case 1:
        this.<>1__state = -1;
        break;
    }
    return false;
}

A state table is generated; each time MoveNext is called, it changes state and performs the appropriate action. Let’s go through the sequence of events: The foreach starts and calls the method MoveNext for the first time. The data member this.<>1__state, which holds the state position, has the value 0. The switch statement will execute the case statement with the value 0.

The statement with the value 0 reassigns the state position to –1 to put the state position into an undetermined state in case the assignment of the state member causes an exception. If an exception occurs, you do not want the foreach loop constantly repeating itself and generating the same content or same error.

If the assignment of the state member (this.<>2__current) is successful, then the state position (this.<>1__state) is assigned a value of 1 indicating the value of the next state. With the state member assigned and the state position incremented, a true value can be returned. A true value indicates that the foreach loop can use the state member to assign the variable. The client code processes the variable and loops again.

The next loop causes a call to MoveNext again. This time the switch statement causes a branch to the state position of 1, which reassigns the state position to –1 and returns false. When MoveNext returns false, foreach will break out of its loop.

The yield statement has created a state table that mimics collection behavior. Seen this way, the yield statement has ceased to be a simple programming construct and has become an instruction used by a code generator. It acts as a code generator because the generated IL could equally have been written by hand in C#. Lower-level programming languages, such as Java, C#, C++, and C, have in the past taken the approach that the language itself is not extended; instead, the libraries are extended to enhance the language.
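To make that point concrete, here is a minimal hand-written sketch of the one-item iterator without the yield keyword. The names ManualIterator, OneItemEnumerator, _state, and _current are invented for this illustration; the real generated class uses the compiler's own identifiers and may differ in its details.

```csharp
using System;
using System.Collections;

// Hand-written stand-in for what the compiler generates from "yield return 1".
public class ManualIterator : IEnumerable {
    private sealed class OneItemEnumerator : IEnumerator {
        private int _state;      // plays the role of <>1__state, starts at 0
        private object _current; // plays the role of <>2__current

        public object Current { get { return _current; } }

        public bool MoveNext() {
            switch (_state) {
            case 0:
                _state = -1;   // undetermined while the value is produced
                _current = 1;
                _state = 1;    // next call continues after the yield
                return true;
            case 1:
                _state = -1;   // sequence exhausted
                break;
            }
            return false;
        }

        // Iterators generated from yield do not support Reset either.
        public void Reset() { throw new NotSupportedException(); }
    }

    public IEnumerator GetEnumerator() {
        return new OneItemEnumerator();
    }
}

public static class Program {
    public static void Main() {
        foreach (int number in new ManualIterator()) {
            Console.WriteLine("Found number (" + number + ")");
        }
    }
}
```

Running the sketch prints Found number (1) once, the same behavior as the yield-based class, at the cost of roughly thirty lines of plumbing.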

Getting back to the yield statement, the following example illustrates how to use ExampleIterator to iterate the collection of one item:

[Test]
public void ExampleIterator() {
    foreach (int number in new ExampleIterator()) {
        Console.WriteLine("Found number (" + number + ")");
    }
}

In the example, foreach will loop once and display to the console the number 1.
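For reference, the foreach statement above is roughly equivalent to driving the enumerator by hand. The following sketch assumes the one-item ExampleIterator shown earlier, repeated here so that the listing is self-contained; the Program class is only a harness for the illustration.

```csharp
using System;
using System.Collections;

public class ExampleIterator : IEnumerable {
    public IEnumerator GetEnumerator() {
        yield return 1;
    }
}

public static class Program {
    public static void Main() {
        // Approximately what foreach does with a non-generic IEnumerable.
        IEnumerator e = new ExampleIterator().GetEnumerator();
        try {
            while (e.MoveNext()) {           // runs the state table one step
                int number = (int)e.Current; // unboxes the current value
                Console.WriteLine("Found number (" + number + ")");
            }
        } finally {
            // foreach disposes the enumerator if it implements IDisposable.
            IDisposable d = e as IDisposable;
            if (d != null) {
                d.Dispose();
            }
        }
    }
}
```

The first call to MoveNext returns true and the second returns false, which is why the body runs exactly once.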

Knowing that a state engine is created, we can look at a more complicated yield example that calls methods and other .NET APIs:

public class ExampleIterator : IEnumerable {
    int _param;
    private int Method1( int param) {
        return param + param;
    }
    private int Method2( int param) {
        return param * param;
    }
    public IEnumerator GetEnumerator() {
        Console.WriteLine("before");
        for (int c1 = 0; c1 < 10; c1 ++) {
            _param = 10 + c1;
            yield return Method1(_param);
            yield return Method2(_param);
        }
        Console.WriteLine("after");
    }
}

In this example, the GetEnumerator implementation calls the Console.WriteLine function at the beginning and the end of the method. The purpose of the two lines of code is to provide code boundaries that can be easily found in the MSIL. In the implementation of ExampleIterator, the data member _param is declared and passed to Method1 and Method2, which return modified values of their parameter param. These declarations and method calls, while trivial, mimic how you would write code that uses the yield statement.

The sequence of events from the perspective of written C# code would be as follows:

  1. Call GetEnumerator.
  2. Console.WriteLine generates the text before.
  3. Start a loop that counts from 0 to 9.
  4. Assign the data member _param a value of 10 plus the loop counter c1.
  5. Call Method1 with the _param value, which adds the number to itself and returns the sum.
  6. Return the number generated by Method1 to the foreach loop.
  7. The foreach loop calls MoveNext, which re-enters the GetEnumerator code where it left off.
  8. Call Method2 with the _param value, which multiplies the number by itself and returns the product.
  9. Return the number generated by Method2 to the foreach loop.
  10. The foreach loop calls MoveNext, which again re-enters the GetEnumerator code where it left off.
  11. The end of the for loop is reached, c1 is incremented, and the loop performs another iteration. Local iteration continues until c1 has reached the value of 10.
  12. Console.WriteLine generates the text after.

The foreach loop will iterate 20 times, because each of the 10 iterations of the for loop yields two values. The logic presented is fairly sophisticated because the generated state table has to be aware of a loop that contains two yield statements that include method calls.
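The sequence of events can be observed by stepping the enumerator by hand. The sketch below repeats the ExampleIterator class from above so that it is self-contained; the Program class and its messages are additions for this illustration only.

```csharp
using System;
using System.Collections;

public class ExampleIterator : IEnumerable {
    int _param;
    private int Method1(int param) {
        return param + param;
    }
    private int Method2(int param) {
        return param * param;
    }
    public IEnumerator GetEnumerator() {
        Console.WriteLine("before");
        for (int c1 = 0; c1 < 10; c1++) {
            _param = 10 + c1;
            yield return Method1(_param);
            yield return Method2(_param);
        }
        Console.WriteLine("after");
    }
}

public static class Program {
    public static void Main() {
        IEnumerator e = new ExampleIterator().GetEnumerator();
        // Nothing from the iterator body has run yet; "before" appears
        // only when MoveNext is called for the first time.
        Console.WriteLine("enumerator created");

        int count = 0;
        while (e.MoveNext()) {
            Console.WriteLine(e.Current);
            count++;
        }
        Console.WriteLine("values: " + count);
    }
}
```

The output starts with enumerator created, then before, followed by the 20 values (20, 100, 22, 121, and so on up to 38, 361), then after, and finally values: 20. Note that after is printed during the twenty-first call to MoveNext, the one that returns false.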

The generated IEnumerator.MoveNext method, converted from MSIL into C#, is as follows:

private bool MoveNext() {
    switch (this.<>1__state)
    {
    case 0:
        this.<>1__state = -1;
        Console.WriteLine("before");
        this.<c1>5__1 = 0;
        while (this.<c1>5__1 < 10)
        {
            this.<>4__this._param = 10 + this.<c1>5__1;
            this.<>2__current = this.<>4__this.Method1(<>4__this._param);
            this.<>1__state = 1;
            return true;
            Label_0089:
            this.<>1__state = -1;
            this.<>2__current = this.<>4__this.Method2(<>4__this._param);
            this.<>1__state = 2;
            return true;
            Label_00BC:
            this.<>1__state = -1;
            this.<c1>5__1++;
        }
        }
        Console.WriteLine("after");
        break;
 
    case 1:
        goto Label_0089;
 
    case 2:
        goto Label_00BC;
    }
    return false;
}

The generated code can be cross-referenced with the logic from the original GetEnumerator method implementation that used the yield statement. The generated code looks simple, but its behavior is fairly complex. For example, look in the while loop for the code this.<>1__state = 1. Right after that is a return true statement, and right after that is Label_0089. This unusual code is the implementation of the yield statement, which causes an exit from and a re-entry into the middle of a loop.

The state table (switch (this.<>1__state)) has three case labels: 0, 1, and 2. The state position 0 is used the first time MoveNext is called, when the loop is started. As illustrated previously, the state position is assigned –1 in case errors occur. After the repositioning, the method Console.WriteLine is called, and the data member this.<c1>5__1 is assigned 0. The naming of the data member is not coincidental; it is the loop counter. What is curious is that the loop counter (c1), which was originally declared as a method-scope variable, has been converted into a class data member.

In the original implementation the challenge was to exit and enter back into a loop using the yield statement. The solution in the generated code is to move method-level declarations to the class level. This means that the state is stored at the class level, and thus if a loop is exited and entered again, the loop will see the correct state. It is not common to store the loop counter as a data member, but in this case it helps overcome the exit and entry of the loop.

Continuing with the loop analysis, the for loop is converted into a while loop. The counter c1 is assigned in the line before the while loop. After the while line, the data member _param is assigned and the method Method1 is called. How can a generated class access the private data members and methods of another class instance? The magic lies in the fact that the generated class is nested inside ExampleIterator, and a nested class can access all data members and methods of its enclosing class. To access the parent instance, the data member <>4__this is used.
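The access rule can be shown in isolation. In the following sketch the names Outer, Helper, Twice, and _owner are invented for the illustration; _owner plays the role that <>4__this plays in the generated class.

```csharp
using System;

public class Outer {
    private int _param = 10;

    private int Twice(int value) {
        return value + value;
    }

    // A nested class may use the private members of its enclosing class,
    // but it needs an explicit reference to an Outer instance to reach them.
    private sealed class Helper {
        private readonly Outer _owner; // plays the role of <>4__this

        public Helper(Outer owner) {
            _owner = owner;
        }

        public int Compute() {
            _owner._param = _owner._param + 1;  // private field of Outer
            return _owner.Twice(_owner._param); // private method of Outer
        }
    }

    public int Run() {
        return new Helper(this).Compute();
    }
}

public static class Program {
    public static void Main() {
        Console.WriteLine(new Outer().Run()); // prints 22
    }
}
```

A class declared outside Outer would not compile with the same body, because _param and Twice are private.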

Once the method Method1 has been called, the state position is changed to 1 and the while loop is exited, with a return value of true.

When the foreach loop has done its iteration, the MoveNext method is called again, and the code jumps back into the loop with the state that the loop had when it was exited. The loop is resumed by using the state table value of 1, which is a goto to Label_0089, located in the middle of the while loop. Goto statements and data members together manage the state of the state table. That jump makes the method implementation behave as if nothing happened, so processing continues where the method last left off.
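The decompiled listing is not legal C# as written: the generated identifiers are not valid, and C# does not allow a goto to jump into the body of a while loop. A hand-written state engine with the same behavior therefore has to be arranged slightly differently. The following is one possible sketch, not the compiler's actual output; the names ManualLoopIterator, LoopEnumerator, and LoopTest are made up for the illustration.

```csharp
using System;
using System.Collections;

public class ManualLoopIterator : IEnumerable {
    private int _param;
    private int Method1(int param) { return param + param; }
    private int Method2(int param) { return param * param; }

    private sealed class LoopEnumerator : IEnumerator {
        private readonly ManualLoopIterator _owner; // role of <>4__this
        private int _state;       // role of <>1__state, starts at 0
        private int _c1;          // loop counter hoisted into a data member
        private object _current;  // role of <>2__current

        public LoopEnumerator(ManualLoopIterator owner) { _owner = owner; }
        public object Current { get { return _current; } }
        public void Reset() { throw new NotSupportedException(); }

        public bool MoveNext() {
            switch (_state) {
            case 0: // first call: run the code in front of the loop
                _state = -1;
                Console.WriteLine("before");
                _c1 = 0;
                goto LoopTest;
            case 1: // resume after the first yield
                _state = -1;
                _current = _owner.Method2(_owner._param);
                _state = 2;
                return true;
            case 2: // resume after the second yield
                _state = -1;
                _c1++;
                goto LoopTest;
            default: // finished, or a previous call failed
                return false;
            }
        LoopTest:
            if (_c1 < 10) {
                _owner._param = 10 + _c1;
                _current = _owner.Method1(_owner._param);
                _state = 1;
                return true;
            }
            Console.WriteLine("after");
            return false;
        }
    }

    public IEnumerator GetEnumerator() {
        return new LoopEnumerator(this);
    }
}

public static class Program {
    public static void Main() {
        foreach (int number in new ManualLoopIterator()) {
            Console.WriteLine(number);
        }
    }
}
```

The sketch prints before, the same 20 values as the yield version (20, 100, 22, 121, and so on up to 38, 361), and then after. The loop counter lives in the data member _c1 for the reason described above: it has to survive between calls to MoveNext.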

Remember the following about the yield statement:

  • In the C# programming language, the yield keyword enhances the language to simplify the implementation of certain pieces of code.
  • You do not have to use the yield keyword; the old way of implementing an iterable collection still applies. If you want to, you could create your own state engine and table that would mimic the behavior of yield 100 percent.
  • The yield statement creates a state table that remembers where the code was last executed.
  • The yield statement generates code that is like spaghetti code, and it leaves me wondering if it works in all instances. I have tried various scenarios using foreach and everything worked. I wonder if there are .NET tools that would act and behave like a foreach statement that could generate some esoteric language constructs and cause a failure. I cannot help but wonder if there are hidden bugs waiting to bite you in the butt at the wrong moment.
  • If I had one feature request, it would be to formalize the state engine into something that all .NET developers could take advantage of.

