Efficient Data Processing: Leveraging C#'s foreach Loop

Posted Aug 18, 2023 Updated Mar 10, 2024

By Antão Almada 14 min read

In the dynamic landscape of modern software development, efficient data processing lies at the core of creating high-performance applications. As developers, we constantly strive to balance readability, maintainability, and speed when writing code. In the realm of C#, a programming language known for its versatility and robustness, the foreach loop emerges as a powerful tool for seamlessly navigating collections.

foreach is a statement in C# that generates the code required to traverse the items of a collection. The syntax is very simple:

  
foreach(var item in source)
{
    Console.WriteLine(item);
}

This article will focus on the code that is generated by the compiler given various types of collections so that you can understand exactly what your application is doing when you use this type of loop.

Minimum requirements

For the foreach statement to accept a collection as its source, the collection must provide a public parameterless method named GetEnumerator() that returns a new instance of an enumerator object. The enumerator must then provide a public parameterless method named MoveNext() that returns a boolean and also a public readable property named Current.

The enumerator must maintain the state of the enumeration. The method MoveNext() should return true if a next item is available; otherwise false. The property Current should return the item at the current position.

As an example, here’s a collection that stores its items in an inner array and that implements the minimum required to be traversed using a foreach statement:

  
class MyCollection
{
    readonly int[] source;

    public MyCollection(int[] source)
        => this.source = source;

    public Enumerator GetEnumerator()
        => new Enumerator(this);

    public struct Enumerator
    {
        readonly int[] source;
        int index;

        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }

        public int Current
            => source[index];

        public bool MoveNext()
            => ++index < source.Length;
    }
}

The enumerator should always be a struct (a value-type) for improved performance! Check my other article “Performance of value-type vs reference-type enumerators in C#” to understand why.

The compiler converts the C# code into Intermediate Language (IL). There is no foreach instruction in IL. Using SharpLab, you can see that in this case the foreach is converted to something equivalent to the following:

  
MyCollection.Enumerator enumerator = new MyCollection(array).GetEnumerator();
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}

It first calls GetEnumerator() to get a new instance of the enumerator. It then uses MoveNext() as the continuation condition in a while loop and, for every time it returns true, gets the item using Current.

If the enumerator must release any resources at the end of the item traversal, it must implement IDisposable. You can see in SharpLab that in this case the generated code will make sure Dispose() is called:

  
MyCollection.Enumerator enumerator = new MyCollection(array).GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        Console.WriteLine(enumerator.Current);
    }
}
finally
{
    ((IDisposable)enumerator).Dispose();
}

Returning items by reference

The foreach statement supports returning the items by reference. Passing items by reference improves performance when traversing a collection containing large structs, as the items are not copied. The loop simply receives a reference to each of them.

For example, we can change the Current property to return ref int:

  
class MyCollection
{
    readonly int[] source;

    public MyCollection(int[] source)
        => this.source = source;

    public Enumerator GetEnumerator()
        => new Enumerator(this);

    public struct Enumerator
    {
        readonly int[] source;
        int index;

        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }

        public ref int Current // return by reference
            => ref source[index];

        public bool MoveNext()
            => ++index < source.Length;

        public void Dispose() {}
    }
}

In this case, Current returns a reference to the position where the item is stored. This way, you can both initialize the values for a MyCollection instance using a foreach and also list its items to the console:

  
var source = new MyCollection(new int[10]);

// initialize all to ones
foreach(ref var item in source)
    item = 1;

// output to console
foreach(ref readonly var item in source)
    Console.WriteLine(item);

Notice the ref keyword in the first foreach so that the item value can be changed. The second foreach uses the ref readonly keywords so that a new value cannot be assigned to the item. You can see it working in SharpLab.

The Current property can be changed to return ref readonly, making it impossible to assign a value to the items.
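As a minimal sketch of that variant, the collection below exposes its items as read-only references. The names ReadOnlyRefCollection and ReadOnlyRefDemo are hypothetical and used only for illustration; the enumerator is otherwise the same as the one shown above.

```csharp
using System;

// Hypothetical variant of the example collection whose enumerator
// exposes each item as a read-only reference.
class ReadOnlyRefCollection
{
    readonly int[] source;

    public ReadOnlyRefCollection(int[] source)
        => this.source = source;

    public Enumerator GetEnumerator()
        => new Enumerator(this);

    public struct Enumerator
    {
        readonly int[] source;
        int index;

        public Enumerator(ReadOnlyRefCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }

        // ref readonly: callers can read through the reference but not assign
        public ref readonly int Current
            => ref source[index];

        public bool MoveNext()
            => ++index < source.Length;
    }
}

static class ReadOnlyRefDemo
{
    public static int Sum(ReadOnlyRefCollection collection)
    {
        var sum = 0;
        foreach (ref readonly var item in collection)
            sum += item; // reading is allowed, "item = 1" would not compile
        return sum;
    }
}
```

With this enumerator, a loop declared as foreach(ref var item in ...) should be rejected by the compiler, because Current no longer returns a writable reference.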

If you’d like the enumerator to contain a span field, it’s possible to declare the enumerator as a ref struct. The example collection can be implemented as follows:

  
class MyCollection
{
    readonly int[] source;

    public MyCollection(int[] source)
        => this.source = source;

    public Enumerator GetEnumerator()
        => new Enumerator(this);

    public ref struct Enumerator // ref struct enumerator
    {
        readonly ReadOnlySpan<int> source; // readonly span field
        int index;

        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }

        public ref readonly int Current // return by reference
            => ref source[index];

        public bool MoveNext()
            => ++index < source.Length;

        public void Dispose() {}
    }

}

You can see it working in SharpLab.

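Before C# 13, a ref struct could not implement interfaces, so the enumerator cannot be declared as IDisposable in those versions. For a ref struct enumerator, foreach calls the Dispose() method if one is present, without requiring the enumerator to implement IDisposable.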
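The following sketch makes that Dispose() call observable. TrackedCollection, DisposeCount and TrackedDemo are hypothetical names introduced only for this illustration; the counter has no purpose other than showing that foreach invoked Dispose() on the ref struct enumerator.

```csharp
using System;

class TrackedCollection
{
    readonly int[] source;

    // Incremented by Enumerator.Dispose(); exists only to make the call visible.
    public int DisposeCount;

    public TrackedCollection(int[] source)
        => this.source = source;

    public Enumerator GetEnumerator()
        => new Enumerator(this);

    public ref struct Enumerator
    {
        readonly TrackedCollection owner;
        readonly ReadOnlySpan<int> source;
        int index;

        public Enumerator(TrackedCollection enumerable)
        {
            owner = enumerable;
            source = enumerable.source;
            index = -1;
        }

        public int Current
            => source[index];

        public bool MoveNext()
            => ++index < source.Length;

        // No IDisposable here: foreach finds this method by its shape.
        public void Dispose()
            => owner.DisposeCount++;
    }
}

static class TrackedDemo
{
    public static int Sum(TrackedCollection collection)
    {
        var sum = 0;
        foreach (var item in collection)
            sum += item;
        return sum;
    }
}
```

After one call to TrackedDemo.Sum(), DisposeCount should be 1, since the generated code wraps the loop in try/finally and calls Dispose() once at the end.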

GetEnumerator() extension method

Starting from C# 9, foreach also supports the use of GetEnumerator() as an extension method. Imagine that you’d like to use foreach on the following collection developed by a third party:

  
public class MyCollection
{
    readonly int[] source;

    public MyCollection(int[] source)
        => this.source = source;

    public int Count
        => source.Length;

    public int this[int index]
        => source[index];
}

This collection provides an indexer and a Count property that returns the number of items. If you try to use foreach on it, the compilation will fail.

You can then define the following extension method for the collection:

  
public static class MyExtension
{
    public static Enumerator GetEnumerator(this MyCollection source)
        => new Enumerator(source);

    public struct Enumerator
    {
        readonly MyCollection source;
        int index;

        public Enumerator(MyCollection source)
        {
            this.source = source;
            index = -1;
        }

        public int Current
            => source[index];

        public bool MoveNext()
            => ++index < source.Count;
    }
}

You can see in SharpLab that the foreach statement compiles. You can also see that the generated code is very similar. The only difference is that it uses the extension method:

  
MyExtension.Enumerator enumerator = MyExtension.GetEnumerator(new MyCollection(array));
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}

You can see in SharpLab that the same applies to GetAsyncEnumerator().
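Here is a hedged sketch of the asynchronous counterpart, which is consumed with await foreach. ThirdPartyCollection, ThirdPartyAsyncExtensions and AsyncDemo are hypothetical names; the collection simply mirrors the shape of the third-party example above so that the block is self-contained. MoveNextAsync() completes synchronously here, which a real asynchronous source would not do.

```csharp
using System.Threading.Tasks;

// Hypothetical stand-in for a third-party collection
// that only offers Count and an indexer.
public class ThirdPartyCollection
{
    readonly int[] source;

    public ThirdPartyCollection(int[] source)
        => this.source = source;

    public int Count
        => source.Length;

    public int this[int index]
        => source[index];
}

public static class ThirdPartyAsyncExtensions
{
    // Extension method that await foreach binds to.
    public static AsyncEnumerator GetAsyncEnumerator(this ThirdPartyCollection source)
        => new AsyncEnumerator(source);

    public struct AsyncEnumerator
    {
        readonly ThirdPartyCollection source;
        int index;

        public AsyncEnumerator(ThirdPartyCollection source)
        {
            this.source = source;
            index = -1;
        }

        public int Current
            => source[index];

        // Returns an already completed ValueTask; no real I/O is awaited.
        public ValueTask<bool> MoveNextAsync()
            => new ValueTask<bool>(++index < source.Count);
    }
}

public static class AsyncDemo
{
    public static async Task<int> SumAsync(ThirdPartyCollection collection)
    {
        var sum = 0;
        await foreach (var item in collection)
            sum += item;
        return sum;
    }
}
```

The pattern is the same as in the synchronous case: the compiler only needs a GetAsyncEnumerator() it can resolve, an awaitable MoveNextAsync() that yields a boolean, and a Current property.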

IEnumerable

IEnumerable is an interface defined in the namespace System.Collections that actually enforces the pattern required by the foreach statement. So, any type that implements IEnumerable can be traversed using the foreach statement.

As an example, here’s the sample collection now implementing IEnumerable:

  
class MyCollection : IEnumerable
{
    readonly int[] source;

    public MyCollection(int[] source)
        => this.source = source;

    public IEnumerator GetEnumerator()
        => new Enumerator(this);

    public struct Enumerator : IEnumerator
    {
        readonly int[] source;
        int index;

        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }

        // public property
        public int Current
            => source[index];

        // explicit IEnumerator implementation
        object IEnumerator.Current
            => Current;

        public bool MoveNext()
            => ++index < source.Length;

        public void Reset()
            => index = -1;
    }
}

The only differences are:

The collection implements IEnumerable.
GetEnumerator() must return IEnumerator.
The enumerator implements IEnumerator.
The enumerator must have a Reset() method.

Notice that IEnumerator requires the property Current to return the type object. I want it to return int as it’s the type of the item for this collection. It’s possible to have both implementations, one public and the other using explicit interface implementation. The explicit implementation is only used when the enumerator is cast to IEnumerator.

If the enumerator does not support resetting, it should throw a NotSupportedException.
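A minimal sketch of such an enumerator follows. ForwardOnlyEnumerator is a hypothetical name, and it takes the array directly instead of a collection only to keep the example short.

```csharp
using System;
using System.Collections;

// Hypothetical enumerator that can only move forward.
struct ForwardOnlyEnumerator : IEnumerator
{
    readonly int[] source;
    int index;

    public ForwardOnlyEnumerator(int[] source)
    {
        this.source = source;
        index = -1;
    }

    public int Current
        => source[index];

    object IEnumerator.Current
        => Current;

    public bool MoveNext()
        => ++index < source.Length;

    // Resetting is not supported, so signal it explicitly.
    public void Reset()
        => throw new NotSupportedException();
}
```

foreach itself never calls Reset(), so this choice does not affect any of the loops in this article.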

You can see in SharpLab that the generated code for a foreach statement is the following:

  
IEnumerator enumerator = new MyCollection(array).GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        Console.WriteLine((int)enumerator.Current);
    }
}
finally
{
    IDisposable disposable = enumerator as IDisposable;
    if (disposable != null)
    {
        disposable.Dispose();
    }
}

Several things to notice:

The enumerator is returned as type IEnumerator, which is a reference-type.
The value returned by Current has to be cast to int because IEnumerator.Current returns object. For a value type like int, this means each item is boxed and then unboxed.
Although the enumerator doesn’t implement IDisposable, the compiler adds code to check at runtime if it does.

I mentioned above that collections should have a value-type enumerator for better performance. We see here that by returning IEnumerator, the enumerator is boxed, which converts it into a reference-type. The way to work around this is to also use explicit interface implementation for the method GetEnumerator():

  
class MyCollection : IEnumerable
{
    readonly int[] source;

    public MyCollection(int[] source)
        => this.source = source;

    // public method
    public Enumerator GetEnumerator()
        => new Enumerator(this);

    // explicit IEnumerable implementation
    IEnumerator IEnumerable.GetEnumerator()
        => GetEnumerator();

    public struct Enumerator : IEnumerator
    {
        readonly int[] source;
        int index;

        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }

        public int Current
            => source[index];

        object IEnumerator.Current
            => Current;

        public bool MoveNext()
            => ++index < source.Length;

        public void Reset()
            => index = -1;
    }
}

You can see in SharpLab that now the generated code for a foreach statement is the following:

  
MyCollection.Enumerator enumerator = new MyCollection(array).GetEnumerator();
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}

It uses the value type enumerator. It will only use the reference-type enumerator if the collection is cast to IEnumerable.

Most collections provided by .NET, such as List<T> and Dictionary<TKey, TValue>, provide a value-type enumerator. You should do the same if you implement your own collection.
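List<T> is a convenient way to observe both paths. In the sketch below, ListEnumeratorDemo is a hypothetical helper: the first method lets foreach bind to the public GetEnumerator(), which returns the struct List<int>.Enumerator, while the second only sees the interface and therefore works with a boxed enumerator.

```csharp
using System.Collections.Generic;

static class ListEnumeratorDemo
{
    // The static type is List<int>, so foreach uses the
    // value-type enumerator List<int>.Enumerator.
    public static int SumDirect(List<int> list)
    {
        var sum = 0;
        foreach (var item in list)
            sum += item;
        return sum;
    }

    // The static type is IEnumerable<int>, so foreach uses
    // IEnumerator<int>, which requires boxing the struct enumerator.
    public static int SumViaInterface(IEnumerable<int> items)
    {
        var sum = 0;
        foreach (var item in items)
            sum += item;
        return sum;
    }
}
```

Both methods return the same result for the same list; only the enumerator used by the generated code differs.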

IEnumerable<T>

IEnumerable<T> and IEnumerator<T> extend the pair of interfaces IEnumerable and IEnumerator to specify the type of item returned by the Current property.

Because IEnumerable<T> derives from IEnumerable, and IEnumerator<T> derives from IEnumerator and IDisposable, the example collection should be implemented as follows:

  
class MyCollection : IEnumerable<int>
{
    readonly int[] source;

    public MyCollection(int[] source)
        => this.source = source;

    // public method
    public Enumerator GetEnumerator()
        => new Enumerator(this);

    // explicit IEnumerable<T> implementation
    IEnumerator<int> IEnumerable<int>.GetEnumerator()
        => GetEnumerator();

    // explicit IEnumerable implementation
    IEnumerator IEnumerable.GetEnumerator()
        => GetEnumerator();

    public struct Enumerator : IEnumerator<int>
    {
        readonly int[] source;
        int index;

        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }

        public int Current
            => source[index];

        object IEnumerator.Current
            => Current;

        public bool MoveNext()
            => ++index < source.Length;

        public void Reset()
            => index = -1;

        public void Dispose() {}
    }
}

You can see in SharpLab that now the generated code for a foreach statement is the following:

  
MyCollection.Enumerator enumerator = new MyCollection(array).GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        Console.WriteLine(enumerator.Current);
    }
}
finally
{
    ((IDisposable)enumerator).Dispose();
}

The differences are:

The enumerator is a value-type.
The value returned by Current doesn’t require a cast. This improves performance, as the cast was previously performed for each item.
Dispose() is called even though it’s empty. It’s IEnumerator<T> that makes Dispose() mandatory.

The Dispose() call can be avoided by declaring two enumerators for the collection:

  
class MyCollection : IEnumerable<int>
{
    readonly int[] source;

    public MyCollection(int[] source)
        => this.source = source;

    public Enumerator GetEnumerator()
        => new Enumerator(this);

    IEnumerator<int> IEnumerable<int>.GetEnumerator()
        => new ReferenceEnumerator(this);

    IEnumerator IEnumerable.GetEnumerator()
        => new ReferenceEnumerator(this);

    // value type enumerator
    public struct Enumerator
    {
        readonly int[] source;
        int index;

        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }

        public int Current
            => source[index];

        public bool MoveNext()
            => ++index < source.Length;
    }

    // reference type enumerator
    class ReferenceEnumerator : IEnumerator<int>
    {
        readonly int[] source;
        int index;

        public ReferenceEnumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }

        public int Current
            => source[index];

        object IEnumerator.Current
            => Current;

        public bool MoveNext()
            => ++index < source.Length;

        public void Reset()
            => index = -1;

        public void Dispose() {}
    }
}

Things that changed:

The public GetEnumerator() returns an instance of the value-type enumerator while the other ones return instances of the reference-type enumerator.
The value-type enumerator only implements the minimum requirements.
The reference-type enumerator is declared as private as it’s only used internally.
The reference-type enumerator is declared as a class. This avoids the boxing performance penalty of converting from value-type to reference-type.

You can see in SharpLab that the generated code for the foreach statement is the following:

  
MyCollection.Enumerator enumerator = new MyCollection(array).GetEnumerator();
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}

The advantage is that without the try/finally blocks the JIT compiler may be able to perform more optimizations resulting in better performance.

Arrays and Span

Arrays and Span<T> are types of collections where the data is stored in a contiguous portion of memory. The foreach statement treats them as special cases: instead of using an enumerator, it uses the indexer, which performs much better.

You can see in SharpLab that the generated code for a foreach statement with an array as source is the following:

  
int[] array2 = array;
int num = 0;
while (num < array2.Length)
{
    Console.WriteLine(array2[num]);
    num++;
}

Please check my other article “Array iteration performance in C#” where I analyze this case in more detail.

The only issue with using foreach on arrays is that it only allows full traversal. If you want to traverse only a portion of the array, you can create an instance of ArraySegment<T> or Span<T> and use foreach to traverse it.
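As a short sketch of both options, the hypothetical SliceDemo helper below sums only a portion of an array. The parameter names start and length are illustrative.

```csharp
using System;

static class SliceDemo
{
    // Traverses only [start, start + length) using a Span<int> over the array.
    public static int SumWithSpan(int[] array, int start, int length)
    {
        var sum = 0;
        foreach (var item in array.AsSpan(start, length))
            sum += item;
        return sum;
    }

    // Same range, using ArraySegment<int> and its struct enumerator.
    public static int SumWithArraySegment(int[] array, int start, int length)
    {
        var sum = 0;
        foreach (var item in new ArraySegment<int>(array, start, length))
            sum += item;
        return sum;
    }
}
```

Neither approach copies the items; both are views over the original array.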

Both Span<T> and ReadOnlySpan<T> support passing items by reference. Don’t forget to use the ref keyword when traversing these types using foreach.
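A minimal sketch of both forms, using a hypothetical SpanRefDemo helper: ref var allows writing through the reference into a Span<T>, while ref readonly var reads from a ReadOnlySpan<T> without copying each item.

```csharp
using System;

static class SpanRefDemo
{
    // Writes through the reference, so the underlying memory is modified.
    public static void Fill(Span<int> span, int value)
    {
        foreach (ref var item in span)
            item = value;
    }

    // Reads through a read-only reference; no per-item copy is made.
    public static int Sum(ReadOnlySpan<int> span)
    {
        var sum = 0;
        foreach (ref readonly var item in span)
            sum += item;
        return sum;
    }
}
```

For small items like int the benefit is negligible; the by-reference form pays off when the span holds large structs.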

I’ve implemented a Roslyn Analyzer that includes a rule that warns you when ref and ref readonly can be used. Install it to get this and several other rules related to the use of foreach.

Conclusions

foreach has a very simple and clear syntax. We saw here that the C# compiler adapts the generated code to the type of collection.

When declaring a new collection type, you should adjust its enumeration code so that foreach can take full advantage of it.


This post is licensed under CC BY 4.0 by the author.
