Efficient Data Processing: Leveraging C#'s foreach Loop
In the dynamic landscape of modern software development, efficient data processing lies at the core of creating high-performance applications. As developers, we constantly strive to balance readability, maintainability, and speed when writing code. In the realm of C#, a programming language known for its versatility and robustness, the foreach loop emerges as a powerful tool for seamlessly navigating collections.
foreach is a statement in C# for which the compiler generates the code required to traverse the items of a collection. The syntax is very simple:
foreach(var item in source)
{
    Console.WriteLine(item);
}
This article will focus on the code that is generated by the compiler given various types of collections so that you can understand exactly what your application is doing when you use this type of loop.
Minimum requirements
For the foreach statement to accept a collection as its source, the collection must provide a public parameterless method named GetEnumerator() that returns a new instance of an enumerator object. The enumerator must then provide a public parameterless method named MoveNext() that returns a boolean and also a public property named Current.
The enumerator must maintain the state of the enumeration. The method MoveNext() should return true if a next item is available; otherwise false. The property Current should return the item at the current position.
As an example, here’s a collection that stores its items in an inner array and that implements the minimum required to be traversed using a foreach statement:
class MyCollection
{
    readonly int[] source;
    public MyCollection(int[] source)
        => this.source = source;
    public Enumerator GetEnumerator()
        => new Enumerator(this);
    public struct Enumerator
    {
        readonly int[] source;
        int index;
        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }
        public int Current
            => source[index];
        public bool MoveNext()
            => ++index < source.Length;
    }
}
The enumerator should always be a struct (a value-type) for improved performance! Check my other article “Performance of value-type vs reference-type enumerators in C#” to understand why.
The compiler converts the C# code into Intermediate Language (IL). There is no foreach instruction in IL. Using SharpLab, you can see that in this case the foreach is converted to something equivalent to the following:
MyCollection.Enumerator enumerator = new MyCollection(array).GetEnumerator();
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}
It first calls GetEnumerator() to get a new instance of the enumerator. It then uses MoveNext() as the continuation condition in a while loop and, for every time it returns true, gets the item using Current.
If the enumerator must release any resources at the end of the item traversal, it must implement IDisposable. You can see in SharpLab that in this case the generated code will make sure Dispose() is called:
MyCollection.Enumerator enumerator = new MyCollection(array).GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        Console.WriteLine(enumerator.Current);
    }
}
finally
{
    ((IDisposable)enumerator).Dispose();
}
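As a rough sketch of this behavior, the following self-contained program traverses a collection whose struct enumerator implements IDisposable and counts the Dispose() calls. The names TrackedCollection and DisposeCalls are hypothetical and exist only for this illustration; the counter stands in for a real resource being released.

```csharp
using System;

// Traverse the collection; the compiler wraps the loop in try/finally.
foreach (var item in new TrackedCollection(new[] { 1, 2, 3 }))
    Console.WriteLine(item);

// Reports how many enumerators were disposed during the traversal above.
Console.WriteLine($"Dispose calls: {TrackedCollection.DisposeCalls}");

// Hypothetical collection, used only for this illustration.
class TrackedCollection
{
    // Incremented every time an enumerator is disposed.
    public static int DisposeCalls;

    readonly int[] source;

    public TrackedCollection(int[] source)
        => this.source = source;

    public Enumerator GetEnumerator()
        => new Enumerator(this);

    public struct Enumerator : IDisposable
    {
        readonly int[] source;
        int index;

        public Enumerator(TrackedCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }

        public int Current
            => source[index];

        public bool MoveNext()
            => ++index < source.Length;

        // Invoked from the finally block generated for the foreach.
        public void Dispose()
            => DisposeCalls++;
    }
}
```

Running it lists the three items and then reports a single Dispose() call, which happens once the traversal ends, whether it completes normally or exits early.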
Returning items by reference
The foreach statement supports returning the items by reference. Passing items by reference improves performance when traversing collections containing large structs, because the items are not copied; only a reference to each of them is passed.
For example, we can change the Current property to return ref int:
class MyCollection
{
    readonly int[] source;
    public MyCollection(int[] source)
        => this.source = source;
    public Enumerator GetEnumerator()
        => new Enumerator(this);
    public struct Enumerator
    {
        readonly int[] source;
        int index;
        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }
        public ref int Current // return by reference
            => ref source[index];
        public bool MoveNext()
            => ++index < source.Length;
        public void Dispose() {}
    }
}
In this case, Current returns a reference to the position where the item is stored. This way, you can both initialize the values for a MyCollection instance using a foreach and also list its items to the console:
var source = new MyCollection(new int[10]);
// initialize all to ones
foreach(ref var item in source)
    item = 1;
// output to console
foreach(ref readonly var item in source)
    Console.WriteLine(item);
Notice the ref keyword in the first foreach so that the item value can be changed. The second foreach uses the ref readonly keywords so that a new value cannot be assigned to the item. You can see it working in SharpLab.
The Current property can be changed to return ref readonly, making it impossible to assign values to the items.
If you’d like the enumerator to contain a span field, it’s possible to declare the enumerator as a ref struct. The example collection can be implemented as follows:
class MyCollection
{
    readonly int[] source;
    public MyCollection(int[] source)
        => this.source = source;
    public Enumerator GetEnumerator()
        => new Enumerator(this);
    public ref struct Enumerator // ref struct enumerator
    {
        readonly ReadOnlySpan<int> source; // readonly span field
        int index;
        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }
        public ref readonly int Current // return by reference
            => ref source[index];
        public bool MoveNext()
            => ++index < source.Length;
        public void Dispose() {}
    }
}
You can see it working in SharpLab.
Before C# 13, a ref struct could not implement interfaces. For a ref struct enumerator, foreach will call the Dispose() method if present, without requiring the enumerator to implement IDisposable.
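To make the pattern-based Dispose() call visible, here is a minimal sketch that counts the calls. SpanCollection and DisposeCalls are hypothetical names introduced for this illustration, and the code assumes C# 9 or later so that top-level statements are available.

```csharp
using System;

// The enumerator is a ref struct with a public Dispose() and no IDisposable.
foreach (ref readonly var item in new SpanCollection(new[] { 10, 20, 30 }))
    Console.WriteLine(item);

// Reports how many times Dispose() was invoked by the generated code.
Console.WriteLine($"Dispose calls: {SpanCollection.DisposeCalls}");

// Hypothetical collection, used only for this illustration.
class SpanCollection
{
    // Incremented every time an enumerator is disposed.
    public static int DisposeCalls;

    readonly int[] source;

    public SpanCollection(int[] source)
        => this.source = source;

    public Enumerator GetEnumerator()
        => new Enumerator(this);

    public ref struct Enumerator
    {
        readonly ReadOnlySpan<int> source;
        int index;

        public Enumerator(SpanCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }

        public ref readonly int Current
            => ref source[index];

        public bool MoveNext()
            => ++index < source.Length;

        // Found by name and signature; no interface is involved.
        public void Dispose()
            => DisposeCalls++;
    }
}
```

The program prints the three items followed by a count of one, showing that the compiler found Dispose() by its shape alone.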
GetEnumerator() extension method
Starting from C# 9, foreach also supports the use of GetEnumerator() as an extension method. Imagine that you’d like to use foreach on the following collection developed by a third party:
public class MyCollection
{
    readonly int[] source;
    public MyCollection(int[] source)
        => this.source = source;
    public int Count
        => source.Length;
    public int this[int index]
        => source[index];
}
This collection provides an indexer and a Count property that returns the number of items. If you try to use foreach on it, the compilation will fail.
You can then define the following extension method for the collection:
public static class MyExtension
{
    public static Enumerator GetEnumerator(this MyCollection source)
        => new Enumerator(source);
    public struct Enumerator
    {
        readonly MyCollection source;
        int index;
        public Enumerator(MyCollection source)
        {
            this.source = source;
            index = -1;
        }
        public int Current
            => source[index];
        public bool MoveNext()
            => ++index < source.Count;
    }
}
You can see in SharpLab that the foreach statement compiles. You can also see that the generated code is very similar. The only difference is that it uses the extension method:
MyExtension.Enumerator enumerator = MyExtension.GetEnumerator(new MyCollection(array));
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}
You can see in SharpLab that the same applies to GetAsyncEnumerator(), which is used by await foreach.
IEnumerable
IEnumerable is an interface defined in the namespace System.Collections that actually enforces the pattern required by the foreach statement. So, any type that implements IEnumerable can be traversed using the foreach statement.
As an example, here’s the sample collection now implementing IEnumerable:
using System.Collections;
class MyCollection : IEnumerable
{
    readonly int[] source;
    public MyCollection(int[] source)
        => this.source = source;
    public IEnumerator GetEnumerator()
        => new Enumerator(this);
    public struct Enumerator : IEnumerator
    {
        readonly int[] source;
        int index;
        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }
        // public property
        public int Current
            => source[index];
        // explicit IEnumerator implementation
        object IEnumerator.Current
            => Current;
        public bool MoveNext()
            => ++index < source.Length;
        public void Reset()
            => index = -1;
    }
}
The only differences are:
- The collection implements IEnumerable.
- GetEnumerator() must return IEnumerator.
- The enumerator implements IEnumerator.
- The enumerator must have a Reset() method.
Notice that IEnumerator requires the property Current to return the type object. I want it to return int as it’s the type of the item for this collection. It’s possible to have both implementations, one public and the other using explicit interface implementation. The explicit implementation is only used when the enumerator is cast to IEnumerator.
If the enumerator does not support resetting, it should throw a NotSupportedException.
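As a small sketch of an enumerator that does not support resetting, consider the following program. ForwardOnlyEnumerator is a hypothetical class written for this illustration; it implements only the non-generic IEnumerator and throws from Reset().

```csharp
using System;
using System.Collections;

IEnumerator enumerator = new ForwardOnlyEnumerator(new[] { 1, 2 });
while (enumerator.MoveNext())
    Console.WriteLine(enumerator.Current);

try
{
    // Resetting is not supported, so this call throws.
    enumerator.Reset();
}
catch (NotSupportedException)
{
    Console.WriteLine("Reset is not supported");
}

// Hypothetical forward-only enumerator, used only for this illustration.
class ForwardOnlyEnumerator : IEnumerator
{
    readonly int[] source;
    int index = -1;

    public ForwardOnlyEnumerator(int[] source)
        => this.source = source;

    // The non-generic interface exposes the item as object.
    public object Current
        => source[index];

    public bool MoveNext()
        => ++index < source.Length;

    // Signals to callers that the enumeration cannot be restarted.
    public void Reset()
        => throw new NotSupportedException();
}
```

The foreach statement itself never calls Reset(), so throwing here only affects code that calls it explicitly.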
You can see in SharpLab that the generated code for a foreach statement is the following:
IEnumerator enumerator = new MyCollection(array).GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        Console.WriteLine((int)enumerator.Current);
    }
}
finally
{
    IDisposable disposable = enumerator as IDisposable;
    if (disposable != null)
    {
        disposable.Dispose();
    }
}
Several things to notice:
- The enumerator is returned as type IEnumerator, which is a reference-type.
- The value returned by Current has to be cast to int because the explicit implementation, which returns object, is the one being used.
- Although the enumerator doesn’t implement IDisposable, the compiler adds code to check at runtime if it does.
I mentioned above that collections should have a value-type enumerator for better performance. We see here that by returning IEnumerator, the enumerator is boxed, which converts it into a reference-type. The way to work around this is to also use explicit interface implementation for the method GetEnumerator():
using System.Collections;
class MyCollection : IEnumerable
{
    readonly int[] source;
    public MyCollection(int[] source)
        => this.source = source;
    // public method
    public Enumerator GetEnumerator()
        => new Enumerator(this);
    // explicit IEnumerable implementation
    IEnumerator IEnumerable.GetEnumerator()
        => GetEnumerator();
    public struct Enumerator : IEnumerator
    {
        readonly int[] source;
        int index;
        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }
        public int Current
            => source[index];
        object IEnumerator.Current
            => Current;
        public bool MoveNext()
            => ++index < source.Length;
        public void Reset()
            => index = -1;
    }
}
You can see in SharpLab that now the generated code for a foreach statement is the following:
MyCollection.Enumerator enumerator = new MyCollection(array).GetEnumerator();
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}
It uses the value-type enumerator. It will only use the reference-type enumerator if the collection is cast to IEnumerable.
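The two paths can be compared side by side with a short sketch. It repeats the collection with the explicit IEnumerable implementation and calls both GetEnumerator() methods by hand; the local variable names are illustrative only.

```csharp
using System;
using System.Collections;

var collection = new MyCollection(new[] { 1, 2, 3 });

// Public method: the result is the struct itself, so nothing is boxed
// and Current is already typed as int.
MyCollection.Enumerator direct = collection.GetEnumerator();
direct.MoveNext();
int first = direct.Current;
Console.WriteLine(first);

// Interface method: the struct is boxed into an IEnumerator reference
// and Current returns object, so the value must be unboxed.
IEnumerator viaInterface = ((IEnumerable)collection).GetEnumerator();
viaInterface.MoveNext();
int firstViaInterface = (int)viaInterface.Current;
Console.WriteLine(firstViaInterface);

class MyCollection : IEnumerable
{
    readonly int[] source;

    public MyCollection(int[] source)
        => this.source = source;

    // public method, returns the value-type enumerator
    public Enumerator GetEnumerator()
        => new Enumerator(this);

    // explicit IEnumerable implementation, boxes the enumerator
    IEnumerator IEnumerable.GetEnumerator()
        => GetEnumerator();

    public struct Enumerator : IEnumerator
    {
        readonly int[] source;
        int index;

        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }

        public int Current
            => source[index];

        object IEnumerator.Current
            => Current;

        public bool MoveNext()
            => ++index < source.Length;

        public void Reset()
            => index = -1;
    }
}
```

Both paths yield the same first item; the difference lies in the allocation and the cast that the interface path requires.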
Most of the commonly used collections provided by .NET, such as List<T> and Dictionary<TKey, TValue>, provide a value-type enumerator. You should do the same if you implement your own collection.
IEnumerable<T>
IEnumerable<T> and IEnumerator<T> extend the pair of interfaces IEnumerable and IEnumerator to specify the type of item returned by the Current property.
Because IEnumerable<T> derives from IEnumerable, and IEnumerator<T> derives from IEnumerator and IDisposable, the example collection should be implemented as follows:
using System.Collections;
using System.Collections.Generic;
class MyCollection : IEnumerable<int>
{
    readonly int[] source;
    public MyCollection(int[] source)
        => this.source = source;
    // public method
    public Enumerator GetEnumerator()
        => new Enumerator(this);
    // explicit IEnumerable<T> implementation
    IEnumerator<int> IEnumerable<int>.GetEnumerator()
        => GetEnumerator();
    // explicit IEnumerable implementation
    IEnumerator IEnumerable.GetEnumerator()
        => GetEnumerator();
    public struct Enumerator : IEnumerator<int>
    {
        readonly int[] source;
        int index;
        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }
        public int Current
            => source[index];
        object IEnumerator.Current
            => Current;
        public bool MoveNext()
            => ++index < source.Length;
        public void Reset()
            => index = -1;
        public void Dispose() {}
    }
}
You can see in SharpLab that now the generated code for a foreach statement is the following:
MyCollection.Enumerator enumerator = new MyCollection(array).GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        Console.WriteLine(enumerator.Current);
    }
}
finally
{
    ((IDisposable)enumerator).Dispose();
}
The differences are:
- The enumerator is a value-type.
- The value returned by Current doesn’t require a cast. This improves performance as it was being done for each item.
- Dispose() is called even though it’s empty. It’s IEnumerator<T> that makes Dispose() mandatory.
The Dispose() call can be avoided by declaring two enumerators for the collection:
using System.Collections;
using System.Collections.Generic;
class MyCollection : IEnumerable<int>
{
    readonly int[] source;
    public MyCollection(int[] source)
        => this.source = source;
    public Enumerator GetEnumerator()
        => new Enumerator(this);
    IEnumerator<int> IEnumerable<int>.GetEnumerator()
        => new ReferenceEnumerator(this);
    IEnumerator IEnumerable.GetEnumerator()
        => new ReferenceEnumerator(this);
    // value type enumerator
    public struct Enumerator
    {
        readonly int[] source;
        int index;
        public Enumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }
        public int Current
            => source[index];
        public bool MoveNext()
            => ++index < source.Length;
    }
    // reference type enumerator
    class ReferenceEnumerator : IEnumerator<int>
    {
        readonly int[] source;
        int index;
        public ReferenceEnumerator(MyCollection enumerable)
        {
            source = enumerable.source;
            index = -1;
        }
        public int Current
            => source[index];
        object IEnumerator.Current
            => Current;
        public bool MoveNext()
            => ++index < source.Length;
        public void Reset()
            => index = -1;
        public void Dispose() {}
    }
}
Things that changed:
- The public GetEnumerator() returns an instance of the value-type enumerator while the other ones return instances of the reference-type enumerator.
- The value-type enumerator only implements the minimum requirements.
- The reference-type enumerator is declared as private as it’s only used internally.
- The reference-type enumerator is declared as a class. This avoids the boxing performance penalty of converting from value-type to reference-type.
You can see in SharpLab that the generated code for the foreach statement is the following:
MyCollection.Enumerator enumerator = new MyCollection(array).GetEnumerator();
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}
The advantage is that without the try/finally blocks the JIT compiler may be able to perform more optimizations resulting in better performance.
Arrays and Span
Arrays and Span<T> are types of collections where the data is stored in a contiguous portion of memory. foreach treats them as special cases: instead of using an enumerator, it uses the indexer, which performs much better. For an array, the generated code is equivalent to the following:
int[] array2 = array;
int num = 0;
while (num < array2.Length)
{
    Console.WriteLine(array2[num]);
    num++;
}
Please check my other article “Array iteration performance in C#” where I analyze this case in more detail.
The only issue with using foreach on arrays is that it only allows full traversal. If you want to traverse only a portion of the array, you can create an instance of ArraySegment<T> or Span<T> and use foreach to traverse it.
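A minimal sketch of both options follows. The array contents and the chosen window (offset 2, three items) are arbitrary values picked for the illustration.

```csharp
using System;

int[] array = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };

// ArraySegment<T>: a view over three items starting at index 2.
foreach (var item in new ArraySegment<int>(array, 2, 3))
    Console.WriteLine(item);

// Span<T>: the same window, created with AsSpan(start, length).
foreach (var item in array.AsSpan(2, 3))
    Console.WriteLine(item);
```

Each loop visits only the items 2, 3 and 4. Neither view copies the data; both refer to the original array.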
Both Span<T> and ReadOnlySpan<T> support passing items by reference. Don’t forget to use the ref keyword (ref readonly in the case of ReadOnlySpan<T>) when traversing these types using foreach.
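As a short sketch of by-reference traversal of spans, assuming a five-item buffer chosen only for the illustration:

```csharp
using System;

Span<int> span = new int[5];

// 'ref' gives write access to each item in place.
foreach (ref var item in span)
    item = 1;

// 'ref readonly' avoids copying each item and forbids assignment.
ReadOnlySpan<int> readOnly = span;
foreach (ref readonly var item in readOnly)
    Console.WriteLine(item);
```

With int items the saving is negligible; the benefit of avoiding copies grows with the size of the struct stored in the span.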
I’ve implemented a Roslyn Analyzer that includes a rule that warns you when ref and ref readonly can be used. Install it to get this and several other rules related to the use of foreach.
Conclusions
foreach has a very simple and clear syntax. We saw here that the C# compiler adapts the generated code to the type of collection.
When declaring a new collection type, you should adjust its enumeration code so that foreach can take full advantage of it.