Post

Enumeration in .NET - Count()

Count()

On my previous article, I analysed the usage of IEnumerable strictly based on its contract. It was brought to my attention that LINQ, short for “LINQ to Objects”, optimizes some of the described scenarios, possibly breaking some of my assumptions. It’s a good point and I did some more research.

Checking the implementation of the Count() extension method in LINQ, we can see that it handles differently the case where the IEnumerable instance also implements ICollection or one other internal interface. This optimization is tricky and let me explain why.

First, this optimization is based on knowledge of the existence of other interfaces. It works with the current framework as all its collections implement ICollection, but who knows what interfaces and collections will be added in the future? IReadOnlyCollection was added to the framework after LINQ and it looks like they didn’t update this code. What about third-party libraries?

Second, the use of LINQ operators breaks the optimization. Let’s see…

The following code benchmarks the Count() extension method applied to an IEnumerable, a filtered IEnumerable, an ICollection and a filtered ICollection, for 0, 10 and 100 elements:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
using BenchmarkDotNet.Attributes;
using System.Collections.Generic;
using System.Linq;

namespace EnumerationBenchmarks
{
    [MemoryDiagnoser]
    public class CountBenchmarks
    {
        IEnumerable<int> enumerable;
        IEnumerable<int> filteredEnumerable;
        IEnumerable<int> collection;
        IEnumerable<int> filteredCollection;

        [Params(0, 10, 100)]
        public int ItemsCount { get; set; }

        [GlobalSetup]
        public void Setup()
        {
            enumerable = MyRange(0, ItemsCount);

            filteredEnumerable = enumerable
                .Where(_ => true);

            collection = enumerable
                .ToList();

            filteredCollection = collection
                .Where(_ => true);
        }

        [Benchmark]
        public int Enumerable_Count() =>
            enumerable.Count();

        [Benchmark]
        public int FilteredEnumerable_Count() =>
            filteredEnumerable.Count();

        [Benchmark]
        public int Collection_Count() =>
            collection.Count();

        [Benchmark]
        public int FilteredCollection_Count() =>
            filteredCollection.Count();

        static IEnumerable<int> MyRange(int start, int count)
        {
            var end = start + count;
            for (var value = start; value < end; value++)
                yield return value;
        }
    }
}

NOTE: I initially used Enumerable.Range() for the IEnumerable instance but the results where not what I expected. It turns out it implements the internal interface that is optimized in Count(). The method MyRange(), implemented at the bottom, returns a pure IEnumerable instance.

The results of the benchmark were the following:

1
2
3
4
5
BenchmarkDotNet v0.13.12, macOS Sonoma 14.3 (23D56) [Darwin 23.3.0]
Apple M1, 1 CPU, 8 logical and 8 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 8.0.0 (8.0.23.53103), Arm64 RyuJIT AdvSIMD
  DefaultJob : .NET 8.0.0 (8.0.23.53103), Arm64 RyuJIT AdvSIMD
MethodItemsCountMeanErrorStdDevGen0Allocated
Enumerable_Count022.629 ns0.0165 ns0.0146 ns0.008956 B
FilteredEnumerable_Count021.174 ns0.0976 ns0.0815 ns0.008956 B
Collection_Count02.920 ns0.0025 ns0.0023 ns--
FilteredCollection_Count09.021 ns0.0142 ns0.0118 ns--
Enumerable_Count1054.165 ns0.1742 ns0.1629 ns0.008956 B
FilteredEnumerable_Count1056.801 ns0.1018 ns0.0850 ns0.008956 B
Collection_Count102.934 ns0.0359 ns0.0336 ns--
FilteredCollection_Count1026.009 ns0.0593 ns0.0526 ns--
Enumerable_Count100393.815 ns1.0288 ns0.9120 ns0.008656 B
FilteredEnumerable_Count100396.215 ns1.0694 ns1.0003 ns0.008656 B
Collection_Count1002.961 ns0.0020 ns0.0016 ns--
FilteredCollection_Count100180.376 ns0.3111 ns0.2429 ns--

As expected, Enumerable_Count allocates on the heap and its mean time increases directly with the number of elements, which is all bad.

Also expected from the Count() implementation, Collection_Count is much faster than Enumerable_Count, doesn’t allocate on the heap and has constant mean time whatever number of elements, which is all great.

What it’s important here is that, although FilteredCollection_Count was created as a collection, its mean time increases directly with the number of elements. It’s a smoother increase than with FilteredEnumerable_Count but still a complexity of O(n).

Conclusions

When you have a method with an IEnumerable argument, it’s assumed you have no idea how it was created. You should not expect Count() to have any complexity other than O(n).

If you can optimize the algorithm given more capabilities, make it explicit by using other interfaces like IReadOnlyCollection, IReadOnlyList, IReadOnlyDictionary or any other that makes sense.

The optimization seems to be misleading.

If you follow the rules from my previous article, you won’t have any surprises…

This post is licensed under CC BY 4.0 by the author.