The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

.NET/C# – Some notes in IGrouping (via: Grouping in LINQ is weird (IGrouping is your friend) – Mike Taulty’s Blog)

Posted by jpluimers on 2013/04/02

IGrouping interface diagram (click to enlarge)

One of the things most people find hard to use in LINQ is GroupBy or the LINQ expression group … by (they are mostly equivalent).

When I started using it, I was confused too, mainly for two reasons:

  1. GroupBy returns a sequence of IGrouping<TKey, TElement> generic interfaces (an IEnumerable<IGrouping<TKey, TElement>>), but the classes that implement the interface are internal and not visible from outside the BCL (although you could artificially create your own).
    This interface extends IEnumerable<TElement> in a full “is a” fashion, adding a Key member.
    This has two consequences:

    1. Because it is an “is a” extension of IEnumerable<TElement>, you can use foreach to enumerate the TElement members of the current group inside the grouping.
      There is no need to search for a Value that holds the elements, as the group is the elements.
    2. The Key member is the key value that the current group was grouped on, which means that methods such as Count<TElement> apply to the current group in the grouping.
  2. The LINQ expression syntax for grouping on multiple columns is not straightforward:
    1. Grouping on multiple columns uses a slightly different syntax than you are used to from SQL.
      (Another difference is that SQL returns a set, but groups are IEnumerable.)
    2. You also need to be a bit careful to make sure the group keys are indeed distinct.
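To make the first point concrete, here is a minimal sketch (not part of the original sample code). The class GroupingBasicsSketch and the method DescribeByLength are hypothetical names; the sketch groups plain strings by their length, enumerates each group directly with foreach, and calls Count() on the group itself.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupingBasicsSketch
{
    // Hypothetical helper: returns one line per group, e.g. "4 letters (2): pear plum".
    public static List<string> DescribeByLength(IEnumerable<string> words)
    {
        var lines = new List<string>();

        // GroupBy yields IGrouping<int, string>: the key is the word length.
        foreach (IGrouping<int, string> group in words.GroupBy(word => word.Length))
        {
            // Count() is the Enumerable extension method, applied to this group only.
            string line = group.Key + " letters (" + group.Count() + "):";

            // The group *is* the sequence of its elements, so foreach works directly.
            foreach (string word in group)
            {
                line += " " + word;
            }

            lines.Add(line);
        }

        return lines;
    }
}
```

Calling DescribeByLength(new[] { "pear", "apple", "plum", "melon" }) gives two lines, one for the four-letter words and one for the five-letter words, in the order in which each key first appears.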

Most people don’t see the IGrouping<TKey, TElement> because they use the var keyword to implicitly type the LINQ result.
Often – when using any anonymous type – var is the only way of using that result.
That is fine, but has the consequence that it hides the actual type, which – when not anonymous – is a good way of seeing what happens behind the scenes.
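As a small sketch of what var hides (hypothetical names VarVersusExplicitSketch and ParityKeys, not from the original code): the implicitly typed query below can be assigned to a variable with the fully spelled-out type, which shows that var is just shorthand for that type.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class VarVersusExplicitSketch
{
    // Hypothetical helper: groups numbers by "is even" and returns the group keys.
    public static List<bool> ParityKeys(IEnumerable<int> numbers)
    {
        // Implicitly typed: the compiler infers the type from the right-hand side.
        var implicitGroups = numbers.GroupBy(n => n % 2 == 0);

        // The same value with its type spelled out; this assignment compiles
        // because the inferred type is exactly IEnumerable<IGrouping<bool, int>>.
        IEnumerable<IGrouping<bool, int>> explicitGroups = implicitGroups;

        return explicitGroups.Select(g => g.Key).ToList();
    }
}
```

With anonymous types as keys there is no name to spell out, which is why var becomes mandatory there.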

David Klein gave an example for the multi column grouping and also shows that if you use LINQPad, you can actually see the IGrouping<TKey, TElement> in action.

Mike Taulty skipped the Group By Syntax for Grouping on Multiple Columns in his Grouping in LINQ is weird (IGrouping is your friend). So my examples include that.

Note that I don’t cover everything about LINQ group by here; for instance, I skipped the into part.
There are some nice examples on MSDN covering exactly that, using both method-based and expression-based LINQ.

The examples are based on these two classes, similar to what Mike did.

Underlying classes

A Fruit class:

namespace IGroupingConsoleApplication
{
    public class Fruit
    {
        public string Type { get; private set; }
        public string Variety { get; private set; }
        public int  Quantity { get; private set; }
        public decimal PricePerKilo { get; private set; }

        public Fruit(string type, string variety, int quantity, decimal pricePerKilo)
        {
            Type = type;
            Variety = variety;
            Quantity = quantity;
            PricePerKilo = pricePerKilo;
        }
    }
}

A FruitVariety class for use in the grouping (maybe I should call that a Cultivar class instead, as that is what it represents):

namespace IGroupingConsoleApplication
{
    public class FruitVariety
    {
        public string Type { get; private set; }
        public string Variety { get; private set; }

        public FruitVariety(string type, string variety)
        {
            Type = type;
            Variety = variety;
        }
    }
}

A Grocerie class with a BuyFrom method that produces some Dutch fruits.
Note the two lines with “Elstar” to make sure one of the groups contains 2 elements. Oh, and they taste good too (:

using System.Collections.Generic;

namespace IGroupingConsoleApplication
{
    public class Grocerie
    {
        public static IEnumerable<Fruit> BuyFrom()
        {
            const string Apple = "Apple";
            const string Pear = "Pear";
            const string Cherry = "Cherry";

            Fruit[] result = new Fruit[]
            {
                new Fruit(Apple, "Belle de Boskoop", 12, 2.48m),
                new Fruit(Apple, "Elstar", 7, 1.98m),
                new Fruit(Apple, "Elstar", 9, 1.88m),
                new Fruit(Apple, "Red Prince", 11, 2.98m),
                new Fruit(Apple, "Santana", 5, 2.48m),
                new Fruit(Pear, "Gieser Wildeman", 3, 3.48m),
                new Fruit(Pear, "Verdi", 6, 3.98m),
                new Fruit(Cherry, "Early Rivers", 30, 5.98m),
                new Fruit(Cherry, "Morel", 25, 6.98m),
            };
            return result;
        }
    }
}

When you look at the code examples, you will see that I favour IEnumerable and IEnumerable<T> (which extends IEnumerable) as method return types and parameters over List<T> or [] arrays (there is no Array<T> in .NET).

LINQ extends the classes in the System.Collections.Generic namespace. Those classes expose IEnumerable<T>.
The Enumerable class in the System.Linq namespace provides these extensions and is fully centered around IEnumerable<T>.
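A minimal sketch of why IEnumerable<T> is a convenient parameter type (the names EnumerableSourcesSketch, SumOfEvens and CountTo are hypothetical): one method accepts an array, a List<T> and an iterator alike, and the Enumerable extension methods work on all of them.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerableSourcesSketch
{
    // Accepts any IEnumerable<int>: arrays, List<int>, iterators, other LINQ queries.
    public static int SumOfEvens(IEnumerable<int> numbers)
    {
        // Where and Sum are extension methods from System.Linq.Enumerable.
        return numbers.Where(n => n % 2 == 0).Sum();
    }

    // An iterator method: produces 1..max lazily, without any backing collection.
    public static IEnumerable<int> CountTo(int max)
    {
        for (int i = 1; i <= max; i++)
        {
            yield return i;
        }
    }
}
```

SumOfEvens(new[] { 1, 2, 3, 4 }), SumOfEvens(new List<int> { 2, 4, 6 }) and SumOfEvens(EnumerableSourcesSketch.CountTo(10)) all compile against the same signature.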

Charlie Calvert explained this very well in the LINQFarm: Understanding IEnumerable entry of his LINQFarm series:

The type IEnumerable<T> plays two key roles in this code.

LINQ code

The rest of the code makes up the main program and shows different ways of formulating the LINQ query:

  • Using explicit (named) types
    • with LINQ expressions
    • with LINQ methods
  • Using implicit (anonymous) types
    • with LINQ expressions
    • with LINQ methods

I’ll start with the explicit example, as that clearly shows the IGrouping<TKey, TElement>: what is the TKey and what is the TElement.

The LINQ expressions and LINQ methods are usually equivalent, and in these cases they are.

Explicitly typed LINQ code

So this LINQ code, which groups Fruit instances by FruitVariety using explicit types:

        private static void showExplicitGrouping_LINQ_Expression(IEnumerable<Fruit> fruits)
        {
            IEnumerable<IGrouping<FruitVariety, Fruit>> groups =
                from fruit
                    in fruits
                group fruit by new FruitVariety(fruit.Type, fruit.Variety);

            showExplicitFruitVarietyGrouping(groups);
        }

is equivalent to:

        private static void showExplicitGrouping_LINQ_Method(IEnumerable<Fruit> fruits)
        {
            IEnumerable<IGrouping<FruitVariety, Fruit>> groups =
                fruits.GroupBy(fruit => new FruitVariety(fruit.Type, fruit.Variety));

            showExplicitFruitVarietyGrouping(groups);
        }

Both group by two Fruit columns, Type and Variety, which are the two properties that the FruitVariety class exposes.

Because of the explicit (named) types, you can actually pass the groups (of type IEnumerable<IGrouping<FruitVariety, Fruit>>) to a method for postprocessing:

        private static void showExplicitFruitVarietyGrouping(IEnumerable<IGrouping<FruitVariety, Fruit>> groups)
        {
            foreach (IGrouping<FruitVariety, Fruit> group in groups)
            {
                FruitVariety key = group.Key;
                Console.WriteLine("Type: {0}, Variety: {1}", key.Type, key.Variety);
                showFruits(group);
            }
        }

In the above method, you can see that the Key in the group is of type FruitVariety.
This is because the key expression in both group fruit by and fruits.GroupBy creates a FruitVariety.

IEnumerator interface diagram (click to enlarge)

Since IGrouping<FruitVariety, Fruit> extends IEnumerable<Fruit>, you can pass each group to the IEnumerable<Fruit> fruits parameter of showFruits to display the detail records with another foreach loop, which uses the IEnumerator<Fruit> that fruits.GetEnumerator() provides:

        private static void showFruits(IEnumerable<Fruit> fruits)
        {
            foreach (Fruit fruit in fruits)
            {
                Console.WriteLine("  Quantity: {0}, Price per Kilo: {1}", fruit.Quantity, fruit.PricePerKilo);
            }
        }

So: using explicit types has two advantages:

  • You can see the actual underlying types used
  • You can use explicit types through your whole code.

But sometimes you cannot use explicit types, mostly when you use anonymous types. So then var is the way to go.

Implicitly (var) typed LINQ code with anonymous types

Below is the LINQ code that mirrors the above, but using var and anonymous types.
This code does not group by FruitVariety, but by new { fruit.Type, fruit.Variety }.

You immediately see one of the drawbacks: you cannot have the foreach loop outside this method without doing some serious refactoring (anonymous types are meant for temporary use for a reason) or (heaven forbid) cast by example.

        private static void showImplicitGrouping_LINQ_Expression(IEnumerable<Fruit> fruits)
        {
            var groups = from fruit
                             in fruits
                         group fruit by new { fruit.Type, fruit.Variety };

            foreach (var group in groups)
            {
                var key = group.Key;
                Console.WriteLine("Type: {0}, Variety: {1}", key.Type, key.Variety);

                showFruits(group);
            }
        }

The above code is equivalent to:

        private static void showImplicitGrouping_LINQ_Method(IEnumerable<Fruit> fruits)
        {
            var groups = fruits.GroupBy(fruit => new { fruit.Type, fruit.Variety });

            foreach (var group in groups)
            {
                var key = group.Key;
                Console.WriteLine("Type: {0}, Variety: {1}", key.Type, key.Variety);

                showFruits(group);
            }
        }

You cannot put the foreach loop in a separate method, as you cannot pass the var group as a var parameter in C# (in Oxygene, you could).

And even if you could, the underlying type of the var would be invisible in that method, so you would not be able to know that key in fact contains a Type and a Variety property.

And parameterizing the method with the generic types TKey and TElement isn’t going to work either:

        private static void showImplicitFruitVarietyGrouping<TKey, TElement>(IEnumerable<IGrouping<TKey, TElement>> groups)
        {
            foreach (IGrouping<TKey, TElement> group in groups)
            {
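                TKey key = group.Key; // this line compiles, but nothing is known about TKey
                Console.WriteLine("Type: {0}, Variety: {1}", key.Type, key.Variety); // doesn't compile: TKey has no Type or Variety
                showFruits(group); // doesn't compile: TElement is not known to be Fruit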
            }
        }

Anyway: the foreach stays inside the method. You can still call showFruits there, because var does not mean untyped; it means implicitly typed. The group has a type, from which the compiler can deduce the type of key, and the group can be passed as an IEnumerable<Fruit> parameter to showFruits.
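If the loop really has to live in a separate method, one possible workaround (a sketch, not something the original code does) is to keep the method generic and pass in delegates that know how to describe the key and the elements. The lambdas are written at the call site, where the anonymous types are still visible to the compiler. All names here (GenericGroupPrinterSketch, Describe, Demo) are hypothetical, and the sample data uses anonymous objects instead of the Fruit class to stay self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GenericGroupPrinterSketch
{
    // Generic over TKey and TElement; the delegates supply the type-specific parts.
    public static List<string> Describe<TKey, TElement>(
        IEnumerable<IGrouping<TKey, TElement>> groups,
        Func<TKey, string> describeKey,
        Func<TElement, string> describeElement)
    {
        var lines = new List<string>();
        foreach (IGrouping<TKey, TElement> group in groups)
        {
            lines.Add(describeKey(group.Key));
            foreach (TElement element in group)
            {
                lines.Add("  " + describeElement(element));
            }
        }
        return lines;
    }

    public static List<string> Demo()
    {
        var fruits = new[]
        {
            new { Type = "Apple", Variety = "Elstar", Quantity = 7 },
            new { Type = "Apple", Variety = "Elstar", Quantity = 9 },
            new { Type = "Pear", Variety = "Verdi", Quantity = 6 },
        };

        var groups = fruits.GroupBy(f => new { f.Type, f.Variety });

        // Type inference fills in the anonymous TKey and TElement from 'groups'.
        return Describe(
            groups,
            key => string.Format("Type: {0}, Variety: {1}", key.Type, key.Variety),
            fruit => string.Format("Quantity: {0}", fruit.Quantity));
    }
}
```

The trade-off is that the formatting logic moves into the caller, so this only pays off when the looping itself is worth sharing.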

Output

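Both anonymous-type versions produce the output below: one group with two entries for Elstar, and a single entry for all other cultivars.
The explicitly typed versions only produce the same output when FruitVariety implements value equality (Equals and GetHashCode); as listed above it does not, so GroupBy compares the keys by reference and each Elstar row ends up in its own group.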

Type: Apple, Variety: Belle de Boskoop
  Quantity: 12, Price per Kilo: 2.48
Type: Apple, Variety: Elstar
  Quantity: 7, Price per Kilo: 1.98
  Quantity: 9, Price per Kilo: 1.88
Type: Apple, Variety: Red Prince
  Quantity: 11, Price per Kilo: 2.98
Type: Apple, Variety: Santana
  Quantity: 5, Price per Kilo: 2.48
Type: Pear, Variety: Gieser Wildeman
  Quantity: 3, Price per Kilo: 3.48
Type: Pear, Variety: Verdi
  Quantity: 6, Price per Kilo: 3.98
Type: Cherry, Variety: Early Rivers
  Quantity: 30, Price per Kilo: 5.98
Type: Cherry, Variety: Morel
  Quantity: 25, Price per Kilo: 6.98
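The earlier remark about making sure the group keys are distinct can be illustrated with a small sketch. PlainKey is a hypothetical stand-in shaped like FruitVariety (two properties, no Equals or GetHashCode override), and KeyEqualitySketch is a hypothetical name as well. GroupBy without a comparer uses the default equality comparer, which for such a class means reference equality, whereas anonymous types compare their property values.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical key class without Equals/GetHashCode overrides.
public sealed class PlainKey
{
    public string Type { get; private set; }
    public string Variety { get; private set; }

    public PlainKey(string type, string variety)
    {
        Type = type;
        Variety = variety;
    }
}

public static class KeyEqualitySketch
{
    // Three rows of (type, variety); the first two are identical.
    private static readonly string[][] Rows =
    {
        new[] { "Apple", "Elstar" },
        new[] { "Apple", "Elstar" },
        new[] { "Pear", "Verdi" },
    };

    // Every 'new PlainKey(...)' is a distinct reference, so every row gets its own group.
    public static int GroupCountWithPlainClassKey()
    {
        return Rows.GroupBy(r => new PlainKey(r[0], r[1])).Count();
    }

    // Anonymous types implement value-based Equals/GetHashCode, so equal rows share a group.
    public static int GroupCountWithAnonymousKey()
    {
        return Rows.GroupBy(r => new { Type = r[0], Variety = r[1] }).Count();
    }
}
```

Here GroupCountWithPlainClassKey() yields 3 groups and GroupCountWithAnonymousKey() yields 2. Overriding Equals and GetHashCode on the key class, or passing an IEqualityComparer<TKey> to GroupBy, would bring the named-class version in line with the anonymous one.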

In a future blog post, I will show you some other aspects of grouping where you expect the same output as above, but the actual grouping is slightly different.

–jeroen
