.NET/C# – Some notes in IGrouping (via: Grouping in LINQ is weird (IGrouping is your friend) – Mike Taulty’s Blog)
Posted by jpluimers on 2013/04/02
One of the things most people find hard to use in LINQ is GroupBy or the LINQ expression group … by (they are mostly equivalent).
When I started using it, I was confused too, mainly for two reasons:
- GroupBy returns an IGrouping<TKey, TElement> generic interface, but the classes that implement it are internal and not visible from outside the BCL (although you could artificially create your own).
  This interface extends IEnumerable<TElement> in a full “is a” fashion, adding a Key member.
  This has two consequences:
  - Because it is an “is a” extension of IEnumerable<TElement>, you can use foreach to enumerate the TElement members of the current group inside the grouping.
    No need to search for a Value that has the Elements, as the Group is the Elements.
  - The Key member is indeed the current instance of what you are grouping over. Which means that aggregates such as Count<TElement> apply to the current group in the grouping.
- The LINQ expression syntax for grouping on multiple columns is not straightforward:
  - Grouping on multiple columns uses a slightly different syntax from what you are used to in SQL.
    (Another difference is that SQL returns a set, but groups are IEnumerable.)
  - You also need to be a bit careful to make sure the group keys are indeed distinct.
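To make the “is a” relationship concrete, here is a minimal sketch. It is not taken from the examples further down: the GroupingBasics class, its Describe helper and the word list are made up for illustration. It groups words by their first letter, then treats each group both as a holder of Key and as the sequence of its own elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingBasics
{
    // Hypothetical helper: returns one line per group in the form "key: count (elements)".
    public static List<string> Describe(IEnumerable<string> words)
    {
        var lines = new List<string>();

        // Each group is an IGrouping<char, string>: the key is the first letter.
        IEnumerable<IGrouping<char, string>> groups = words.GroupBy(word => word[0]);

        foreach (IGrouping<char, string> group in groups)
        {
            // The group itself is the sequence of elements; there is no Value or Items property.
            IEnumerable<string> elements = group;

            // Count() counts the elements of this one group only.
            lines.Add(string.Format("{0}: {1} ({2})",
                group.Key, group.Count(), string.Join(", ", elements)));
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe(new[] { "apple", "avocado", "pear", "cherry", "plum" }))
        {
            Console.WriteLine(line);
        }
    }
}
```

For this input the groups come out in order of first appearance of each key: a (apple, avocado), p (pear, plum) and c (cherry).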
Most people don’t see the IGrouping<TKey, TElement> because they use the var keyword to implicitly type the LINQ result.
Often, when using an anonymous type, var is the only way of using that result.
That is fine, but has the consequence that it hides the actual type, which – when not anonymous – is a good way of seeing what happens behind the scenes.
David Klein gave an example for the multi column grouping and also shows that if you use LINQPad, you can actually see the IGrouping<TKey, TElement> in action.
Mike Taulty skipped the Group By Syntax for Grouping on Multiple Columns in his Grouping in LINQ is weird (IGrouping is your friend). So my examples include that.
Note that I don’t cover all the LINQ group by stuff here; for instance, I skipped the into part.
There are some nice examples on MSDN covering exactly that, using both method-based and expression-based LINQ.
The examples are based on these two classes, similar to what Mike did.
Underlying classes
A Fruit class:
namespace IGroupingConsoleApplication
{
    public class Fruit
    {
        public string Type { get; private set; }
        public string Variety { get; private set; }
        public int Quantity { get; private set; }
        public decimal PricePerKilo { get; private set; }

        public Fruit(string type, string variety, int quantity, decimal pricePerKilo)
        {
            Type = type;
            Variety = variety;
            Quantity = quantity;
            PricePerKilo = pricePerKilo;
        }
    }
}
A FruitVariety class for use in the grouping (maybe I should call that a Cultivar class instead, as that is what it represents).
namespace IGroupingConsoleApplication
{
    public class FruitVariety
    {
        public string Type { get; private set; }
        public string Variety { get; private set; }

        public FruitVariety(string type, string variety)
        {
            Type = type;
            Variety = variety;
        }
    }
}
A Grocerie class whose BuyFrom method produces some Dutch fruits.
Note the two lines with “Elstar” to make sure one of the groups contains 2 elements. Oh and they taste good too (:
using System.Collections.Generic;

namespace IGroupingConsoleApplication
{
    public class Grocerie
    {
        // Returns the sample data as IEnumerable<Fruit>, so callers do not depend on the array.
        public static IEnumerable<Fruit> BuyFrom()
        {
            const string Apple = "Apple";
            const string Pear = "Pear";
            const string Cherry = "Cherry";
            Fruit[] result = new Fruit[]
            {
                new Fruit(Apple, "Belle de Boskoop", 12, 2.48m),
                new Fruit(Apple, "Elstar", 7, 1.98m),
                new Fruit(Apple, "Elstar", 9, 1.88m),
                new Fruit(Apple, "Red Prince", 11, 2.98m),
                new Fruit(Apple, "Santana", 5, 2.48m),
                new Fruit(Pear, "Gieser Wildeman", 3, 3.48m),
                new Fruit(Pear, "Verdi", 6, 3.98m),
                new Fruit(Cherry, "Early Rivers", 30, 5.98m),
                new Fruit(Cherry, "Morel", 25, 6.98m),
            };
            return result;
        }
    }
}
When you look at the code examples, you will see that I favour IEnumerable and IEnumerable<T> (which extends IEnumerable) as method return types and parameters over List<T> or [] arrays (there is no Array<T> in .NET).
LINQ extends the classes in the System.Collections.Generic namespace. Those classes expose IEnumerable<T>.
The Enumerable class in the System.Linq namespace provides these extensions and is fully centered around IEnumerable<T>. Charlie Calvert explained this very well in the LINQFarm: Understanding IEnumerable entry of his LINQ / LINQFarm series:
The type IEnumerable<T> plays two key roles in this code.
- The query expression has a data source […] which implements IEnumerable<T>.
- The query expression returns an instance of IEnumerable<T>.
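As a small illustration of why IEnumerable<T> works well for parameters and return types, here is a sketch. The EnumerableParameterDemo class and its Doubled helper are made up for this purpose. The same method takes an array and a List<T> unchanged, and the query expression inside it both consumes and returns IEnumerable<int>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableParameterDemo
{
    // Hypothetical helper: the data source and the query result are both IEnumerable<int>.
    public static IEnumerable<int> Doubled(IEnumerable<int> numbers)
    {
        return from number in numbers
               select number * 2;
    }

    public static void Main()
    {
        int[] array = { 1, 2, 3 };
        List<int> list = new List<int> { 4, 5 };

        // Arrays and List<T> both implement IEnumerable<T>, so no conversion is needed.
        Console.WriteLine(string.Join(", ", Doubled(array)));
        Console.WriteLine(string.Join(", ", Doubled(list)));
    }
}
```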
LINQ code
The rest of the code makes up the main program: different ways of formulating the LINQ query:
- Using explicit (named) types
  - with LINQ expressions
  - with LINQ methods
- Using implicit (anonymous) types
  - with LINQ expressions
  - with LINQ methods
I’ll start with the explicit example, as that clearly shows the IGrouping<TKey, TElement>: what is the TKey and what is the TElement.
The LINQ expressions and LINQ methods are usually equivalent, and in these cases they are.
Explicitly typed LINQ code
So this LINQ code, which groups Fruit by FruitVariety using explicit types:
private static void showExplicitGrouping_LINQ_Expression(IEnumerable<Fruit> fruits)
{
    IEnumerable<IGrouping<FruitVariety, Fruit>> groups =
        from fruit in fruits
        group fruit by new FruitVariety(fruit.Type, fruit.Variety);
    showExplicitFruitVarietyGrouping(groups);
}
is equivalent to:
private static void showExplicitGrouping_LINQ_Method(IEnumerable<Fruit> fruits)
{
    IEnumerable<IGrouping<FruitVariety, Fruit>> groups =
        fruits.GroupBy(fruit => new FruitVariety(fruit.Type, fruit.Variety));
    showExplicitFruitVarietyGrouping(groups);
}
Both group by two Fruit columns, Type and Variety, as the FruitVariety class implements those.
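The earlier remark about making sure the group keys are indeed distinct matters for a named key class. LINQ to Objects compares keys with the default equality comparer, and a class that does not override Equals and GetHashCode is compared by reference. The sketch below is not part of the original program: ReferenceKey, ValueKey and KeyEqualityDemo are hypothetical names. It counts the groups you get for the same three rows with a key class lacking value equality, a key class that has it, and an anonymous type (for which the compiler generates value-based Equals and GetHashCode).

```csharp
using System;
using System.Linq;

// Hypothetical key class without equality overrides: two instances are never equal.
public class ReferenceKey
{
    public string Type { get; private set; }
    public string Variety { get; private set; }
    public ReferenceKey(string type, string variety) { Type = type; Variety = variety; }
}

// Hypothetical key class with value equality on Type and Variety.
public class ValueKey : IEquatable<ValueKey>
{
    public string Type { get; private set; }
    public string Variety { get; private set; }
    public ValueKey(string type, string variety) { Type = type; Variety = variety; }

    public bool Equals(ValueKey other)
    {
        return other != null && Type == other.Type && Variety == other.Variety;
    }

    public override bool Equals(object obj) { return Equals(obj as ValueKey); }

    public override int GetHashCode()
    {
        // Equal keys must produce equal hash codes, so combine the same two properties.
        unchecked { return ((Type ?? "").GetHashCode() * 397) ^ (Variety ?? "").GetHashCode(); }
    }
}

public static class KeyEqualityDemo
{
    // Made-up rows: two identical Elstar entries and one Verdi entry.
    private static readonly string[][] Rows =
    {
        new[] { "Apple", "Elstar" },
        new[] { "Apple", "Elstar" },
        new[] { "Pear", "Verdi" },
    };

    public static int CountGroupsByReferenceKey()
    {
        return Rows.GroupBy(row => new ReferenceKey(row[0], row[1])).Count();
    }

    public static int CountGroupsByValueKey()
    {
        return Rows.GroupBy(row => new ValueKey(row[0], row[1])).Count();
    }

    public static int CountGroupsByAnonymousKey()
    {
        return Rows.GroupBy(row => new { Type = row[0], Variety = row[1] }).Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountGroupsByReferenceKey()); // 3: every row is its own group
        Console.WriteLine(CountGroupsByValueKey());     // 2: the Elstar rows share a group
        Console.WriteLine(CountGroupsByAnonymousKey()); // 2: anonymous types compare by value
    }
}
```

So whether a named key class such as FruitVariety puts both Elstar rows into one group depends on it having value equality. Passing an IEqualityComparer<TKey> to GroupBy would be another way to get the same effect.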
Because of the explicit (named) types, you can actually pass the groups (of type IEnumerable<IGrouping<FruitVariety, Fruit>>) to a method for postprocessing:
private static void showExplicitFruitVarietyGrouping(IEnumerable<IGrouping<FruitVariety, Fruit>> groups)
{
    foreach (IGrouping<FruitVariety, Fruit> group in groups)
    {
        FruitVariety key = group.Key;
        Console.WriteLine("Type: {0}, Variety: {1}", key.Type, key.Variety);
        showFruits(group);
    }
}
In the above method, you can see that the Key in the group is of type FruitVariety.
This is because the key expressions in both group fruit by and fruits.GroupBy produce a FruitVariety.
Since IGrouping<FruitVariety, Fruit> extends IEnumerable<Fruit>, you can pass each group to the IEnumerable<Fruit> fruits parameter of showFruits to display the detail records with another foreach loop, which uses the IEnumerator<Fruit> that fruits.GetEnumerator() provides:
private static void showFruits(IEnumerable<Fruit> fruits)
{
    foreach (Fruit fruit in fruits)
    {
        Console.WriteLine(" Quantity: {0}, Price per Kilo: {1}", fruit.Quantity, fruit.PricePerKilo);
    }
}
So: using explicit types has two advantages:
- You can see the actual underlying types used
- You can use explicit types through your whole code.
But sometimes you cannot use explicit types, mostly when you use anonymous types. So then var is the way to go.
Implicitly (var) typed LINQ code with anonymous types
Below is the LINQ code that is equivalent to the above, but then using var and anonymous types.
This code does not group by FruitVariety, but by new { fruit.Type, fruit.Variety }.
You immediately see one of the drawbacks: you cannot have the foreach loop outside this method without doing some serious refactoring (anonymous types are meant for temporary, local use for a reason) or (heaven forbid) cast by example.
private static void showImplicitGrouping_LINQ_Expression(IEnumerable<Fruit> fruits)
{
    var groups =
        from fruit in fruits
        group fruit by new { fruit.Type, fruit.Variety };
    foreach (var group in groups)
    {
        var key = group.Key;
        Console.WriteLine("Type: {0}, Variety: {1}", key.Type, key.Variety);
        showFruits(group);
    }
}
The above code is equivalent to:
private static void showImplicitGrouping_LINQ_Method(IEnumerable<Fruit> fruits)
{
    var groups = fruits.GroupBy(fruit => new { fruit.Type, fruit.Variety });
    foreach (var group in groups)
    {
        var key = group.Key;
        Console.WriteLine("Type: {0}, Variety: {1}", key.Type, key.Variety);
        showFruits(group);
    }
}
You cannot put the foreach loop in a separate method, as you cannot pass the var group as a var parameter in C# (in Oxygene, you could).
And even if you could, the underlying type of the var would be invisible in the method, so you would not be able to know that key in fact contains a Type and a Variety property.
Parameterizing the method with the generic types TKey and TElement isn’t going to work either:
private static void showImplicitFruitVarietyGrouping<TKey, TElement>(IEnumerable<IGrouping<TKey, TElement>> groups)
{
    foreach (IGrouping<TKey, TElement> group in groups)
    {
        TKey key = group.Key;
        // doesn't compile: TKey is unconstrained, so it has no Type or Variety members
        Console.WriteLine("Type: {0}, Variety: {1}", key.Type, key.Variety);
        // doesn't compile: IEnumerable<TElement> is not an IEnumerable<Fruit>
        showFruits(group);
    }
}
Anyway: the foreach stays inside the method, but you can still call showFruits. The var keyword does not mean untyped; it means implicitly typed. So group has a type, from which the compiler deduces the type of key, and because that group type extends IEnumerable<Fruit>, you can pass group as the IEnumerable<Fruit> parameter of showFruits.
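One possible way to move the loop out anyway, offered as a sketch rather than as part of the original program: keep TElement fixed to Fruit, make only the key type generic, and let the caller pass a delegate that knows how to describe the key. The compiler infers the anonymous type for TKey at the call site. The names GenericGroupingDemo, DescribeGroups and Run are hypothetical, and the Fruit class is a trimmed-down stand-in with only the members this sketch needs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down stand-in for the Fruit class (no PricePerKilo).
public class Fruit
{
    public string Type { get; private set; }
    public string Variety { get; private set; }
    public int Quantity { get; private set; }

    public Fruit(string type, string variety, int quantity)
    {
        Type = type;
        Variety = variety;
        Quantity = quantity;
    }
}

public static class GenericGroupingDemo
{
    // Hypothetical helper: the key type stays generic, the caller supplies how to describe it.
    public static List<string> DescribeGroups<TKey>(
        IEnumerable<IGrouping<TKey, Fruit>> groups,
        Func<TKey, string> describeKey)
    {
        var lines = new List<string>();
        foreach (IGrouping<TKey, Fruit> group in groups)
        {
            lines.Add(describeKey(group.Key));
            foreach (Fruit fruit in group)
            {
                lines.Add(string.Format(" Quantity: {0}", fruit.Quantity));
            }
        }
        return lines;
    }

    public static List<string> Run()
    {
        var fruits = new[]
        {
            new Fruit("Apple", "Elstar", 7),
            new Fruit("Apple", "Elstar", 9),
            new Fruit("Pear", "Verdi", 6),
        };
        var groups = fruits.GroupBy(fruit => new { fruit.Type, fruit.Variety });

        // Inside the lambda, key still has its anonymous type, so Type and Variety are visible.
        return DescribeGroups(groups,
            key => string.Format("Type: {0}, Variety: {1}", key.Type, key.Variety));
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
    }
}
```

The trade-off is that the knowledge of what is inside the key stays with the caller; the helper itself only sees an opaque TKey.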
Output
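Both anonymous-type versions produce the output below: two entries for Elstar and a single entry for every other cultivar.
The explicitly typed versions only produce the same groups when FruitVariety has value equality (Equals and GetHashCode overrides), since GroupBy otherwise compares class keys by reference.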
Type: Apple, Variety: Belle de Boskoop
 Quantity: 12, Price per Kilo: 2.48
Type: Apple, Variety: Elstar
 Quantity: 7, Price per Kilo: 1.98
 Quantity: 9, Price per Kilo: 1.88
Type: Apple, Variety: Red Prince
 Quantity: 11, Price per Kilo: 2.98
Type: Apple, Variety: Santana
 Quantity: 5, Price per Kilo: 2.48
Type: Pear, Variety: Gieser Wildeman
 Quantity: 3, Price per Kilo: 3.48
Type: Pear, Variety: Verdi
 Quantity: 6, Price per Kilo: 3.98
Type: Cherry, Variety: Early Rivers
 Quantity: 30, Price per Kilo: 5.98
Type: Cherry, Variety: Morel
 Quantity: 25, Price per Kilo: 6.98
In a future blog post, I will show you some other aspects of grouping where you expect the same output as above, but the actual grouping is slightly different.
–jeroen