Sunday, May 7, 2017

LINQ Basics by Example

While working on my 70-487 study guide I wrote up a looong section on basic LINQ operators that, while certainly germane, was not really the subject at hand.  So I'm refactoring that stuff into its own post.  I look at the LINQ extension methods provided by Enumerable and Queryable, with lots of example code and console screenshots.  They are basically the same operations with different underlying behavior: Enumerable runs delegates against in-memory IEnumerable<T> sequences, while Queryable builds expression trees that a query provider (Entity Framework, for example) translates into something like SQL.  All the code samples can be found on my GitHub page for the study guide examples.
My notion of "categories" of operations is really just an organizational convenience; otherwise this would have been one huge list of operations with no clear relationship to each other.  If some operations didn't fit neatly into a group... well, I just sort of winged it.  Sue me.

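Links in the lists of operations are to the Enumerable versions.  I've noted which methods use deferred execution, which basically means the query doesn't enumerate the sequence or do any work until the results are actually needed (when you iterate it with foreach, or call something like ToList or Count).  In many cases deferred execution can save work and speed things up, but it can bite you if you aren't aware of it: a deferred query runs again every time it is enumerated, against whatever the source contains at that moment.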
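To make the deferred execution caveat concrete, here is a minimal standalone sketch (not from the study guide repo; it assumes C# 9+ top-level statements, and the `evaluations` counter is purely illustrative).  The counter inside the Select lambda shows when the projection actually runs:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluations = 0;                     // counts how often the projection runs
var source = new List<int> { 1, 2, 3 };

// Defining the query does no work yet: Select is deferred.
var doubled = source.Select(n => { evaluations++; return n * 2; });
int afterDefine = evaluations;           // 0

// Enumeration happens when results are requested, against the list's current contents.
source.Add(4);
int firstSum = doubled.Sum();            // 2 + 4 + 6 + 8 = 20
int afterFirstSum = evaluations;         // 4

// Asking again re-runs the whole query.
int secondSum = doubled.Sum();           // 20 again
int afterSecondSum = evaluations;        // 8

// ToList() forces immediate execution and caches a snapshot.
List<int> snapshot = doubled.ToList();
source.Add(5);
int liveCount = doubled.Count();         // the live query sees the new element

Console.WriteLine($"snapshot: {snapshot.Count} items, live query: {liveCount} items");
```

Because the query re-runs on every enumeration, materializing it with ToList (or ToArray) is the usual way to freeze the results or avoid paying for the same work twice.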


Numeric Operations


These operations perform mathematical functions on a sequence of numbers (well, except for Range, which generates a sequence of integers).  These should be pretty self-explanatory to anyone with basic math competence.  Each of them except Range has something like 20 overloads, to cover each numeric type (Int32, Int64, Decimal, Double, Single), the nullable version of each, and variants that take a transform function.  The varieties that take a transform function apply the transform to each element, then operate on the results (so if you have a sequence of non-numbers, you can spit out the number you want to sum, average, whatever, from a field or calculation).
  • Average - computes the average of a numeric sequence
  • Max - computes the max of a numeric sequence
  • Min - computes the min of a numeric sequence
  • Range - generates a sequence of consecutive integers from a start value and a count
  • Sum - computes the sum of a numeric sequence

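static void Numeric()
{
    // PrettyPrint (and Cell, Person, SuperHero, HeroType in later samples) are
    // helpers from the study guide code that aren't shown in this post.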
    var sequence = new List<int>(){368, 506,  90, 340, 325, 
                                   635, 705, 759, 599,  79};

    var range = Enumerable.Range(0, 10);

    Console.WriteLine("Range(0, 10): " + PrettyPrint(range));
    Console.WriteLine("\n\nSequence: " + PrettyPrint(sequence));
    Console.WriteLine("     Max: " + sequence.Max());
    Console.WriteLine("     Min: " + sequence.Min());
    Console.WriteLine("    Mean: " + sequence.Average());
    Console.WriteLine("     Sum: " + sequence.Sum());

}
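The sample above only uses the plain overloads.  Here's a rough sketch of the transform-function overloads plus a few empty-sequence edge cases, assuming C# 9+ top-level statements; the `orders` data is made up for illustration:

```csharp
using System;
using System.Linq;

// Hypothetical order data (not from the post): item, unit price, quantity.
var orders = new[]
{
    (Item: "Widget",    Price: 2.50m,  Quantity: 4),
    (Item: "Gadget",    Price: 10.00m, Quantity: 1),
    (Item: "Doohickey", Price: 0.75m,  Quantity: 12)
};

// Transform-function overloads: project each element to a number, then aggregate.
decimal total = orders.Sum(o => o.Price * o.Quantity);   // 10.00 + 10.00 + 9.00 = 29.00
int mostUnits = orders.Max(o => o.Quantity);             // 12
decimal avgPrice = orders.Average(o => o.Price);         // 13.25 / 3

// Empty sequences: Sum gives 0, the nullable Max gives null, non-nullable Average throws.
int emptySum = Enumerable.Empty<int>().Sum();
int? emptyMax = Enumerable.Empty<int?>().Max();
bool averageThrew = false;
try { Enumerable.Empty<int>().Average(); }
catch (InvalidOperationException) { averageThrew = true; }

Console.WriteLine($"Total: {total}, max qty: {mostUnits}, avg price: {avgPrice:F2}");
```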





Sorting and Navigating 


These operations change the order of the sequence (either sorting or reversing), or bypass elements in the sequence.  I probably could have thrown "Take" in here instead of with Selection, but at the time it made sense and I'm not rejiggering all the examples, so there it is.  These all use deferred execution.

static void Sorting()
{
    var sequence = new List<Person>(){
        new Person("Bobby Tables", new DateTime(1982, 1, 1)),
        new Person("Susan Strong", new DateTime(1977, 8, 22)),
        new Person("Multiple Man", new DateTime(2005, 4, 30)),
        new Person("Multiple Man", new DateTime(2015, 1, 5)),
        new Person("Multiple Man", new DateTime(1995, 11, 16)),
        new Person("Multiple Man", new DateTime(1962, 9, 8)),
        new Person("Finn the Human", new DateTime(2001, 12, 5)),
        new Person("Steve Bobfish", new DateTime(2014, 6, 14))
    };

    Console.WriteLine("Base sequence:\n" + PrettyPrint(sequence, true));

    var sorted = sequence.OrderBy(p => p.Name).ThenByDescending(p => p.Birthday);
    Console.WriteLine("\nSorted by Name, then Birthday (descending):\n" + PrettyPrint(sorted, true));

    var rev = sorted.Reverse();
    Console.WriteLine("\nReverse of sorted list:\n" + PrettyPrint(rev, true));

    var skipped = rev.Skip(2);
    Console.WriteLine("\nSkip first two of reversed:\n" + PrettyPrint(skipped, true));

    var skipMulti = skipped.SkipWhile(p => p.Name.Equals("Multiple Man"));
    Console.WriteLine("\nSkip 'Multiple Man' persons:\n" + PrettyPrint(skipMulti, true));

}
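One thing the sample doesn't show is why ThenBy exists at all.  Here's a small sketch comparing ThenBy with a second OrderBy, plus the "leading elements only" behavior of SkipWhile.  It uses made-up (Name, Year) tuples standing in for Person and assumes C# 9+ top-level statements:

```csharp
using System;
using System.Linq;

// Made-up data standing in for the post's Person objects.
var people = new[]
{
    (Name: "Multiple Man",   Year: 2005),
    (Name: "Bobby Tables",   Year: 1982),
    (Name: "Multiple Man",   Year: 1962),
    (Name: "Finn the Human", Year: 2001)
};

// ThenBy only breaks ties left over from the preceding OrderBy.
string[] nameThenYear = people.OrderBy(p => p.Name).ThenBy(p => p.Year)
                              .Select(p => p.Name + " " + p.Year).ToArray();

// A second OrderBy starts a brand-new primary sort, so the result is ordered by Year.
string[] yearOnly = people.OrderBy(p => p.Name).OrderBy(p => p.Year)
                          .Select(p => p.Name + " " + p.Year).ToArray();

// SkipWhile stops skipping at the first element that fails the predicate;
// later elements are kept even if they would match.
int[] skipped = new[] { 1, 2, 5, 1, 2 }.SkipWhile(n => n < 3).ToArray();

Console.WriteLine(string.Join(", ", nameThenYear));
Console.WriteLine(string.Join(", ", yearOnly));
Console.WriteLine(string.Join(", ", skipped));
```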





Set operations


These operations implement ideas from set theory, and all three return distinct elements (duplicates within either input are dropped).  These all use deferred execution.
  • Except - returns the set difference of two sequences
  • Intersect - set intersection of two sequences
  • Union - produces set union of two sequences

static void SetOps()
{
    var seq1 = "mustache".ToCharArray();
    var seq2 = "mustard".ToCharArray();
    var seq3 = "screwdriver".ToCharArray();

    Console.WriteLine("Sequence 1: " + PrettyPrint(seq1));
    Console.WriteLine("Sequence 2: " + PrettyPrint(seq2));
    Console.WriteLine("Sequence 3: " + PrettyPrint(seq3));
    Console.WriteLine("\n");

    Console.WriteLine("1 - 2: " + PrettyPrint(seq1.Except(seq2)));
    Console.WriteLine("2 - 1: " + PrettyPrint(seq2.Except(seq1)));
    Console.WriteLine("1 - 3: " + PrettyPrint(seq1.Except(seq3)));
    Console.WriteLine("2 - 3: " + PrettyPrint(seq2.Except(seq3)));
    Console.WriteLine("\n");

    Console.WriteLine("1 ∩ 2: " + PrettyPrint(seq1.Intersect(seq2)));
    Console.WriteLine("1 ∩ 3: " + PrettyPrint(seq1.Intersect(seq3)));
    Console.WriteLine("2 ∩ 3: " + PrettyPrint(seq2.Intersect(seq3)));
    Console.WriteLine("\n");

    Console.WriteLine("1 U 2: " + PrettyPrint(seq1.Union(seq2)));
    Console.WriteLine("1 U 3: " + PrettyPrint(seq1.Union(seq3)));
    Console.WriteLine("2 U 3: " + PrettyPrint(seq2.Union(seq3)));
    Console.WriteLine("\n");
}
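A quick sketch of the same string example with the results squashed back into strings, plus a comparison with Concat (which is not a set operation and keeps duplicates).  This assumes C# 9+ top-level statements:

```csharp
using System;
using System.Linq;

// The "mustache"/"mustard" sequences from the sample, turned back into strings.
string union     = new string("mustache".Union("mustard").ToArray());
string intersect = new string("mustache".Intersect("mustard").ToArray());
string except    = new string("mustache".Except("mustard").ToArray());

// Set operations drop duplicates; Concat (a plain append) keeps them.
int[] a = { 1, 1, 2, 3, 3 };
int[] b = { 3, 4, 4 };
int[] setUnion = a.Union(b).ToArray();
int[] appended = a.Concat(b).ToArray();

Console.WriteLine($"{union} | {intersect} | {except}");
Console.WriteLine(string.Join(",", setUnion) + " vs " + string.Join(",", appended));
```

Note that the results keep the order of the first sequence, with Union appending any new elements from the second sequence at the end.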




Sequence initialization


These two... honestly I just wasn't sure how else to categorize them.  Empty initializes an empty sequence, and Repeat initializes a sequence with a single value repeated x times.  Neither is an extension method, they are just static methods on Enumerable. 
  • Empty - returns an empty sequence of the given type
  • Repeat - generates a sequence containing one repeated element

static void InitSeq()
{
    var repeat = Enumerable.Repeat("A", 5);
    var empty = Enumerable.Empty<String>();

    Console.WriteLine("Repeat: " + PrettyPrint(repeat));
    Console.WriteLine("Empty: " + PrettyPrint(empty));
}
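A gotcha worth knowing about Repeat: with a reference type you get the same object repeated, not copies.  A small sketch (C# 9+ top-level statements assumed, data purely illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Repeat with a reference type repeats the same reference, not copies.
List<List<int>> shared = Enumerable.Repeat(new List<int>(), 3).ToList();
shared[0].Add(42);
bool sameInstance = object.ReferenceEquals(shared[0], shared[2]);
int sharedLastCount = shared[2].Count;          // the 42 shows up here too

// Range + Select is one way to get a fresh instance per element instead.
List<List<int>> separate = Enumerable.Range(0, 3).Select(_ => new List<int>()).ToList();
separate[0].Add(42);
int separateLastCount = separate[2].Count;      // untouched

// Empty<T>() is a handy non-null "nothing here" value to return from a method.
IEnumerable<string> none = Enumerable.Empty<string>();

Console.WriteLine($"{sameInstance}, {sharedLastCount}, {separateLastCount}, {none.Count()}");
```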




Boolean Operations


There are a handful of operations that test the sequence and return a boolean value.  These operations don't use deferred execution (how could they?), though they do short-circuit as soon as a stop condition is met (true result for Any or Contains, false result for All and SequenceEqual).  
  • All - returns true if every element matches criteria
  • Any - returns true if any element matches criteria
  • Contains - returns true if a specific element exists in a sequence
  • SequenceEqual - compares two sequences element by element.  Order matters (so {1,2,3} != {1,3,2}). 

static void Booleans()
{
    var first = "The quick brown fox".Split(' ');
    var second = "jumped over the".Split(' ');
    var third = "lazy dog".Split(' ');

    Console.WriteLine("Part 1: " + PrettyPrint(first));
    Console.WriteLine("Part 2: " + PrettyPrint(second));
    Console.WriteLine("Part 3: " + PrettyPrint(third));
    Console.WriteLine("\n");
    Console.WriteLine("                       |  Part 1  |  Part 2  |  Part 3  |");
    Console.WriteLine("All words have 'e'?    |" + 
            Cell(first.All(w => w.Contains('e'))) + "|" +
            Cell(second.All(w => w.Contains('e')))+ "|" +
            Cell(third.All(w => w.Contains('e'))) + "|");
    Console.WriteLine("Any words have 'z'?    |" + 
            Cell(first.Any(w => w.Contains('z'))) + "|" +
            Cell(second.Any(w => w.Contains('z'))) + "|" +
            Cell(third.Any(w => w.Contains('z'))) + "|");
    Console.WriteLine("Contains 'the'?        |" + 
            Cell(first.Contains("the")) + "|" +
            Cell(second.Contains("the")) + "|" +
            Cell(third.Contains("the")) + "|");
    Console.WriteLine("Seq Equals 'lazy dog?' |" + 
            Cell(first.SequenceEqual(new List<string>(){"lazy","dog"})) + "|" +
            Cell(second.SequenceEqual(new List<string>(){"lazy","dog"})) + "|" +
            Cell(third.SequenceEqual(new List<string>(){"lazy","dog"})) + "|");
    Console.WriteLine("Naive == 'lazy dog?'   |" +
            Cell(first.Equals(new List<string>() { "lazy", "dog" })) + "|" +
            Cell(second.Equals(new List<string>() { "lazy", "dog" })) + "|" +
            Cell(third.Equals(new List<string>() { "lazy", "dog" })) + "|");

}
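A standalone sketch of the short-circuiting and a few edge cases (empty sequences, case-insensitive Contains, arrays vs SequenceEqual), assuming C# 9+ top-level statements; the `examined` counter is just there to expose how far Any gets:

```csharp
using System;
using System.Linq;

// Count how many elements Any inspects before it short-circuits.
int examined = 0;
int[] data = { 3, 8, 1, 9, 4 };
bool hasBig = data.Any(n => { examined++; return n > 5; });   // stops at 8

// Empty sequences: All is vacuously true, Any is false.
bool allOnEmpty = Enumerable.Empty<int>().All(n => n > 100);
bool anyOnEmpty = Enumerable.Empty<int>().Any();

// Contains uses the default comparer (case-sensitive for strings) unless you pass one.
string[] part1 = "The quick brown fox".Split(' ');
bool strict = part1.Contains("the");
bool relaxed = part1.Contains("the", StringComparer.OrdinalIgnoreCase);

// SequenceEqual compares elements in order; Equals on arrays compares references.
bool sameOrder = new[] { 1, 2, 3 }.SequenceEqual(new[] { 1, 2, 3 });
bool swapped = new[] { 1, 2, 3 }.SequenceEqual(new[] { 1, 3, 2 });
bool naive = new[] { 1, 2, 3 }.Equals(new[] { 1, 2, 3 });

Console.WriteLine($"{hasBig} after {examined} checks; all/any on empty: {allOnEmpty}/{anyOnEmpty}");
```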




Selection Operations


What sets many of these operations apart is that they are basically designed to return a result (well, on reflection Take is more a navigation operation, c'est la vie).  Take, TakeWhile, and DefaultIfEmpty all defer execution, while the rest trigger enumeration.  The "OrDefault" versions of ElementAt, First, Last, and Single all return a default value when they would otherwise throw (calling First on an empty sequence, for example), with one catch: SingleOrDefault still throws if the sequence contains more than one element.  The default for reference types is null, so that is usually what you'll get.  Numeric types default to 0, and bools to false.

static void Selection()
{
    var seq = Enumerable.Range(1, 10);
    Console.WriteLine("     Sequence: " + PrettyPrint(seq));
    Console.WriteLine("# of elements: " + seq.Count());
    Console.WriteLine("   # of evens: " + seq.Count(e => e % 2 == 0));
    Console.WriteLine("first element: " + seq.First());
    Console.WriteLine(" last element: " + seq.Last());
    Console.WriteLine("  3rd element: " + seq.ElementAt(2));
    Console.WriteLine(" 99th element: " + seq.ElementAtOrDefault(98));
    Console.WriteLine("      first 3: " + PrettyPrint(seq.Take(3)));
    Console.WriteLine("    first < 6: " + PrettyPrint(seq.TakeWhile(e => e < 6)));

}
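A rough sketch of the OrDefault behavior described above, including the SingleOrDefault exception and DefaultIfEmpty, assuming C# 9+ top-level statements:

```csharp
using System;
using System.Linq;

int[] empty = Array.Empty<int>();

// The "OrDefault" versions return default(T) instead of throwing on a missing element.
int firstOrDefault = empty.FirstOrDefault();                 // 0
var nameOrDefault = Array.Empty<string>().FirstOrDefault();  // null
int outOfRange = new[] { 10, 20 }.ElementAtOrDefault(5);     // 0

// The plain versions throw instead.
bool firstThrew = false;
try { empty.First(); } catch (InvalidOperationException) { firstThrew = true; }

// SingleOrDefault only tolerates zero elements; two or more still throw.
bool singleThrew = false;
try { new[] { 1, 2 }.SingleOrDefault(); } catch (InvalidOperationException) { singleThrew = true; }

// DefaultIfEmpty (deferred) substitutes a single fallback element for an empty sequence.
int[] padded = empty.DefaultIfEmpty(-1).ToArray();

Console.WriteLine($"{firstOrDefault}, {nameOrDefault == null}, {outOfRange}, {firstThrew}, {singleThrew}, {padded[0]}");
```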





Filtering Operations


These are operations designed to take an input sequence and make it smaller by removing elements that don't meet some criteria.  Distinct removes duplicates, OfType keeps only elements of a certain type, and Where lets you specify a predicate function to decide.  These all use deferred execution.
  • Distinct - removes duplicate elements.  It uses the default equality comparer unless you pass an IEqualityComparer<T>, so for a class like Person, duplicates are only detected if the class overrides Equals and GetHashCode.
  • OfType - filters elements based on the specific type.  The result is typed as the filtered type (an IEnumerable<TResult>), so no separate Cast is needed.
  • Where - filters based on criteria

static void Filters()
{
    var people = new List<Person>(){
        new SuperHero("Iron Man", "Tony Stark", new DateTime(1970, 5, 29)),
        new SuperHero("Batman", "Bruce Wayne", new DateTime(1940, 4, 25)),
        new Person("Jake the Dog", new DateTime(1995, 5, 30)),
        new Person("Marcelline Anthonsen", new DateTime(1984, 1, 27)),
        new Person("Marcelline Anthonsen", new DateTime(1984, 1, 27)),
        new Person("Steve Bobfish III", new DateTime(2017, 4, 30))
    };

    Console.WriteLine("Population:\n" + PrettyPrint(people, true));

    var deduped = people.Distinct();
    Console.WriteLine("\nNo duplicates:\n" + PrettyPrint(deduped, true));

    var heros = people.OfType<SuperHero>();
    Console.WriteLine("\nOnly superheroes:\n" + PrettyPrint(heros, true));

    var heroSeq = people.OfType<SuperHero>()
        .Cast<SuperHero>().Select(h => h.HeroName);
    var heroSeq2 = people.OfType<SuperHero>().Select(h => h.HeroName);
    Console.WriteLine("\nOnly hero names: " + PrettyPrint(heroSeq));
    Console.WriteLine("      No 'Cast': " + PrettyPrint(heroSeq2));

    var fancyNames = people.Where(p => p.Name.Split(' ').Length >= 3);
    Console.WriteLine("\nFancy names:\n" + PrettyPrint(fancyNames, true));
}
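A sketch of how OfType differs from Cast on mixed data, and how Distinct depends on the equality comparer.  It assumes C# 9+ top-level statements and uses anonymous types (which compare by value) in place of the Person class:

```csharp
using System;
using System.Linq;

object[] mixed = { 1, "two", 3.0, "four", 5 };

// OfType silently skips elements of other types (the boxed double 3.0 is not an int).
string[] strings = mixed.OfType<string>().ToArray();   // "two", "four"
int[] ints = mixed.OfType<int>().ToArray();            // 1, 5

// Cast<int> instead throws as soon as enumeration reaches a non-int.
bool castThrew = false;
try { mixed.Cast<int>().ToArray(); } catch (InvalidCastException) { castThrew = true; }

// Distinct uses the default equality comparer unless one is supplied.
string[] names = { "Batman", "batman", "BATMAN", "Robin" };
string[] caseSensitive = names.Distinct().ToArray();
string[] caseInsensitive = names.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();

// Anonymous types get value-based Equals/GetHashCode, so Distinct dedupes them by content.
var people = new[]
{
    new { Name = "Marcelline Anthonsen", Year = 1984 },
    new { Name = "Marcelline Anthonsen", Year = 1984 },
    new { Name = "Jake the Dog", Year = 1995 }
};
int distinctPeople = people.Distinct().Count();

Console.WriteLine(string.Join(", ", caseInsensitive) + $" / {distinctPeople} people");
```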





Transformation Operations


These operations are generally used to change the shape of the sequence in some way.  Aggregate condenses the sequence down to a single result (versus just picking one element).  Cast changes the element type (and throws an InvalidCastException if an element can't be cast).  Concat glues two sequences together.  Select and Zip can (and usually do) produce sequences with a different underlying element type (even an anonymous type), while SelectMany just squashes nested sequences into a single sequence.  All of these except Aggregate use deferred execution.
  • Aggregate - applies an accumulator over a sequence (similar to a "reduce")
  • Cast<TResult> - casts the elements of a sequence to TResult
  • Concat - concatenate two sequences
  • Select - project the sequence elements into a new form
  • SelectMany - flattens a sequence of sequences
  • Zip - applies a function to corresponding elements of two sequences, producing a third sequence

static void Transform()
{
    var nums = Enumerable.Range(1, 10);
    var abcs = "ABCDEFGHIJ".ToCharArray();

    Console.WriteLine("Numbers: " + PrettyPrint(nums));
    Console.WriteLine("Letters: " + PrettyPrint(abcs));

    // Seed with an empty string and append each letter in turn: "ABCD"
    var joined = abcs.Take(4)
        .Aggregate("", (acc, e) => acc + e.ToString());
    Console.WriteLine("\nAggregate: " + joined);

    var zipped = nums.Zip(abcs, (n, l) => n.ToString() + l.ToString());
    Console.WriteLine("\nZipped: " + PrettyPrint(zipped));

    var concat = nums.Take(5).Concat(nums.Take(5));
    Console.WriteLine("\nConcat: " + PrettyPrint(concat));

    var listlist = new List<List<string>>(){
        new List<string>(){"The", "quick", "brown" },
        new List<string>(){"fox", "jumped", "over" },
        new List<string>(){"the", "lazy", "dog" }
    };

    Console.WriteLine("\n\n*** LIST OF LISTS ***");
    Console.WriteLine("\nPre-Flattened: \n" + PrettyPrint(listlist, true));
    Console.WriteLine("\nFlattened: \n" + PrettyPrint(listlist.SelectMany(s => s)));
}
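A small sketch covering a few things the sample skips: Aggregate with a seed, Zip with sequences of different lengths, Select into an anonymous type, and Select vs SelectMany.  C# 9+ top-level statements are assumed, and the data is made up:

```csharp
using System;
using System.Linq;

var nums = Enumerable.Range(1, 5);

// Aggregate with a seed: the lambda receives the running result and the next element.
int product = nums.Aggregate(1, (acc, n) => acc * n);           // 1*2*3*4*5
string joined = "ABCD".Aggregate("", (acc, c) => acc + c);      // "ABCD"

// Zip pairs elements by position and stops at the end of the shorter sequence.
string[] zipped = nums.Zip("AB", (n, c) => n.ToString() + c).ToArray();

// This Select overload also passes the index; the projection is an anonymous type.
var squares = nums.Select((n, i) => new { Index = i, Square = n * n }).ToArray();

// Select with a splitting function yields a sequence of arrays; SelectMany flattens it.
string[] phrases = { "the quick brown", "fox jumped" };
string[][] nested = phrases.Select(p => p.Split(' ')).ToArray();   // 2 arrays
string[] flat = phrases.SelectMany(p => p.Split(' ')).ToArray();   // 5 words

Console.WriteLine($"{product} {joined} {string.Join(",", zipped)} {squares[4].Square} {flat.Length}");
```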




Grouping and Joining


These methods provide functionality similar to the SQL concepts of GROUP BY and JOIN (inner join, really).  A left outer join is possible (thanks to Stack Overflow for the answer), but it doesn't come baked in; you have to get creative and build it by hand.  These all use deferred execution.
  • GroupBy - groups elements according to key selector
  • GroupJoin - correlates elements of two sequences by matching keys, grouping the matches for each outer element
  • Join - correlates elements based on matching keys.  This is equivalent to an inner equijoin out of the box.


static void JoinGroup()
{
    var heroTypes = new List<HeroType>(){
        new HeroType(1, "Hero"),
        new HeroType(2, "Anti-hero"),
        new HeroType(3, "Villain")
    };

    var characters = new List<SuperHero>(){
        new SuperHero("Iron Man", 1, "Tony Stark", new DateTime()),
        new SuperHero("Superman", 1, "Clark Kent", new DateTime()),
        new SuperHero("Wolverine", 1, "Logan ???", new DateTime()),
        new SuperHero("Deadpool", 2, "Wade Wilson", new DateTime()),
        new SuperHero("Punisher", 2, "Frank Castle", new DateTime()),
        new SuperHero("Loki", 3, "Loki Laufeyson", new DateTime()),
        new SuperHero("Joker", 3, "Jack Napier", new DateTime()),
        new SuperHero("Ozymandias", 4, "Adrian Veidt", new DateTime()),
    };

    Console.WriteLine("Hero Types: " + PrettyPrint(heroTypes, true));
    Console.WriteLine("\nCharacters: " + PrettyPrint(characters, true));

    var grouped = characters.GroupBy(k => k.HeroTypeId)
        .Select(g => g.Key.ToString() + ": " + g.Count().ToString());
    Console.WriteLine("\nGrouped: " + PrettyPrint(grouped, true));

    var gjoined = heroTypes.GroupJoin(
         characters,
         ht => ht.Id,
         h => h.HeroTypeId,
         (ht, h) => new { ht.TypeName, Heros = h })
        .Select(g => g.TypeName + ": " + PrettyPrint(g.Heros.Select(h => h.HeroName)));
    Console.WriteLine("\nGroupJoined: " + PrettyPrint(gjoined, true));

    var join = characters.Join(heroTypes,
            ch => ch.HeroTypeId,
            ht => ht.Id,
            (ch, ht) => new { ch.HeroName, ht.TypeName })
            .Select(r => r.HeroName + ": " + r.TypeName);

    Console.WriteLine("\nJoined: " + PrettyPrint(join, true));

    var leftjoin = characters
        .GroupJoin(
            heroTypes,
            hero => hero.HeroTypeId,
            ht => ht.Id,
            (hero, hts) => new { hero, hts })
        .SelectMany(
            xy => xy.hts.DefaultIfEmpty(new HeroType(0, "???")),
            (x, y) => new { HeroName = x.hero.HeroName, Alignment = y.TypeName })
        .Select(
            s => s.HeroName + ": " + s.Alignment);

    Console.WriteLine("\nLeft Join: " + PrettyPrint(leftjoin, true));
}
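For comparison, here's a sketch of the same left join pattern in query syntax, where `join ... into` compiles down to a GroupJoin and `DefaultIfEmpty()` fills in the missing match.  It uses made-up tuples instead of the HeroType and SuperHero classes, and assumes C# 9+ top-level statements:

```csharp
using System;
using System.Linq;

// Made-up data using tuples instead of the post's HeroType/SuperHero classes.
var types = new[] { (Id: 1, TypeName: "Hero"), (Id: 2, TypeName: "Anti-hero") };
var chars = new[]
{
    (HeroName: "Superman",   TypeId: 1),
    (HeroName: "Deadpool",   TypeId: 2),
    (HeroName: "Punisher",   TypeId: 2),
    (HeroName: "Ozymandias", TypeId: 4)   // no matching type
};

// Left outer join: "join ... into" is a GroupJoin, and DefaultIfEmpty()
// supplies default((int, string)) when a character has no matching type.
string[] leftJoin = (from c in chars
                     join t in types on c.TypeId equals t.Id into matches
                     from m in matches.DefaultIfEmpty()
                     select c.HeroName + ": " + (m.TypeName ?? "???")).ToArray();

// GroupBy with an element selector: group only the names, keyed by type id.
var namesByType = chars.GroupBy(c => c.TypeId, c => c.HeroName)
                       .ToDictionary(g => g.Key, g => g.ToArray());

Console.WriteLine(string.Join("\n", leftJoin));
```

With value tuples the missing match comes back as a default tuple whose TypeName is null, which is why the select falls back to "???".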

