Saturday, May 13, 2017

Microsoft 70-487: Query data by using LINQ to Entities

Exam Objectives

Query data by using LINQ operators (for example, project, skip, aggregate, filter, and join); log queries and database commands; implement query boundaries (IQueryable vs. IEnumerable); implement async query


Quick Overview of Training Materials



Query Data Using LINQ Operators


My post on using basic LINQ operators actually started here.  It seemed to be a big enough blurb to warrant it's very own post, so I refactored it.  While it was certainly relevant to this topic, I wanted this post to focus on the things that make working with LINQ and Entity Framework unique.

First off, not everything that you can do in LINQ is supported in LINQ to Entities.  The following LINQ operations are not supported when querying against an EF datasource:

  • Select, SelectMany, and Where with index parameter.  (sets lack implicit ordering)
  • Join, GroupJoin, GroupBy, Contains, Distinct, Except, Intersect, Union, OrderBy, OrderByDescending, ThenBy, ThenByDescending operations that use an IEqualityComparer or an IComparer (can't be translated to datasource)
  • Reverse is unsupported (no implicit ordering)
  • Concat is not guaranteed to preserve order
  • Aggregate is unsupported
  • Average, Max, Min, Sum overloads that accept a selector are unsupported (these are calculated in the datasource, which has no way of translating selector expression)
  • Several paging methods:  ElementAt, Last, SkipWhile, TakeWhile

It seems that the operators that are not supported pretty much fall into a couple categories:  operators that rely on implicit element ordering, and operators that require translating a complex C# object to the database.  The implicit ordering problem kills the indexed projection operators, Reverse, ElementAt, and Last.  Aggregator expressions, selector expressions, and comparators that would need to be applied to every element at the database level are a bust since there is no way to turn their functionality into a SQL statement.

The inability to use certain functions is a less obvious problem when using the query expression syntax, since it sort of obfuscates which query methods are actually being called.  The example below seems innocuous enough, calling a helper method that returns a bool in the where clause.  But when result is enumerated (not when it is created, mind you), it throws an exception:


It seems that much of the content for LINQ to Entities on MSDN (protip: if you want to see code samples, switch to the "Visual Studio 2010" version of the articles...) actually came from a book titled Accessing Data with Microsoft .NET Framework 4, particularly chapter 6.  Or maybe the book ripped off MSDN, who knows.  The exam ref then proceeds to do a poor job trying to distill this into two paragraphs and at least one code sample that won't compile.  Typical...

The book has a couple suggestions for dealing with the restricted capabilities of LINQ to Entities. One is to enumerate the results and then use standard LINQ on the result.  It rightly points out that this can result in much more data being processed on the client (i.e. not in the database).  His other suggestion is to create a stored procedure that will do what you want.  Soooo helpful...

I find it interesting that all these sources make such a big thing of the ObjectQuery<T> when this is basically an infrastructure detail.  When you write a LINQ query against an ObjectSet<T> (if you are using older EF with an ObjectContext), it will look like you are working against an IQueryable and/or an IEnumerable.  The actually underlying type is an ObjectQuery<T>, which implements both interfaces (plus a couple others).  Working against a DbContext really isn't all that different... you get a DbQuery instead of an ObjectQuery, but internally the DbQuery is using an ObjectQuery anyway:


The Exam Ref suggests that the DbContext based queries don't use LINQ to Entities, but I'm pretty sure that is incorrect.  Anyway, I'm probably getting way deeper into how the sausage is made than is really necessary.

There are some genuinely interesting properties of ObjectQuery and DbQuery that are worth exploring, though. ObjectQuery has a few of its own methods for building a query, such as UnionAll and Top, whereas DbQuery leans on LINQ for this functionality.  Both have an "Include" method, which will cause related items to be loaded and returned in the query results.  This eager loading can make certain queries much more efficient, like those that load and bind to a data table.  With lazy loading, related entities are pulled from the database one at a time, which is much slower than just returning everything together.

The last interesting difference with DbQuery is the "AsNoTracking" method, which returns entities without caching them in the DbContext.  Julie Lerman uses this in her demos for disconnected applications (web apps), since the context is generally used only for the life of the session and then disposed, so tracking the entities is a waste of resources.  



Log Queries and Database Commands


There are a couple ways you can log queries and see what SQL is actually being passed to the server. The Exam Ref was mildly helpful for this bit, since it demonstrated a couple of very direct ways to examine the queries.  With ObjectQuery, as long as you cast it to an ObjectQuery (if it's walking around as an IQueryable), you can call ToTraceString and it will spit out the SQL to be executed. DbQuery does this more naturally, with just a call to ToString:


public static void ToStringTests()
{
    var dbContext = new AdventureWorks2012Entities();
    ObjectContext context = ((IObjectContextAdapter)dbContext).ObjectContext;

    var custs = context.CreateObjectSet<Customer>()
            .Where(c => c.Person.FirstName.Length < 4);

    Console.WriteLine(custs.Count());

    var custs_oq = custs as ObjectQuery<Customer>;

    Console.WriteLine("\n\nTraceString:\n" + custs_oq.ToTraceString() + "\n\n");
    Console.WriteLine("\n\nToString:\n" + custs.ToString() + "\n\n");

    Console.WriteLine(dbContext.Customers
        .Where(c => c.Person.FirstName.Equals("Bob")));
}



If you are using only the ObjectQuery methods, instead of the LINQ extension methods, it will actually generate different SQL, which you can inspect with the CommandText property... but it's pretty weird:


public static void CommandTextTest()
{
    var dbContext = new AdventureWorks2012Entities();
    ObjectContext context = ((IObjectContextAdapter)dbContext).ObjectContext;

    Console.WriteLine("CommandText:\n" + context.CreateObjectSet<Customer>()
        .Top("10").Where("FirstName = Bob").CommandText);

}



Finally, the underlying Database object exposes a property called "Log".  If you want everything spat out onto the console, then you just assign Console.WriteLine as the Logger.  Not only does this display the SQL query, but also information about the connection and execution:


public static void LoggerTest()
{
    var dbContext = new AdventureWorks2012Entities();

    dbContext.Database.Log = Console.WriteLine;

    dbContext.Customers.Where(c => c.Person.FirstName == "Bob").Count();
}


Finally, if all else fails you can always run Sql Server Profiler and inspect the incoming queries.



Implement Query Boundaries


The Exam Ref authors were super lazy with this section, basically just listing a handful of sketchy differences between IEnumerable and IQueryable and leaving it at that.  There is no discussion whatsoever about how these two interfaces relate to query boundaries.  So I'll try to do a bit better job.

Firstly, lets look at just the difference in the way Queryable and Enumerable work under the covers. The StackOverflow question and Olexander Ivanitskyi's blog post on the difference went pretty deep.  The critical difference between the two is the fact that IQueryable builds an expression tree which it can then pass to the datasource provider, whereas IEnumerable is completely oblivious to what kind of datasource it is working with and just does operations in memory.  Olexander went as far as decompiling .NET code to peek at Microsoft's implementation, and when he ran two simple Where clauses against a million record table, the difference in performance was pretty astounding, with IEnumerable taking over 5 seconds, and IQueryable taking only 0.04 seconds.

One thing Julie Lerman cautions against in her PluralSight courses is the unnecessary use of ToList calls without understanding what is happening.  Calling ToList will perform an enumeration on the query that has been built to that point, materializing those entities into memory.  If related entities are subsequently accessed (such as through a navigation property), you can end up triggering another database call for each record... whereas if you keep everything in one query (one chain of deferred execution IQueryable operations), then everything happens in the database and you make just the one call.

It's worth noting that both IQueryable and IEnumerable support deferred execution, this is not unique to Queryables.  Also, both interfaces can be used with any LINQ provider (the book seems to think there is some kind of segregation involved).  IQueryable still builds an expression tree and executes the same way even when the underlying datasource is an in memory list.  I borrowed some code from Julies EF in the Enterprise course to illustrate what how an expression might be built manually (functionally equivalent to providing a lambda expression directly):


static void Main(string[] args)
{
    //this is pretty much straight out of Julie Lerman's PluralSight course
    //"Entity Framework in the Enterprise", module 4, clip 4 - "Non-tracking 
    //Alternatives to a Generic DbSet.Find"
    var item = Expression.Parameter(typeof(string), "s");
    var prop = Expression.Property(item, "Length");
    var value = Expression.Constant(4);
    var gt = Expression.GreaterThan(prop, value);
    var lambda = Expression.Lambda<Func<string, bool>>(gt, item);
    Console.WriteLine(lambda);


    var fruit = new List<string>() {"pear", "banana", "kiwi", "strawberry" };

    Console.WriteLine(fruit.AsQueryable().Where(lambda));
    Console.WriteLine(fruit.AsQueryable().Where(f => f.Length > 4));
    Console.WriteLine(fruit.AsEnumerable().Where(f => f.Length > 4));

    Console.Read();
}


To circle back around to how this relates to boundaries:  An IQueryable will keep augmenting an existing expression tree until results are materialized as the result of calling some operation that enumerates the results.  It is important to be mindful of these boundaries, because if you have to go back for data after crossing one, you can pay quite dearly for it:


static void Boundaries()
{
    var context = new AdventureWorks2012Entities();
    context.Database.Log = notch;
    var custs = context.Customers;


    var query = custs
        .Where(c => c.Person.FirstName == "Bob")
        .OrderBy(c => c.rowguid);

    Console.WriteLine(query);
    Console.WriteLine(query.Count());
    Console.WriteLine("*****  END QUERYABLE  *****\n\n\n");

    var bad = custs.ToList().AsQueryable()
        .Where(c => c.Person != null && "Bob".Equals(c.Person.FirstName))
        .OrderBy(c => c.rowguid);

    Console.WriteLine(bad);
    Console.WriteLine(bad.Count());
    Console.WriteLine("*****  END ENUMERABLE (FINALLY...)  *****\n\n\n");
}

static int counter = 0;
static void notch(object value)
{
    if(counter % 100 == 0)
        Console.Write("x");
    counter++;
}

The result of this little experiment were staggering.  There are about 20,000 customer records in the AdventureWorks database.  Each "x" represents one hundred logging events.  One query generates about six or seven.  I had to switch to the "x" business because logging everything to Console.WriteLine was super ridiculous...


20 rows time 80 columns times 100 = 160,000 loggin calls... yikes!  Needless to say, the top bit of code that kept everything in one query was much, much, much faster.



Implement Async Queries


My first attempt at writing some asynchronous code samples to accompany this section was a complete disaster.  I found out very quickly that if you don't know exactly what you are doing with async code it is very, very easy to jack it up royally.  So after watching some of the async and parallel programming in .NET courses by Joe Hummel on PluralSight, I finally got a functional and correct demo app, even if it isn't terribly realistic (you could do what the code does with a single query I'm pretty sure, though it would be a fairly involved query).

My first pass was super simple: do a database query to count the number of customers with ID's ending in each number.  It wasn't terribly interesting because apparently the numbers are uniformly distributed.  A slightly more interesting series of queries was next, looking at the first letter of first names and counting the results.  In both cases, the database query is enclosed in a Task using the static Task.Factory.StartNew method.  One of my initial mistakes was to create the Task directly and then not start it running.

The other big mistake I made initially was the gross misuse of "await".  I kept using it thinking it would block until I had a result, but the opposite is true.  "await" basically signals to the program that you are waiting and returns flow control to the calling code.  So I would call "await" for a result and find the Console.WriteLine from Main saying everything was done would print on the screen.... oops.

Here is the example code counting up first name letters.  While this example is pretty contrived, I can imagine a scenario where I am querying disparate data sources (maybe a couple different databases and a web service, for example) and I want to collate some of the data.  I could start queries to all of them at once, and they would execute in parallel.


static void AsyncDemo2()
{
    var tasks = new List<Task<int>>();
    for (int i = 0; i < 26; i++)
    {
        string firstLetter = ((char)((int)'A' + i)).ToString();
        tasks.Add(Task.Factory.StartNew<int>(() =>
        {
            using (var context = new AdventureWorks2012Entities())
            {
                //context.Database.Log = Console.WriteLine;
                Console.WriteLine("Starting task " + firstLetter);
                int result = context.People
                    .Where(p => p.FirstName.StartsWith(firstLetter))
                    .Count();
                Console.WriteLine("Finished computing task " + firstLetter);
                return result;
            }
        }));
    }

    var results = new string[26];
    for (int i = 0; i < 26; i++)
    {
        string firstLetter = ((char)((int)'A' + i)).ToString();
        results[i] = firstLetter + ": " + tasks[i].Result;
    }

    Console.WriteLine("[" + String.Join(", ", results) + "]");
}


Asynchronous LINQ is a pretty deep rabbit hole.  A good place to start for a deeper dive may be this question on StackOverflow: How to Write Asynchronous LINQ Query.  There are some code samples, numerous comments criticizing those samples, and links to blog posts that attempt to address these shortcomings.

A similar thread on forums.asp.net titled Running LINQ to SQL query async? actually got me pointed to the EF6 functionality that supports asynchronous operations natively.  The QueryableExtentions class exposes a bunch of Async suffix versions of operations such as Average, Min, Max, Load, Single, First, and so on.  In fact I think Average may have something like 100 overloads =S.

I decided to play around with these a little bit, and managed to come up with something that replicated my previous blurb.  The tricky part with using CountAsync() in my case, was dealing with the DbContext.  If I tried to add the Task<int> returned by CountAsync() directly to my list of tasks, then the context would be closed before I got around to calling Result.  If I wrapped the whole thing in a single task, not only did it make the example fundamentally more stupid (because it's making all these calls against the same context instance)... it broke at runtime, basically telling me I need to use "await".  Luckily, it didn't take much fumbling on StackOverflow to find a useful example and whip this example into shape:

static void AsyncDemo3()
{
    var tasks = new List<Task<int>>();
    for (int i = 0; i < 26; i++)
    {
        string firstLetter = ((char)((int)'A' + i)).ToString();
        tasks.Add(Task<int>.Run(async () =>
        {
            using (var context = new AdventureWorks2012Entities())
            {
                Console.WriteLine("Starting task " + firstLetter);
                int result = await context.People
                    .Where(p => p.FirstName.StartsWith(firstLetter))
                    .CountAsync();
                Console.WriteLine("Finished computing task " + firstLetter);
                return result;
            }
        }));
    }

    var results = new string[26];
    for (int i = 0; i < 26; i++)
    {
        string firstLetter = ((char)((int)'A' + i)).ToString();
        results[i] = firstLetter + ": " + tasks[i].Result;
    }

    Console.WriteLine("[" + String.Join(", ", results) + "]");
}

At first the results looked suspiciously synchronous, but after putting a Thread.Sleep(500) just after enqueuing the task, I started to see the interleaved Console.WriteLine calls I expected.  So it seems this code is in fact running asyncronously.  Whew...

2 comments:

  1. Hi,

    Very good studying guides, thanks.

    As for async in EF6, I have found a shorter approach here: https://msdn.microsoft.com/en-us/library/jj819165(v=vs.113).aspx

    With that in mind, I have adapted your example for my local database schema for Northwind db (with "Customers" and "ContactName") and it worked. Here is what it should look like in your case:

    static async Task AsyncDemo4()
    {
    using (var context = new AdventureWorks2012Entities())
    {
    for (int i = 0; i < 26; i++)
    {
    string firstLetter = ((char)((int)'A' + i)).ToString();

    int result = await context.People
    .Where(p => p.FirstName.StartsWith(firstLetter))
    .CountAsync();

    Console.WriteLine(firstLetter + ": " + result);
    }
    }
    }

    You can call it from the Main method this way:

    var task = AsyncDemo4();
    task.Wait();

    Hope, it helps :)

    regards, Przemyslaw Soszynski

    ReplyDelete
    Replies
    1. Thanks! I like the async example, I'll have to drop it into my code when I get back home and play around with it, it's definitely much more succinct than my example. It's a shame the comments munge up the formatting lol.

      Delete