Monday, April 24, 2017

Microsoft 70-487: Create an Entity Framework data model

Exam Objectives

Structure the data model using table per type, table per class, table per hierarchy; choose and implement an approach to manage a data model (code first vs. model first vs. database first); implement POCO objects; describe a data model by using conceptual schema definitions, storage schema definition, mapping language (CSDL, SSDL, MSL), and Custom Code First Conventions


Entity framework gives us mechanisms to control the structure of the tables representing our class inheritance hierarchies, though it isn't exactly like you flip on a flag and it does it a certain way.  The tutorial series on the asp.net blog (which the Datatell post basically reproduces) presents three ways of doing this:  one table per hierarchy (basically the default), one table per type, and one table per concrete class.  I'll walk through the asp.net version, starting with the domain models:


namespace DomainModels
{
    public abstract class BillingDetail
    {
        public int BillingDetailId { get; set; }
        public string Owner { get; set; }
        public string Number { get; set; }
    }

    public class CreditCard : BillingDetail
    {
        public int CardType { get; set; }
        public string ExpiryMonth { get; set; }
        public string ExpiryYear { get; set; }
    }

    public class BankAccount : BillingDetail
    {
        public string BankName { get; set; }
        public string Swift { get; set; }
    }
}


I actually broke each class out into separate files in my example code, but here it's cleaner to show them all together.  For each of the three approaches, I created a separate example project, which effectively gave them each their own database using the default EF setting, which made for a nice clean divide.



Type Per Hierarchy


This style requires the least work.  Basically, if you define a DbSet of the parent abstract type, and do nothing else, this is the kind of table you will get.  Your class hierarchy is flattened out into a single table, with columns for each unique column.  The definition, migration, and table diagram:


    public class TPHContext : DbContext
    {
        public DbSet<BillingDetail> BillingDetails { get; set; }
    }

    public partial class initial_db_create : DbMigration
    {
        public override void Up()
        {
            CreateTable(
                "dbo.BillingDetails",
                c => new
                    {
                        BillingDetailId = c.Int(nullable: false, identity: true),
                        Owner = c.String(),
                        Number = c.String(),
                        BankName = c.String(),
                        Swift = c.String(),
                        CardType = c.Int(),
                        ExpiryMonth = c.String(),
                        ExpiryYear = c.String(),
                        Discriminator = c.String(nullable: false, maxLength: 128),
                    })
                .PrimaryKey(t => t.BillingDetailId);
            
        }
        
        public override void Down()
        {
            DropTable("dbo.BillingDetails");
        }
    }


The DbContext class is stupid simple, there really isn't any more to it than that.  The resulting EF migration creates the one table, which we see in the diagram.  Incidentally, they took the diagramming tools out of Visual Studio somewhere around VS 2013, so I had to fire up SSMS to generate the diagram.



Table Per Type


There are a couple ways we can do table per type.  The first is to put an annotation on our domain objects, like [Table("BankAccounts")] on the BankAccounts class.  Personally I don't really like this approach because we are adding a database representation concern to our business objects, which should be independent of that (at least in my mind).  Fortunatly, we can avoid these annotation by using the fluent notation in the DbContext, which to me makes much more sense.  I think how the data is represented in the database is more properly a concern of the class talking to the database (but that's my personally opinion).  Here is the definition (using fluent syntax), the migration, and the diagram:


    public class TBTContext : DbContext
    {
        public DbSet<BillingDetail> BillingDetails { get; set; }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            modelBuilder.Entity<BankAccount>().ToTable("BankAccounts");
            modelBuilder.Entity<CreditCard>().ToTable("CreditCards");
        }
    }


   public partial class initial_db_create : DbMigration
    {
        public override void Up()
        {
            CreateTable(
                "dbo.BillingDetails",
                c => new
                    {
                        BillingDetailId = c.Int(nullable: false, identity: true),
                        Owner = c.String(),
                        Number = c.String(),
                    })
                .PrimaryKey(t => t.BillingDetailId);
            
            CreateTable(
                "dbo.BankAccounts",
                c => new
                    {
                        BillingDetailId = c.Int(nullable: false),
                        BankName = c.String(),
                        Swift = c.String(),
                    })
                .PrimaryKey(t => t.BillingDetailId)
                .ForeignKey("dbo.BillingDetails", t => t.BillingDetailId)
                .Index(t => t.BillingDetailId);
            
            CreateTable(
                "dbo.CreditCards",
                c => new
                    {
                        BillingDetailId = c.Int(nullable: false),
                        CardType = c.Int(nullable: false),
                        ExpiryMonth = c.String(),
                        ExpiryYear = c.String(),
                    })
                .PrimaryKey(t => t.BillingDetailId)
                .ForeignKey("dbo.BillingDetails", t => t.BillingDetailId)
                .Index(t => t.BillingDetailId);
            
        }
        
        public override void Down()
        {
            DropForeignKey("dbo.CreditCards", "BillingDetailId", "dbo.BillingDetails");
            DropForeignKey("dbo.BankAccounts", "BillingDetailId", "dbo.BillingDetails");
            DropIndex("dbo.CreditCards", new[] { "BillingDetailId" });
            DropIndex("dbo.BankAccounts", new[] { "BillingDetailId" });
            DropTable("dbo.CreditCards");
            DropTable("dbo.BankAccounts");
            DropTable("dbo.BillingDetails");
        }
    }


The DbContext isn't much different, but the migration is considerably more complicated.  We can see in the table diagram (and the migration) that all three classes got a table, and the two subclasses have a foreign key relationship with their abstract parent class table.  This is the most normalized of the three approaches; notice that CreditCards and BankAccounts share no columns except the FK field.  And yeah... it should be TPT not TBT... derp...



Table Per (Concrete) Class


The last approach is unique to code first (as it requires the use of fluent API).  In this approach, each concrete type is mapped to a separate table with no relation to other tables.  Columns from the abstract parent class will be duplicated in each child class, so it is sort of half-flattened.

One gotcha pointed out in the both blog posts was the fact that, because the items in the DbContext live in two different tables, but the same DbSet, it is possible to run into key conflicts in the application.  We don't see this in table per type because of the foreign key relationship with BillingDetails (all the unique keys are generated in one table).  Datatells solution was to switch the BillingDetailId from an int to a guid, whereas the asp.net blog turns off database generation (again, using either an annotation or the fluent API) and handles ids in the application.

I handled the problem by overriding SaveChanges.  Basically, it looks up the current maximum id, and gives all the added entities a new sequential id starting at max + 1.  I got the idea from Julie Lerman's stuff, I just adapted it to this situation.  It's fine for demoware but I would be real skeptical of putting it in production (I could see there being possible concurrency issues).  Here is the context, the migration, and table diagram:


    public class TPCContext : DbContext
    {
        public DbSet<BillingDetail> BillingDetails { get; set; }

        //always write logs to console for demo purposes
        public TPCContext()
        {
            this.Database.Log = Console.WriteLine;
        }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            modelBuilder.Entity<BankAccount>().Map(m =>
            {
                m.MapInheritedProperties();
                m.ToTable("BankAccounts");
            });

            modelBuilder.Entity<CreditCard>().Map(m =>
            {
                m.MapInheritedProperties();
                m.ToTable("CreditCards");
            });

            modelBuilder.Entity<BillingDetail>()
            .Property(p => p.BillingDetailId)
            .HasDatabaseGeneratedOption(DatabaseGeneratedOption.None);
        }

        //you'd probably want to use some transaction logic in a real setting...
        public override int SaveChanges(){

            //get the maximum id, credit to http://stackoverflow.com/a/30257987
            int lastId = this.BillingDetails.DefaultIfEmpty()
                             .Max(r => r == null ? 0 : r.BillingDetailId);
            int nextId = lastId + 1;

            //grab each added item, give it next id, increment id
            foreach (var billingDetail in this.ChangeTracker.Entries()
                .Where(e => e.State == EntityState.Added)
                .Select(e => e.Entity as BillingDetail))
            {
                billingDetail.BillingDetailId = nextId;
                nextId++;
            }

            return base.SaveChanges();
        }
    }


    public partial class initial_db_create : DbMigration
    {
        public override void Up()
        {
            CreateTable(
                "dbo.BankAccounts",
                c => new
                    {
                        BillingDetailId = c.Int(nullable: false),
                        Owner = c.String(),
                        Number = c.String(),
                        BankName = c.String(),
                        Swift = c.String(),
                    })
                .PrimaryKey(t => t.BillingDetailId);
            
            CreateTable(
                "dbo.CreditCards",
                c => new
                    {
                        BillingDetailId = c.Int(nullable: false),
                        Owner = c.String(),
                        Number = c.String(),
                        CardType = c.Int(nullable: false),
                        ExpiryMonth = c.String(),
                        ExpiryYear = c.String(),
                    })
                .PrimaryKey(t => t.BillingDetailId);
            
        }
        
        public override void Down()
        {
            DropTable("dbo.CreditCards");
            DropTable("dbo.BankAccounts");
        }
    }


Notice how CreditCards and BankAccounts both have the Owner and Number columns now, whereas above, in the Table per Type, these columns were contained in the BillingDetails table.

The takeaway from this section is that the way in which tables are created and mapped to the data models is controlled by the DbContext.  The configuration to make this happen is not obvious or intuitive, and as we saw with the primary key collisions in the table-per-class approach, the choice can sometimes lead to unintended side effects.



Which came first, the Code or the Database?


There are three ways to wire up EF with the database:

  • Code First - design your domain models and your DbContext, and run migrations to create and update the database.
  • Model First - create an edmx model, and generate the entity classes and database from the model
  • Database First - use an existing database to create the edmx and entity classes.
Julie Lerman's "Getting Started with Entity Framework 6" course on PluralSight has a 4 minute section called "Visual or Code Based Models from Scratch or Existing Database" in the first module which really sums up the big picture really well.  It kind of lays out a hierarchy of bits that go into creating the overall EF plumbing in our application.  

At the top of this heap is the in-memory representation used by EF to map our classes to the database.  This can come from one of two places: an xml representation of the entity framework datamodel (i.e. edmx), or from code.  The edmx is presented as a visual representation of the data model, whereas the code form is just a class (DbContext subclass).  

The edmx can be created in one of two ways.  It is possible to add an empty edmx into a project and edit it directly through the designer included with Visual Studio (Model first).  It is also possible to generate the edmx from an existing database (Database first).  

Alternatively, we can create a purely code based entity model by using the DbContext class directly.  While it is possible to generate these classes from an existing database, it's a one shot generation, you can't update these classes based on changes to the database.  It is possible to go the other way, updating a database based on changes to the classes (a process called "migrations").  

If you've ever used Active Record in Ruby on Rails, then EF migrations will feel very familiar.  In the package manager console in Visual Studio, first enable migrations on a project (command is "enable-migrations").  When there are changes to the model that need pushed to the database, run "add-migration <migration name>".  This will create a migration file like those in the inheritance structure section.  Finally run "update-database" to commit the changes to the database.

I'm not going to insult your intelligence by including a dozen screenshots of me working through the wizards to create an edmx file.  Right click on the project, Add Item, Data, ADO.NET entity data model.  Easy peasy.  What you get will look something like this:


All the examples for the inheritance model was done using code first.  You just define a class that inherits from DbContext, and then run the migrations.



Implement POCOs


POCO stands for Plain Old CLR/C#/Common Object (depending on the source).  The purpose of a POCO is to represent the business objects that are being persisted with EF.  I've always associated POCOs with Code First, but it is possible to write your own even when using an edmx.  In order to do this, you have to disable code generation, which is done by changing a property setting on the edmx file itself in Visual Studio:

This says "EntityModelCodeGeneration" by default.  Kill it!

Unlike DTOs, a POCO can have methods.  POCO's should be persistence ignorant (which is part of why I personally prefer fluent API to annotations).  Adding a list of related items will create  relationship between our POCO and the object type in our list.   If two POCO's have lists of each other, then EF will create an intermediate table (see example here).  Marking the property for a related entity POCO as virtual will enable lazy loading.  Finally, in some disconnected scenarios (like CRUD for a website), it can make life easier to include an explicit foreign key field for a related item, instead of just the navigation property (Julie Lerman goes into all this in depth in her EF courses).

The exam ref makes a bit of a todo about the different between ObjectContext and DbContext, but the differences are pretty negligible from a syntactic standpoint.  Basically, you use ObjectSets instead of DbSets, and you have to explicitly instantiate these in the ObjectContext constructor, which can be done with new ObjectSet<type>(), or this.CreateObjectSet<type>().



Languages to Describe the Data Model


The Entity Framework Data Model is described in the edmx file by three XML based languages.  These languages are CSDL, SSDL, and MSL.  CSDL (Conceptual schema definition language) describes the application entities, the SSDL (Storage schema definition language) describes the database schema, and the MSL (Mapping specification language) describes how the objects map to the tables and columns in the database (basically tying together the CSDL and SSDL).

Below is a semi-pruned exerpt from the NinjaObjectContext.edmx file from my "OldStyleContext" project in the Ninja sample.  I think this makes the purpose of each language clear.  The StorageModels element in the edmx holds the SSDL content, describing the shape of the data in the database. Notice the "datetime" and "nvarchar(max)" Types on the elements.  The ConceptualModels element holds the CSDL markup describing the application entities, with Types familiar from C#.  The Mappings element ties these two together.

<edmx:Runtime>
<!-- SSDL content -->
<edmx:StorageModels>
  <Schema Namespace="NinjaDomain.DataModel.NinjaContextModel.Store" 
          Provider="System.Data.SqlClient" 
          ProviderManifestToken="2008" Alias="Self" 
          xmlns:store="http://schemas.microsoft.com/ado/2007/12/edm/EntityStoreSchemaGenerator" 
          xmlns:customannotation="http://schemas.microsoft.com/ado/2013/11/edm/customannotation" 
          xmlns="http://schemas.microsoft.com/ado/2009/11/edm/ssdl">
    <EntityType Name="Clans">
      <Key>
        <PropertyRef Name="Id" />
      </Key>
      <Property Name="Id" Type="int" StoreGeneratedPattern="Identity" Nullable="false" />
      <Property Name="ClanName" Type="nvarchar(max)" />
      <Property Name="DateModified" Type="datetime" Nullable="false" />
      <Property Name="DateCreated" Type="datetime" Nullable="false" />
    </EntityType><!--
    <EntityType Name="NinjaEquipments">...</EntityType>
    <EntityType Name="Ninjas">...</EntityType>
    <Association Name="FK_dbo_NinjaEquipments_dbo_Ninjas_Ninja_Id">...</Association>
    <Association Name="FK_dbo_Ninjas_dbo_Clans_ClanId">...</Association>
    <EntityContainer Name="NinjaDomainData">...</EntityContainer>-->
  </Schema>
</edmx:StorageModels>
<!-- CSDL content -->
<edmx:ConceptualModels>
  <Schema Namespace="NinjaDomain.DataModel.NinjaContextModel" Alias="Self" 
          annotation:UseStrongSpatialTypes="false" 
          xmlns:annotation="http://schemas.microsoft.com/ado/2009/02/edm/annotation" 
          xmlns:customannotation="http://schemas.microsoft.com/ado/2013/11/edm/customannotation" 
          xmlns="http://schemas.microsoft.com/ado/2009/11/edm">
    <EntityType Name="Clan">
      <Key>
        <PropertyRef Name="Id" />
      </Key>
      <Property Name="Id" Type="Int32" Nullable="false" 
                annotation:StoreGeneratedPattern="Identity" />
      <Property Name="ClanName" Type="String" MaxLength="Max" FixedLength="false" 
         Unicode="true" />
      <Property Name="DateModified" Type="DateTime" Nullable="false" Precision="3" />
      <Property Name="DateCreated" Type="DateTime" Nullable="false" Precision="3" />
      <NavigationProperty Name="Ninjas" Relationship="Self.FK_dbo_Ninjas_dbo_Clans_ClanId" 
                          FromRole="Clans" ToRole="Ninjas" />
    </EntityType><!--
    <EntityType Name="NinjaEquipment">...</EntityType>
    <EntityType Name="Ninja">...</EntityType>
    <Association Name="FK_dbo_Ninjas_dbo_Clans_ClanId">...</Association>
    <Association Name="FK_dbo_NinjaEquipments_dbo_Ninjas_Ninja_Id">...</Association>
    <EntityContainer Name="NinjaObjectContext" >...</EntityContainer>-->
  </Schema>
</edmx:ConceptualModels>
<!-- C-S mapping (MSL) content -->
<edmx:Mappings>
  <Mapping Space="C-S" xmlns="http://schemas.microsoft.com/ado/2009/11/mapping/cs">
    <EntityContainerMapping StorageEntityContainer="NinjaDomainDataContainer" 
                            CdmEntityContainer="NinjaObjectContext">
      <EntitySetMapping Name="Clans">
        <EntityTypeMapping TypeName="NinjaDomain.DataModel.NinjaContextModel.Clan">
          <MappingFragment StoreEntitySet="Clans">
            <ScalarProperty Name="Id" ColumnName="Id" />
            <ScalarProperty Name="ClanName" ColumnName="ClanName" />
            <ScalarProperty Name="DateModified" ColumnName="DateModified" />
            <ScalarProperty Name="DateCreated" ColumnName="DateCreated" />
          </MappingFragment>
        </EntityTypeMapping><!--
      <EntitySetMapping Name="NinjaEquipments">...</EntitySetMapping>
      <EntitySetMapping Name="Ninjas">...</EntitySetMapping>-->
    </EntityContainerMapping>
  </Mapping>
</edmx:Mappings>
</edmx:Runtime>



Custom Code First Conventions


When building the Entity Data Model, Code First uses a number of built in conventions, which are defined as a bunch of classes representing the rules. Some of the broad categories of conventions include:

  • Type Discovery - pulls in the types defined in the DbSets on your context, we well as referenced types.
  • Primary Key Convention - property named "Id" or "(class name)Id".
  • Relationship Convention -  navigation property and foreign key properties.
  • Complex Types Convention - when a primary key cannot be inferred for a type, and it isn't defined in a data annotation or the fluent API, the type is registered as a complex type.
  • ConnectionString Convention - several possible ways for EF to attempt to connect to a database depending on what is configured.


The most direct way of customizing the conventions used in a given entity model is through the fluent API.  The custom conventions MSDN article includes examples of using the DbModelBuilder to change how primary key fields are discovered (based on "Key" instead of "Id"), changing a database datatype from "datetime" to "datetime2", defining and applying an "IsUnicode" attribute (which affects whether varchar or nvarchar is used as a type), and customizing the table naming convention to snake_case instead of UpperCamelCase (aka PascalCase).

The DbModelBuilder class has a Conventions collection property, from which existing conventions may be removed, or to which custom conventions may be added.  Note that while the API for "Conventions" sort of looks collectionish, it is not iterable (I tried).  There are examples of interacting with it in both the custom conventions article and the "model-based conventions" article, and what differentiates these interactions is subtle.  Basically "model-based conventions" change aspects of the model that you don't have access to through the DbModelBuilder class directly.

This was confusing for me at first, because I sort of assumed that it was all there.  They give an example of changing the "Discriminator" property on a Type Per Hierarchy model, and I thought, "So what"?  Can't you just change column names with annotations and such?  What I didn't appreciate is that the Discriminator column doesn't show up on any of the conceptual models.  If you look at my TPH example above, you'll see that the database table includes this exact column, but it doesn't show up anywhere in any of my models.  So if I wanted that column named something else, I have to change the model convention.  The other example in the MSDN article had to do with Independent Associations naming conventions.

My first attempt at an example of a model based customization was actually just something I could have done with DbModelBuilder:

    class CustomNinjaContext : NinjaContext
    {
        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            base.OnModelCreating(modelBuilder);

            modelBuilder.Conventions.Add(new EquipmentRenamingConvention());
            modelBuilder.Conventions.AddBefore<PluralizingTableNameConvention>
                    (new PluralZTableNameConvention());
            modelBuilder.Conventions.Remove<PluralizingTableNameConvention>();
        }
    }

    class EquipmentRenamingConvention : IStoreModelConvention<EdmProperty>
    {
        public void Apply(EdmProperty property, DbModel model)
        {
            if (property.Name == "ServedInOniwaban")
            {
                property.Name = "IsBadass";
            }
        }
    }

    class PluralZTableNameConvention : PluralizingTableNameConvention
    {
        public override void Apply(EntityType item, DbModel model)
        {
            var entitySet = model.StoreModel.Container.GetEntitySetByName(item.Name, true);

            entitySet.Table = entitySet.Table + "z";
        }
    }


It (mostly) did what I expected.  The "ServedInOniwaban" column name changed, and the "pluralization" is now just the table name with two "z"s appended (not sure why they got doubled).  But these changes could just as easily (arguably more easily) be accomplished using the DbModelBuilder API:


    class CustomNinjaContext : NinjaContext
    {
        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            base.OnModelCreating(modelBuilder);

            modelBuilder.Properties().Where(p => p.Name == "ServedInOniwaban")
                .Configure(p => p.HasColumnName("IsBadass"));

            modelBuilder.Types().Configure(t => t.ToTable(t.ClrType.Name + "z"));
        }
    }


This actually works better, since now my table names actually only have one "z" as intended:



The MSDN article for model-based conventions notes that they are an advanced technique, and for once this is not hubris.  This approach really gives you a low-level way of customizing the way EF builds the data model, and most of the time using DbModelBuilder is probably going to be sufficient.  But at least it's an option.  If you really want to get deep in the weeds, you can check out the source files for the built in conventions on Github.  This was helpful for me when I was trying to figure out how to implement my buggy, ill-conceived "pluralize" convention.  

If you do need to use a model-based approach, there are a couple ways to attack it.  One is to create a convention that implements one of the interfaces (IConceptualModelConvention and IStoreModelConvention).  Another is to subclass an existing convention (which is handy if you want to tweak it and replace it from the chain using .AddBefore or .AddAfter).  Or there are a number of base classes in the System.Data.Entity.ModelConfiguration.Conventions namespace to start from.





No comments:

Post a Comment