CTICoder

A random spillage of programming (and other) thoughts

Object Inheritance in Entity Framework

Posted by Michael Bray on August 16, 2008

I started playing with the Entity Framework that was officially released a few days ago along with VS2008 SP1 and .NET 3.5 SP1.  After playing with Linq to SQL, there was one particular feature that I was interested in – the multi-table object inheritance capabilities.  Linq to SQL had the ability to provide inherited types, but the underlying data source stored all of the different types in the same table, so-called Table-Per-Hierarchy (TPH) storage.  In this mechanism, a field in the database is used to distinguish the type that the row represents and it is generated accordingly.  This would lead to artifacts such as a lot of null fields in the database – not such a bad thing as NULL types don’t take up much (if any) space but the table structure is very flat, and not so accommodating if your object hierarchy is more than a few levels deep.

Linq to Entities provides a new type of storage, Table-Per-Type (TPT) that provides a much higher fidelity mapping to the underlying tables as compared to the business classes that are being worked with.  In this mode, there is a separate table for each level of the class hierarchy.  For deep class hierarchies this seems to make more sense.  The down side is that at the very root of the class hierarchy you end up with a table that is gi-normous.

One of the things that I plan to do to mitigate the size of that base table is to link the objects with Guids (uniqueidentifier) instead of ints.  By doing this, I have a root object that has a Guid, and two other properties that I want to be common on all objects – DateCreated and DateModified.  To maximize efficiency, the Guid column is the primary key, and is also the RowGUID.  The derived tables have the same characteristic Guid RowGuid column, and this is the value that I use to link the tables together in a 1-to-1 relationship.  This way I really have a true hierarchy without any additional data storage overhead for the key field. 

Another nice benefit that this provides is that any object in any table can make reference to any other object, and I know that I can find that object.  I also plan to control some of the portions of the Guid, and I’ll use that to identify the type of object (eg Person, Account, Order, etc) so I can know exactly what type of object it is and can load it accordingly.

Here’s an example hierarchy:

image

Note that DbObject is the root of the hierarchy, and assuming that I maintain this structure for all objects in the database, it will grow VERY large.  Is this a problem?   Possibly, but I expect SQL can deal with it.  The main concern is that any time I load one of these objects using Linq to Entities, the SQL query will include a JOIN to the DbObject table, which could be a killer, but possibly not as bad as you might think.  Initial testing indicates that I can load the entire Cisco parts database (which is more than 330,000 items and includes about 15 properties per item) in about 23 seconds.  Not great if the user is waiting, but how often am I going to load the entire database?  Not too often, I hope.  But it isn’t too bad either…  I suspect the use of the ObjectId as the joining field (and the fact that it is the primary key and RowGuid) are significantly helping that result time.

The other idea that I have to help mitigate that huge JOIN is that I may generate two nearly identical models – one with the DbObject and one without it.  This of course doesn’t affect the underlying database – it just affects how Linq to Entities interacts with it.  In that model, it will never attempt to JOIN the DbObject table.  If I’m only looking at the data (not adding a new object or updating it) then I have no need for the data contained in DbObject.  Of course, it also means that I’ll need to have two different models for every object (eg Person and PersonNRO (NRO = NoRootObject)) but it might be worth the gain. This will of course double the work to maintain the structure when changes are made, but I think that I could use a tool or write one myself that would generate the .edmx file for both objects based on a common definition. That’s a project for a rainy day, though.

Anyway, the one odd thing I’ve found as I’ve started working with this stuff is that in the normal model (with the DbObject), the Entity Container object ONLY provides access to root objects (in this case, a member called DbObjectSet).  So you don’t get a direct reference to any of the derived types at all:

image

Note that there is no ‘PersonObjectSet’, ‘EmployeeObjectSet’, ‘ContactObjectSet’ or ‘AccountObjectSet’.  So how do you load those types of entities?  You have to use a function that is provided on DbObjectSet:

Model1Container c = new Model1Container(); 
var people = from p in c.DbObjectSet.OfType<Person>() 
             select p; 

This will return a list of all the objects in the People table (along with the relevant data from DbObject) as one single class ‘Person’.  Amazing!

Advertisements

2 Responses to “Object Inheritance in Entity Framework”

  1. Fakher Halim said

    Great article!
    Actually I like the idea so much that I plan to quote it in one of my presentations. Do I have your permission?
    Will you email me the database/source code (fakher@msn.com).
    I have written an article on using a single table for entire inheritance at http://www.codeproject.com/KB/architecture/LinqEntityFramework.aspx and would like to put a link to your site on my next presentation/article.

  2. Scott Taylor said

    I’m not clear on how using a guid instead of an int is going to appreciably “mitigate the size” of the base table. Either way, you’re going to have one row per object instance, no matter how deep or shallow the rest of the hierarchy is. Using a guid for the primary key does make sense, however, but for the usual reasons of ensuring uniqueness in various scenarios.

    I would recommend that you stay away from the idea of trying to make a “smart key” out of a guid…facts used for constraints should *always* be stored as separate attributes….the guid key should *only* be used to express relationships between tables (as well as to ensure that individual records can be reliably selected).

    As far as retrieval performance goes, a properly indexed table hierarchy will, if anything, be faster than retrieving records from a single table (as in the Table-Per-Hierarchy scenario). If you are concerned about selecting the entire set of records at once (for example, your database with 330,000 items), I would recommend using some form of the data repository pattern, where you can select just the records of interest from what amounts to a virtual database layer.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: