Object Inheritance in Entity Framework
Posted by Michael Bray on August 16, 2008
I started playing with the Entity Framework that was officially released a few days ago along with VS2008 SP1 and .NET 3.5 SP1. After playing with Linq to SQL, there was one particular feature that I was interested in – the multi-table object inheritance capabilities. Linq to SQL had the ability to provide inherited types, but the underlying data source stored all of the different types in the same table, so-called Table-Per-Hierarchy (TPH) storage. In this mechanism, a field in the database is used to distinguish the type that the row represents and it is generated accordingly. This would lead to artifacts such as a lot of null fields in the database – not such a bad thing as NULL types don’t take up much (if any) space but the table structure is very flat, and not so accommodating if your object hierarchy is more than a few levels deep.
Linq to Entities provides a new type of storage, Table-Per-Type (TPT) that provides a much higher fidelity mapping to the underlying tables as compared to the business classes that are being worked with. In this mode, there is a separate table for each level of the class hierarchy. For deep class hierarchies this seems to make more sense. The down side is that at the very root of the class hierarchy you end up with a table that is gi-normous.
One of the things that I plan to do to mitigate the size of that base table is to link the objects with Guids (uniqueidentifier) instead of ints. By doing this, I have a root object that has a Guid, and two other properties that I want to be common on all objects – DateCreated and DateModified. To maximize efficiency, the Guid column is the primary key, and is also the RowGUID. The derived tables have the same characteristic Guid RowGuid column, and this is the value that I use to link the tables together in a 1-to-1 relationship. This way I really have a true hierarchy without any additional data storage overhead for the key field.
Another nice benefit that this provides is that any object in any table can make reference to any other object, and I know that I can find that object. I also plan to control some of the portions of the Guid, and I’ll use that to identify the type of object (eg Person, Account, Order, etc) so I can know exactly what type of object it is and can load it accordingly.
Here’s an example hierarchy:
Note that DbObject is the root of the hierarchy, and assuming that I maintain this structure for all objects in the database, it will grow VERY large. Is this a problem? Possibly, but I expect SQL can deal with it. The main concern is that any time I load one of these objects using Linq to Entities, the SQL query will include a JOIN to the DbObject table, which could be a killer, but possibly not as bad as you might think. Initial testing indicates that I can load the entire Cisco parts database (which is more than 330,000 items and includes about 15 properties per item) in about 23 seconds. Not great if the user is waiting, but how often am I going to load the entire database? Not too often, I hope. But it isn’t too bad either… I suspect the use of the ObjectId as the joining field (and the fact that it is the primary key and RowGuid) are significantly helping that result time.
The other idea that I have to help mitigate that huge JOIN is that I may generate two nearly identical models – one with the DbObject and one without it. This of course doesn’t affect the underlying database – it just affects how Linq to Entities interacts with it. In that model, it will never attempt to JOIN the DbObject table. If I’m only looking at the data (not adding a new object or updating it) then I have no need for the data contained in DbObject. Of course, it also means that I’ll need to have two different models for every object (eg Person and PersonNRO (NRO = NoRootObject)) but it might be worth the gain. This will of course double the work to maintain the structure when changes are made, but I think that I could use a tool or write one myself that would generate the .edmx file for both objects based on a common definition. That’s a project for a rainy day, though.
Anyway, the one odd thing I’ve found as I’ve started working with this stuff is that in the normal model (with the DbObject), the Entity Container object ONLY provides access to root objects (in this case, a member called DbObjectSet). So you don’t get a direct reference to any of the derived types at all:
Note that there is no ‘PersonObjectSet’, ‘EmployeeObjectSet’, ‘ContactObjectSet’ or ‘AccountObjectSet’. So how do you load those types of entities? You have to use a function that is provided on DbObjectSet:
Model1Container c = new Model1Container(); var people = from p in c.DbObjectSet.OfType<Person>() select p;
This will return a list of all the objects in the People table (along with the relevant data from DbObject) as one single class ‘Person’. Amazing!