CTICoder

A random spillage of programming (and other) thoughts

Posts Tagged ‘LINQ’

A Generic run-time LINQ-based multi-level object sorter

Posted by Michael Bray on October 26, 2009

Assume you have a list of objects that has a set of properties.  These properties are stored in a StringCollection or other similar lookup, and you want to sort the objects based on some of these properties, but you don’t know at compile-time which properties to sort on or in what order (that information will be supplied at run-time, perhaps in configuration).  How do you sort this list, in a manner that honors ascending / descending as well as multi-level sorting rules?  You can’t simply sort the list by each property, since each time you sort, it will wipe out the previous sorting operation.  Of course, LINQ provides sorting thru OrderBy(…) and ThenBy(…) functions that handle the multi-level sort issue.  But it’s a bit more complicated than that, since you don’t know the properties you want to sort on.

Here, I demonstrate a relatively simple generic object sorter that correctly handles multi-level sorting and ascending/descending at each level.

private IEnumerable<T> MultiLevelSort<T, SK>(IEnumerable<T> list, List<SK> sortKeys, Func<T, SK, string> keySelector, Func<SK, bool> ascendingSelector)
{
    if (sortKeys.Count == 0) return list;

    IOrderedEnumerable<T> res = null;
    for (int i = 0; i < sortKeys.Count; i++)
    {
        SK sk = sortKeys[i];
        bool ascending = ascendingSelector(sk);
        if (i == 0)
        {
            if (ascending) res = list.OrderBy(r => keySelector(r, sk));
            else res = list.OrderByDescending(r => keySelector(r, sk));
        }
        else
        {
            if (ascending) res = res.ThenBy(r => keySelector(r, sk));
            else res = res.ThenByDescending(r => keySelector(r, sk));
        }
    }
    return res;
}

This function takes 4 parameters:

  1. An IEnumerable<T> of objects to sort
  2. A List<SK> of objects that contain sorting order information (note that this list itself is expected to already be in the correct sort order)
  3. A Func<T, SK, string> to extract the value from T based on information in SK to actually sort on
  4. A Func<SK, bool> to extract the ascending/descending information from SK

…and it returns the list correctly sorted as an IEnumerable<T>.  Note that the actual object returned is actually an IOrderedEnumerable<T> as long as there is at least one valid sort key.

This code could then be used as such:

List<MyProperty> sortProps = AllProperties.Where(sp => sp.Sort != string.Empty).OrderBy(sp => sp.SortOrder).ToList();
IEnumerable<MyObject> sortedResults = MultiLevelSort<MyObject, MyProperty>(
    results, sortProps,
    (r, pe) => r.Properties.ContainsKey(pe.Name) ? r.Properties[pe.Name] : string.Empty,
    pe => pe.Sort == "Ascending"
        ).ToList();

Where MyObject is an object that contains a StringCollection called ‘Properties’, and MyProperty is an object that contains properties called ‘Sort’ (“Ascending/Descending”), ‘SortOrder’ (an integer), and ‘Name’ (the name of the property within the MyObject.Properties collection that we want to sort on).

Advertisements

Posted in .NET | Tagged: | Leave a Comment »

Object Inheritance in Entity Framework

Posted by Michael Bray on August 16, 2008

I started playing with the Entity Framework that was officially released a few days ago along with VS2008 SP1 and .NET 3.5 SP1.  After playing with Linq to SQL, there was one particular feature that I was interested in – the multi-table object inheritance capabilities.  Linq to SQL had the ability to provide inherited types, but the underlying data source stored all of the different types in the same table, so-called Table-Per-Hierarchy (TPH) storage.  In this mechanism, a field in the database is used to distinguish the type that the row represents and it is generated accordingly.  This would lead to artifacts such as a lot of null fields in the database – not such a bad thing as NULL types don’t take up much (if any) space but the table structure is very flat, and not so accommodating if your object hierarchy is more than a few levels deep.

Linq to Entities provides a new type of storage, Table-Per-Type (TPT) that provides a much higher fidelity mapping to the underlying tables as compared to the business classes that are being worked with.  In this mode, there is a separate table for each level of the class hierarchy.  For deep class hierarchies this seems to make more sense.  The down side is that at the very root of the class hierarchy you end up with a table that is gi-normous.

One of the things that I plan to do to mitigate the size of that base table is to link the objects with Guids (uniqueidentifier) instead of ints.  By doing this, I have a root object that has a Guid, and two other properties that I want to be common on all objects – DateCreated and DateModified.  To maximize efficiency, the Guid column is the primary key, and is also the RowGUID.  The derived tables have the same characteristic Guid RowGuid column, and this is the value that I use to link the tables together in a 1-to-1 relationship.  This way I really have a true hierarchy without any additional data storage overhead for the key field. 

Another nice benefit that this provides is that any object in any table can make reference to any other object, and I know that I can find that object.  I also plan to control some of the portions of the Guid, and I’ll use that to identify the type of object (eg Person, Account, Order, etc) so I can know exactly what type of object it is and can load it accordingly.

Here’s an example hierarchy:

image

Note that DbObject is the root of the hierarchy, and assuming that I maintain this structure for all objects in the database, it will grow VERY large.  Is this a problem?   Possibly, but I expect SQL can deal with it.  The main concern is that any time I load one of these objects using Linq to Entities, the SQL query will include a JOIN to the DbObject table, which could be a killer, but possibly not as bad as you might think.  Initial testing indicates that I can load the entire Cisco parts database (which is more than 330,000 items and includes about 15 properties per item) in about 23 seconds.  Not great if the user is waiting, but how often am I going to load the entire database?  Not too often, I hope.  But it isn’t too bad either…  I suspect the use of the ObjectId as the joining field (and the fact that it is the primary key and RowGuid) are significantly helping that result time.

The other idea that I have to help mitigate that huge JOIN is that I may generate two nearly identical models – one with the DbObject and one without it.  This of course doesn’t affect the underlying database – it just affects how Linq to Entities interacts with it.  In that model, it will never attempt to JOIN the DbObject table.  If I’m only looking at the data (not adding a new object or updating it) then I have no need for the data contained in DbObject.  Of course, it also means that I’ll need to have two different models for every object (eg Person and PersonNRO (NRO = NoRootObject)) but it might be worth the gain. This will of course double the work to maintain the structure when changes are made, but I think that I could use a tool or write one myself that would generate the .edmx file for both objects based on a common definition. That’s a project for a rainy day, though.

Anyway, the one odd thing I’ve found as I’ve started working with this stuff is that in the normal model (with the DbObject), the Entity Container object ONLY provides access to root objects (in this case, a member called DbObjectSet).  So you don’t get a direct reference to any of the derived types at all:

image

Note that there is no ‘PersonObjectSet’, ‘EmployeeObjectSet’, ‘ContactObjectSet’ or ‘AccountObjectSet’.  So how do you load those types of entities?  You have to use a function that is provided on DbObjectSet:

Model1Container c = new Model1Container(); 
var people = from p in c.DbObjectSet.OfType<Person>() 
             select p; 

This will return a list of all the objects in the People table (along with the relevant data from DbObject) as one single class ‘Person’.  Amazing!

Posted in .NET | Tagged: , | 2 Comments »