A random spillage of programming (and other) thoughts

Debugging into Compiled Assemblies, tracking down an apparent bug in Microsoft SQL SMO

Posted by Michael Bray on February 6, 2015

For my day job, I had written an application which would capture all changes made to our development SQL database, script those changes using Microsoft SQL SMO, and automatically check in/out the files as needed so that changes to our database could be tracked efficiently and cleanly in TFS. 

But I had a problem – Microsoft SQL SMO was failing to script a trigger.  Specifically, it was generating the following error:


Microsoft.SqlServer.Management.Smo.FailedOperationException: Script failed for Trigger 'as_invoice_insertToDocuments'. ---> Microsoft.SqlServer.Management.Smo.FailedOperationException: Syntax error in TextHeader of Trigger 'as_invoice_insertToDocuments'.

When I first googled this error, I found a couple of references and suggestions…   Most of them said that this could be caused by nested comments, but my trigger had no comments at all.  I tried changing the contents and header of the trigger, but nothing worked.  When I tried debugging my program, I was stifled because I was unable to step into the SMO DLL – all I could tell was that it was throwing the exception.

My first attempt at diagnosing the error was to use dotPeek to decompile the Microsoft.SqlServer.Management.Smo.DLL library to figure out why the code was throwing an error.  This is what I found:


This narrowed it down a bit, but still, there were two possible lines that could have been generating the exception, with no way to distinguish between them (grrr!). 

And that’s when I decided to try to debug into the SMO library.  I reasoned that since I knew it was possible to debug into the .NET Framework, I might be able to do the same with this library, and I was right.  Doing it was a bit tricky, though…   First, I decided to use dotPeek’s symbol server.  It’s possible that I could have used Microsoft’s symbol server too, since it was a Microsoft DLL, but I haven’t tried.  So I started up dotPeek, turned on the symbol server, and selected the option to generate PDBs for all assemblies (just to be safe).  Then I went back to Visual Studio (I have 2010), configured the symbol server with the http address provided by dotPeek, turned off the option “Enable Just My Code”, and tried debugging my program by F11 stepping into the SMO function call that was failing.  On my first try, it didn’t work. 

After a bit of research, I went back into dotPeek and opened the “Project/PDB Generation Log” window and saw that it was apparently still generating some PDBs and/or source files.  So I waited for that to finish, then went back to Visual Studio.  Before attempting to debug again, I tried a few more changes to Visual Studio options – specifically I checked the box “Enable .NET Framework source stepping” which seemed to do something – Visual Studio seemed to load and cache the PDB/source files.  I’m not sure if that was a necessary step or coincidental to the fact that dotPeek was now finished generating its PDB files, but it worked – now when I F11 stepped into the SMO function call, I was viewing the SMO library source code.  WIN #1!

As I started stepping thru the code to try to get to the line that was causing the Exception to be thrown, however, I realized it was going to be difficult to get what I needed.  When I got to the CheckTextCorrectness method indicated by the stack trace, I found that it was in fact the FIRST throw that was executing, in response to the function call to CheckDdlHeader apparently returning false!  I tried debugging into that function to figure out what was going on, but I kept having difficulty because when I tried to analyze variables, the Visual Studio Watch window would just say “Cannot obtain value…as it is not available at this instruction pointer, possibly because it has been optimized away”, even though I was running in DEBUG mode and had “Optimize Code” unchecked in my project file settings.  Actually that made sense, since those settings would only affect code that was compiled on my machine, which the SMO dll was not.  It was actually the .NET JIT compiler that was optimizing the MSIL.  But I really needed to see those variables.

Then I found a web page with the information to avoid the optimizations.  WIN #2!   Just in case that page ever goes away, here’s the answer, lifted almost verbatim from that page (sorry!):

Ok, you think you’re cool when you get the capability to debug the .NET Framework source code all set up. You’re like, “I am all powerful!” Then you start noticing the oddities.

“Wait, why can’t I get the value of that variable?!”

“Why did it step there? It should have stepped here?!”

The problem is that ‘you’re debugging against retail-optimized code’. Fortunately, someone at Microsoft handed out the trick to disable these optimizations. Check out this link [EDIT: this link appears to be broken, the correct URL might be this one!] for more info, but basically there are only a few steps:

  • Create a .cmd file that sets an environment variable and then launches Visual Studio. Name it whatever you want (e.g. DisableOptimizationsInVisualStudio.cmd). Its contents should be:
set COMPLUS_ZapDisable=1
cd /d "%ProgramFiles%\Microsoft Visual Studio 9.0\Common7\ide\"
start devenv.exe
  • Launch Visual Studio with this .cmd file.
  • Once in Visual Studio, disable the Visual Studio hosting process:

Right click on your project and choose “Properties”.

Choose the “Debug” tab and uncheck “Enable the Visual Studio Hosting Process”.

  • Launch your application in the debugger.


NOTE: This page refers to “Microsoft Visual Studio 9.0” but your exe path might be different – for Visual Studio 2010 it’s “Microsoft Visual Studio 10.0”.

Once I did the steps above, it was like manna from heaven – I could now see the contents of all the variables that I couldn’t before, and I was finally able to diagnose (and work around) the problem in SMO.  Since doing this, I’ve found several other references to things I could have checked, such as the Advanced Build settings, but I’m not sure if they would have worked for an externally-linked DLL being debugged thru the dotPeek symbol server.  I’ve also found a method using an INI file [Microsoft reference] that controls the JIT optimization on a per-dll basis, although I’m not sure if that works independently every time the executable runs.  For me, setting the environment variable was easier.

In case you are wondering, the workaround for my actual problem was simple.  All I had to do was turn off the option “DdlBodyOnly” in the ScriptingOptions parameter passed to the Script() method.  When this option is set to true, CheckTextCorrectness attempts to ensure that the DDL text starts with CREATE or ALTER, but I’m not sure why (a) they would do this when DdlBodyOnly is set to true, since it should only check the body and not the header, or (b) it doesn’t fail for the other object types, or (c) it doesn’t just generate the body without the header, which would have prevented me from ever setting it to true in the first place.  One way or another, I would consider this a bug in SMO.  The result of setting this option to false is that the trigger script passes validation, but it also includes “SET ANSI_NULLS” and “SET QUOTED_IDENTIFIER” lines at the top of the script, which is what I was generally trying to avoid by setting DdlBodyOnly to true.


Posted in .NET, .NET Framework, Bugs, Debugging, SMO, SQL, Visual Studio | Tagged: | Leave a Comment »

New Year. New Wallpaper

Posted by Shaun McDonnell on January 4, 2010


Posted in Uncategorized | Leave a Comment »

Linq2Sql Caching with SqlDependency

Posted by Shaun McDonnell on November 25, 2009

It took me a while to figure this one out.  I almost thought it was impossible to implement it in a ‘good way’.  However, extension methods saved the day.  So, if you want to use Sql Service Broker dependencies in your Linq2Sql code, you’ll need an extension method like this:

public static List<T> LinqCache<T>(this IQueryable<T> q, DataContext dc, string key)
{
    SqlDependency.Start(dc.Connection.ConnectionString);
    var cachedItem = (List<T>)HttpContext.Current.Cache.Get(key);
    if (cachedItem == null)
    {
        TracingHelper.Write("No cache key [{0}] found for query: {1}", key, dc.GetCommand(q).CommandText);
        string connectionString = dc.Connection.ConnectionString;
        string command = dc.GetCommand(q).CommandText;
        using (var sql = new SqlConnection(connectionString))
        {
            sql.Open();
            using (var cmd = new SqlCommand(command, sql))
            {
                foreach (DbParameter dbp in dc.GetCommand(q).Parameters)
                {
                    cmd.Parameters.Add(new SqlParameter(dbp.ParameterName, dbp.Value));
                }
                SqlCacheDependencyAdmin.EnableNotifications(connectionString);
                string notificationTable = q.ElementType.Name;
                if (!SqlCacheDependencyAdmin.GetTablesEnabledForNotifications(connectionString).Contains(notificationTable))
                    SqlCacheDependencyAdmin.EnableTableForNotifications(connectionString, notificationTable);
                var sqldep = new SqlCacheDependency(cmd);
                // The command must actually execute for the dependency to be registered
                cmd.ExecuteNonQuery();
                cachedItem = q.ToList();
                HttpContext.Current.Cache.Insert(key, cachedItem, sqldep);
            }
        }
    }
    else
    {
        TracingHelper.Write("Cache key [{0}] FOUND for query: {1}", key, dc.GetCommand(q).CommandText);
    }
    SqlDependency.Stop(dc.Connection.ConnectionString);
    return cachedItem;
}

Now, in order to use this extension method in your queries, you’ll need to do something like this:

public List<Company> Get()
{
    using (var dataContext = new RoutePointRepositoryDataContext(Db.ConnectionString))
    {
        var items = dataContext.Companies.LinqCache(dataContext, "Companies.All");
        return items;
    }
}

The downside is that I haven’t figured out a good way to implement this when retrieving one record from the database.  So, you’ll have to do this:

public Company Get(int id)
{
    using (var dataContext = new RoutePointRepositoryDataContext(Db.ConnectionString))
    {
        var company = dataContext.Companies.Where(c => c.Id == id).LinqCache(dataContext, "Company.Get." + id.ToString()).Single();
        return company;
    }
}


MSDN did have some code similar to this, but it did not work and there were no examples.  This works, and the examples should be enough to get you started.


Posted in .NET, LINQ2SQL, SQL | Leave a Comment »

A Generic run-time LINQ-based multi-level object sorter

Posted by Michael Bray on October 26, 2009

Assume you have a list of objects that has a set of properties.  These properties are stored in a StringCollection or other similar lookup, and you want to sort the objects based on some of these properties, but you don’t know at compile-time which properties to sort on or in what order (that information will be supplied at run-time, perhaps in configuration).  How do you sort this list, in a manner that honors ascending / descending as well as multi-level sorting rules?  You can’t simply sort the list by each property, since each time you sort, it will wipe out the previous sorting operation.  Of course, LINQ provides sorting thru OrderBy(…) and ThenBy(…) functions that handle the multi-level sort issue.  But it’s a bit more complicated than that, since you don’t know the properties you want to sort on.

Here, I demonstrate a relatively simple generic object sorter that correctly handles multi-level sorting and ascending/descending at each level.

private IEnumerable<T> MultiLevelSort<T, SK>(IEnumerable<T> list, List<SK> sortKeys, Func<T, SK, string> keySelector, Func<SK, bool> ascendingSelector)
{
    if (sortKeys.Count == 0) return list;

    IOrderedEnumerable<T> res = null;
    for (int i = 0; i < sortKeys.Count; i++)
    {
        SK sk = sortKeys[i];
        bool ascending = ascendingSelector(sk);
        if (i == 0)
        {
            if (ascending) res = list.OrderBy(r => keySelector(r, sk));
            else res = list.OrderByDescending(r => keySelector(r, sk));
        }
        else
        {
            if (ascending) res = res.ThenBy(r => keySelector(r, sk));
            else res = res.ThenByDescending(r => keySelector(r, sk));
        }
    }
    return res;
}

This function takes 4 parameters:

  1. An IEnumerable<T> of objects to sort
  2. A List<SK> of objects that contain sorting order information (note that this list itself is expected to already be in the correct sort order)
  3. A Func<T, SK, string> to extract the value from T based on information in SK to actually sort on
  4. A Func<SK, bool> to extract the ascending/descending information from SK

…and it returns the list correctly sorted as an IEnumerable<T>.  Note that the object actually returned is an IOrderedEnumerable<T> as long as there is at least one valid sort key.

This code could then be used as such:

List<MyProperty> sortProps = AllProperties.Where(sp => sp.Sort != string.Empty).OrderBy(sp => sp.SortOrder).ToList();
IEnumerable<MyObject> sortedResults = MultiLevelSort<MyObject, MyProperty>(
    results, sortProps,
    (r, pe) => r.Properties.ContainsKey(pe.Name) ? r.Properties[pe.Name] : string.Empty,
    pe => pe.Sort == "Ascending"
);

Where MyObject is an object that contains a StringCollection called ‘Properties’, and MyProperty is an object that contains properties called ‘Sort’ (“Ascending/Descending”), ‘SortOrder’ (an integer), and ‘Name’ (the name of the property within the MyObject.Properties collection that we want to sort on).
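To see the helper in action, here is a self-contained sketch; the Item and SortKey classes below are simplified stand-ins for the MyObject/MyProperty types described above (a Dictionary plays the role of the StringCollection):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public Dictionary<string, string> Properties = new Dictionary<string, string>();
}

public class SortKey
{
    public string Name;
    public string Sort;   // "Ascending" or "Descending"
}

public static class MultiLevelSortDemo
{
    // Same shape as the MultiLevelSort helper above
    public static IEnumerable<T> MultiLevelSort<T, SK>(IEnumerable<T> list, List<SK> sortKeys,
        Func<T, SK, string> keySelector, Func<SK, bool> ascendingSelector)
    {
        if (sortKeys.Count == 0) return list;
        IOrderedEnumerable<T> res = null;
        for (int i = 0; i < sortKeys.Count; i++)
        {
            SK sk = sortKeys[i];
            bool asc = ascendingSelector(sk);
            if (i == 0)
                res = asc ? list.OrderBy(r => keySelector(r, sk))
                          : list.OrderByDescending(r => keySelector(r, sk));
            else
                res = asc ? res.ThenBy(r => keySelector(r, sk))
                          : res.ThenByDescending(r => keySelector(r, sk));
        }
        return res;
    }

    public static List<string> Run()
    {
        var items = new List<Item>
        {
            new Item { Properties = { { "City", "Boston" }, { "Name", "Zed" } } },
            new Item { Properties = { { "City", "Austin" }, { "Name", "Amy" } } },
            new Item { Properties = { { "City", "Boston" }, { "Name", "Amy" } } },
        };
        // The sort-key list is assumed to already be in SortOrder order, as described above
        var keys = new List<SortKey>
        {
            new SortKey { Name = "City", Sort = "Ascending" },
            new SortKey { Name = "Name", Sort = "Descending" },
        };
        return MultiLevelSort(items, keys,
                (r, k) => r.Properties.ContainsKey(k.Name) ? r.Properties[k.Name] : string.Empty,
                k => k.Sort == "Ascending")
            .Select(r => r.Properties["City"] + " " + r.Properties["Name"])
            .ToList();
    }

    public static void Main()
    {
        foreach (var line in Run())
            Console.WriteLine(line);   // Austin Amy, Boston Zed, Boston Amy
    }
}
```

The second-level ThenByDescending on Name only reorders within equal City values, which is exactly the multi-level behavior a chain of plain OrderBy calls would destroy.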

Posted in .NET | Tagged: | Leave a Comment »

Simulating VS.net’s MSI InstallURL property with WiX

Posted by Michael Bray on October 24, 2009

I recently converted several installers from VS.net to WiX.  In one of those installers, I was using a Registry Search condition to check to see if MSXML6 was installed, since the application requires it.  If it wasn’t installed, I was using Visual Studio’s InstallURL property to redirect the user to the Microsoft download page for the package so they could download and install it.

WiX doesn’t appear to have an InstallURL property available by default, but you can simulate it with some custom actions.   Along the way I learned quite a bit about how WiX structures CustomActions, and experienced quite a bit of frustration getting it to work.  The first step to simulating the InstallURL capability is to set up a Property that searches the registry for the MSXML key:

<Property Id="MSXML6">
    <RegistrySearch Id="MSXML6Search" Root="HKCR" Key="Msxml2.DOMDocument.6.0" Type="raw" />
</Property>

This is very standard code for reading a registry value – no surprises here.  The next step is to build two custom actions that both tie to this property:

<Property Id="cmd" Value="cmd.exe" />
<CustomAction Id="OpenMSXML6Download" Property="cmd"
    ExeCommand="/c start http://www.microsoft.com/downloads/details.aspx?FamilyID=993c0bcf-3bcf-4009-be21-27e85e1857b1"
    Return="check" />
<CustomAction Id="OpenMSXML6DownloadError" Error="MSXML6 must be installed first." />

The first Custom Action executes a command window, and starts the URL in the ExeCommand.  The important and confusing thing to note here is that the actual command to execute is put in a property, and any parameters are put in “ExeCommand” which is very poorly named.  The parameters in this case are a trick to start up the default browser to the desired URL.  The minor drawback to this is that you see the command window briefly.  I think there is a better way to do this that I’ve seen but not yet tried that involves doing a registry search to locate the default browser executable and then calling it directly.

The second custom action simply opens an Error dialog and exits the installation.

The third piece to this puzzle is to insert the Custom Action into the InstallExecuteSequence:

<InstallExecuteSequence>
    <!-- Takes user to MSXML6 download page to be installed -->
    <Custom Action="OpenMSXML6Download" After="AppSearch">NOT MSXML6 AND Not Installed</Custom>
    <Custom Action="OpenMSXML6DownloadError" After="OpenMSXML6Download">NOT MSXML6 AND Not Installed</Custom>
</InstallExecuteSequence>

One critical piece of information here is to note the “After=AppSearch”.  When I was first trying to get this to work, I dealt with a huge amount of frustration because I had set this to “After=FindRelatedProducts”.  I had chosen this because although I don’t show it above, this particular WiX install also enables the app to install over itself and prevents downgrading, and in that scenario, that’s the After value that you use.  My assumption that it would work to simulate the InstallURL was a very bad one.  The problem is that FindRelatedProducts occurs before AppSearch, which is where the RegistrySearch property is evaluated.  As a result, the MSXML6 property was NEVER defined and I was redirecting to the download page even if MSXML6 was installed.  (BTW this is a good reason to download and install the Windows SDK tools – I only discovered this because I bothered to open the MSI in Orca!)

With these three pieces in place, the installer now correctly detects MSXML6 and will redirect the user to the download page (and terminate) if it isn’t installed.  Note, however, that as presented above, the user may have to go thru the majority of the UI install before this happens.  If you don’t want the user to see any of the UI before the check and redirect takes place, duplicate the InstallExecuteSequence lines into the InstallUISequence section of the WXS file.  I’ve been told it’s somewhat of a bad practice, but that does fall more in line with the way the Visual Studio InstallURL works.

Posted in WiX | Tagged: | 1 Comment »

My wife always said I can’t see 4 feet in front of me…

Posted by Michael Bray on October 20, 2009

At home, I am NOTORIOUS for looking right at things and not seeing them.  This type of conversation has been heard many times thru the years: “Honey, could you pass the sugar?”  “Hmm?  Where is it?”  “Right in front of you, doofus.”

Well this isn’t just at home – it’s here at work too.  And I just found something that (I presume) has been there all the time and I just never saw it.  And I love it.

In SQL Server Management Studio, when you execute a query and display the execution plan, that plan is often (for me anyway) way larger than the screen.  Sure you can right-click and there are various zoom options like ‘Zoom to fit’.  But if you do that, then you are left needing jewelers glasses to see it, and worse, with having to navigate around the plan, which can be a pain.  Or *was* a pain. 

If you look closely (you might have to squint) you’ll see a + below the scroll bar:


If you click that +, you get a Panning window that lets you move anywhere in the query plan you want, pain free.  And no need to zoom!

Posted in Uncategorized | Leave a Comment »

Enabling Automatic NTLM Authentication in Firefox

Posted by Michael Bray on October 2, 2009

When browsing using Internet Explorer to a site on an internal network that is configured with Integrated Windows Authentication, such as http://intranet, IE will automatically attempt to use your domain credentials to log in.  It’s always pained me that Firefox didn’t do the same thing.  I thought it had something to do with some M$ hocus pocus undocumented feature magic that prevented FF from being able to do this.  SO wrong.  It’s easy to enable Firefox to do the same:

To enable this feature in Firefox, go to the “about:config” url in Firefox, and then find the configuration parameters for “network.automatic-ntlm-auth.trusted-uris”, “network.negotiate-auth.delegation-uris”, and “network.negotiate-auth.trusted-uris”.  In those values, put the URIs that you want Firefox to automatically pass authentication to.  You can either use the http:// prefix or you can leave it off, and you can specify multiple parameters by separating them with a comma.
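For example (the host names below are placeholders – substitute your own internal sites), the three about:config preferences might end up looking like:

```
network.automatic-ntlm-auth.trusted-uris = http://intranet,tfs.example.local
network.negotiate-auth.delegation-uris   = http://intranet,tfs.example.local
network.negotiate-auth.trusted-uris      = http://intranet,tfs.example.local
```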

Posted in Uncategorized | Leave a Comment »

Reconfiguring BTARN manually.

Posted by Shaun McDonnell on August 26, 2009

I ended up changing my network domain name infrastructure a little bit and had to modify my host-headers in IIS so that everything would work well together (BizTalk, BTARN, Sharepoint, etc).  After I did this, I kept receiving the following error anytime I sent a message to BTARN:

RNIFReceiverWebApplication:  (404) Not Found

Obviously that means that the RNIFReceive.aspx was not passing the data off to the correct URL for the BTSHTTPReceive.dll.  I looked everywhere to reconfigure this without having to uninstall the WebApps aspect of BTARN and I couldn’t find it.  So, of course I looked in the registry and sure enough it was there:


This is located under HKEY_LOCAL_MACHINE\Software\Wow6432Node\Microsoft\BizTalk Accelerator for RosettaNet\2009 (this is for the 64-bit version of the accelerator).

The keys you need to change are fairly obvious from here and you will need to perform an IIS reset afterwards to make sure the changes are propagated through the pipeline.



Posted in Uncategorized | Leave a Comment »

BizTalk 2009 and BTARN 3.5 on Windows Server 2008 Installation

Posted by Shaun McDonnell on August 20, 2009

I had some interesting problems occur when trying to install BizTalk 2009 and BTARN (BizTalk Accelerator for RosettaNet) on Windows Server 2008.  At the end of the day the problems were trivial, but they kept halting my installation with errors I could not figure out.

The first error occurred when I ran the BTARN configuration and looked something like:


which reads:

“A BizTalk Isolated Host instance configured with the user account ‘BizTalkMumbles’ was either not running or does not exist on this computer.  Use the BizTalk Administration Console to create a new Isolated host or to reconfigure an existing host to run as ‘BizTalkMumbles’”


Well, I had done all of that already so this obviously wasn’t the problem.  It turned out I needed to use <COMPUTERNAME>\BizTalkMumbles or <DOMAINNAME>\BizTalkMumbles when I configured my BizTalk Host Application Instances.  Once I did this the error went away. 

Here is what the configuration looked like when I was getting the error:


And here is the configuration that made the error go away:


Note the change in the Logon.

I consider this to be Microsoft’s issue as the underlying logon user never changed but adding the machine or domain name helped.

Then, when I proceeded into my BTARN configuration, installing the Runtime and the WebApp failed miserably with errors in the log file that looked like this:

[12:50:33 PM Info ConfigHelper]     Detected IIS version: 7
[12:50:33 PM Error ConfigHelper] d:\depot2300\mercury\private\common\configwizard\confighelper\iis.cpp(594): FAILED hr = 80040154

[12:50:33 PM Error ConfigHelper]     Failed to open application pool Error: d!
[12:50:33 PM Error ConfigHelper] Class not registered
[12:50:33 PM Error ConfigHelper] d:\depot2300\mercury\private\common\configwizard\confighelper\iis.cpp(151): FAILED hr = 80040154

[12:50:33 PM Info ConfigHelper]     Error -2147221164 occurred while attempting to retrieve site ID for site
[12:50:33 PM Error ConfigHelper] Class not registered
[12:50:33 PM Error RNConfig] c:\depot\europa2009\private\europa_source\btarnconfig\dll\rnconfig.cpp(2211): FAILED hr = 80040154

[12:50:33 PM Info RNConfig] Leaving function: CRNConfig::ConfigureFeature
[12:50:33 PM Warning Configuration Framework]Feature failed to configure: WebApps.
[12:50:33 PM Info Configuration Framework]Configuration Summaries:
[12:50:33 PM Error Configuration Framework]Feature: [Runtime] Failed to configure with error message [<Exception Message="Class not registered" Source="ConfigHelper" HelpID="HelpIdGetWebsiteID"/>]
[12:50:33 PM Error Configuration Framework]Feature: [WebApps] Failed to configure with error message [<Exception Message="Class not registered" Source="ConfigHelper" HelpID="HelpIdGetWebsiteID"/>]
[12:50:33 PM Info Configuration Framework]    Feature: Runtime    Configuration Enabled: yes    Sub UI: no    Configured: no
[12:50:33 PM Info Configuration Framework]    Feature: WebApps    Configuration Enabled: yes    Sub UI: no    Configured: no

I figured BTARN was having trouble talking to IIS and maybe needed IIS 6 compatibility installed to be able to make the connection.  That turned out to be the case:


Make sure you have the ‘IIS 6 Management Compatibility’ Role Service installed for IIS 7.

I searched and searched the net for answers on this and never found anything so I figured I would post my findings here.


Posted in Uncategorized | 5 Comments »

A mathematical diversion with LINQ

Posted by Michael Bray on July 18, 2009

Every now and then I’ll come across a math problem as part of my work…   It’s not what I do it’s just something that happens.  When it does happen, I either become obsessed with it until I solve it (or at least THINK I solve it, one way or another) or I eventually give up with the blunt realization that I just spent hours thinking about something that is likely to cause psychosis in old age.  Fortunately, the problem I ran across today seems to have fallen into the first category… 

Here’s the question, simplified…   Imagine you have a collection of at least 10 items, of which you are going to choose 5 at random.  One of those 5 is then taken away and replaced with a different NEW item (pulled from some other pile of items).  Next you choose another 5, again randomly.  The questions I asked myself in this scenario were:

  1. What are the chances that you WON’T see any of the same 5 that you originally picked? 
  2. What if the collection is 20 items instead of 10 items? 
  3. How many items have to be in the larger collection so that you have at least a 90% chance of NOT seeing any one of the originally chosen items if you are choosing 5 at a time?
The Math…

This actually is a fairly simple problem to solve…  but I ran thru an interesting programming journey on the way to the answers, and it was the programming aspect that prompted this post.  To reiterate the scenario, the collection starts with 10 items of which 5 are chosen and one is taken away; then another NEW item is added, again giving you 10 items from which to choose 5 of them, and out of the 10 from which you choose, 4 of them were ones you saw before. 

When working problems of this kind (because I’m not so good at probabilities) I prefer to back up to a simpler case.  So let’s consider what happens when you have 6 items in the collection, and you choose 3 at a time.  This means that on the second draw, there will be 2 that must be avoided.  If the items in the second draw are A, B, C, D, E, and F, and we presume that E and F must be avoided, then we have the following:

  • “Good” choices: ABC, ABD, ACD, BCD

So there are 20 possible choices, of which only 4 yield a good result.  Each of these numbers is a binomial coefficient.  The number 20 is the total number of 3-letter combinations, or 6C3. The number 4 is the number of 3-letter combinations of only the first 4 “good” letters, or 4C3. So it should be clear that in general, the number of total choices is NCr, where N is the number of items in the collection and r is the number of items we are picking, and the number of “good” choices is MCr, where M = N − r + 1.
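Those two counts – 20 total choices and 4 “good” ones – are also easy to confirm by brute force.  A quick sketch (the class and method names here are my own) that enumerates every 3-letter combination of A–F and counts the ones avoiding E and F:

```csharp
using System;

public static class SubsetCount
{
    // Enumerate all 3-letter combinations of A..F and count the "good" ones
    // (those avoiding the two previously-seen letters, E and F).
    public static (int total, int good) Count()
    {
        var letters = "ABCDEF";
        int total = 0, good = 0;
        for (int i = 0; i < letters.Length; i++)
            for (int j = i + 1; j < letters.Length; j++)
                for (int k = j + 1; k < letters.Length; k++)
                {
                    total++;
                    string combo = "" + letters[i] + letters[j] + letters[k];
                    if (!combo.Contains("E") && !combo.Contains("F")) good++;
                }
        return (total, good);
    }

    public static void Main()
    {
        var (total, good) = Count();
        Console.WriteLine($"total={total} good={good}");   // total=20 good=4
    }
}
```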

So now the answer to the first question becomes easy…  If you only have 10 items to choose from at a time, this means that 5 were chosen from the original 10, and then one of those 5 goes away and will be replaced by a new (unseen) item, so there are 4 out of 10 that need to be avoided.  Thus in this case, N=10, r=5, and M=6.  And thus, the likelihood that you WON’T see any of the original 5 (but really 4) items is:

6C5 / 10C5 = 6 / 252 = 1/42 ≈ 2.38%
Wow… the answer really is 42!!!  (Sort of.)

Ok, so what about the second question…   well the answer is still easy, because it is the same formula…  except now the numbers get bigger…   N=20, r=5, M=16:

16C5 / 20C5 = 4368 / 15504 ≈ 28.17%
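Both answers can be double-checked numerically.  Here is a minimal sketch (helper names are my own); it computes nCr multiplicatively rather than from raw factorials, to sidestep the overflow problem that shows up later in this post:

```csharp
using System;

public static class Probabilities
{
    // nCr computed multiplicatively to avoid factorial overflow
    public static double Choose(int n, int r)
    {
        double result = 1;
        for (int i = 1; i <= r; i++)
            result = result * (n - r + i) / i;
        return result;
    }

    // Probability of avoiding every previously-seen item:
    // MCr / NCr, with M = N - r + 1
    public static double AvoidAll(int n, int r) => Choose(n - r + 1, r) / Choose(n, r);

    public static void Main()
    {
        Console.WriteLine(Probabilities.AvoidAll(10, 5));   // 6C5/10C5  = 6/252      ≈ 0.0238 (1 in 42)
        Console.WriteLine(Probabilities.AvoidAll(20, 5));   // 16C5/20C5 = 4368/15504 ≈ 0.2817
    }
}
```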
Ok so what about the third problem?  Well it’s just a matter of plugging numbers in, but calculating binomial coefficients over and over for such large numbers really gets tiring…  and besides, like any good programmer, I wanted confirmation that I was doing my math right…   And that’s where we move to the fun part of this story.

Where’s the LINQ?

Ok so at this point I have two of the three answers, but I wanted to make sure I wasn’t screwing up the “easy” part.  I figured that this type of problem isn’t exactly custom-tailored for a LINQ query, but I figured I could have some fun and learn a bit more about Linq by using “Enumerable.Range” and some Linq extension methods.  So it’s just a matter of getting the code working…    But one thing that I really hate is using Visual Studio to create some dinky little project to do this sort of stuff – I’d much prefer a “snippet compiler” that will let me just type a bit of code and run it.  There are some programs around that do this, such as…  (wait for it…) SnippetCompiler.  It is a neat program because it basically lets you write a fully functional program without having to open VS.net.  But I found its IntelliSense somewhat clunky, and something about it always made me not like it too much, though it was still better than the alternative.  However, recently I have found a FANTASTIC application called LinqPad.  LinqPad can also compile snippets, but it is much more than just that…  It can dynamically evaluate LINQ expressions (or entire .NET programs), translate between LINQ syntax and lambda syntax, show you the generated IL, and can access databases on the fly so you are working with real data.  The author even touts it as a replacement for SSMS, and it certainly could be, but I think that’s going a bit far since the results display isn’t quite tailored to that type of data, and I don’t think you can edit data in place (but I might be wrong).  It includes a lot of examples and apparently you can download more.  BUT… one of the really nice things is that it can compile and execute single statements – they don’t even have to be full programs as they do in SnippetCompiler.  For example, you can just type in “Enumerable.Range(10,10).ToList().ConvertAll(x => new { X=x, XSquared = x*x })” and here’s what you get out:


Amazing!  And trust me… that’s only the tip of the iceberg.  (Note that the shaded line is a “total sum” of the column – I don’t know why they do that, but I don’t see a way to turn it off.)  LinqPad also provides IntelliSense, although that is a feature you have to purchase.

Now comes the crazy part…

OK so I now started writing up some LINQ, and I quickly realized that there was no Factorial function in Math or any of the other standard libraries, and there was no way I was getting by without it.  So my first hunt was to find a Factorial function.  Not just any factorial function, but one written in LINQ, of course.  (Note that this was only because I was having fun; LinqPad is fully capable of compiling and executing non-LINQ C# code, so I could have just written a function to do it.)  My hunt took an odd turn, though, because it seems that writing a recursive function in Linq isn’t as easy as it is in good old fashioned code.  But with a bit of magic and wizardry, it can be done.  I won’t try to explain it (because I don’t even come close to understanding it) but you can find the solution here.  They use something called a “YCombinator” to do it, but to keep things simple and clear, here is just the lambda that results from its use:

Func<int, int> factorial = Extensions.YCombinator<int, int>(fact => n => n < 2 ? 1 : n * fact(n - 1));
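(For context, here is a rough sketch of what an `Extensions.YCombinator` helper could look like.  This is my own simplified stand-in, not the linked article’s actual code – it achieves the fixed-point effect via a captured delegate rather than a true Y combinator, which is cheating a little but behaves the same from the caller’s side:)

```csharp
using System;

static class Extensions
{
    // Simplified fixed-point helper: ties the recursive knot by letting
    // the lambda call back into itself through the captured 'fix' delegate.
    // The real Y-combinator implementation in the linked article may differ.
    public static Func<TIn, TOut> YCombinator<TIn, TOut>(
        Func<Func<TIn, TOut>, Func<TIn, TOut>> f)
    {
        Func<TIn, TOut> fix = null;
        fix = x => f(fix)(x);   // self-reference through the closure
        return fix;
    }
}

class Program
{
    static void Main()
    {
        var factorial = Extensions.YCombinator<int, int>(
            fact => n => n < 2 ? 1 : n * fact(n - 1));
        Console.WriteLine(factorial(5)); // 120
    }
}
```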

And a quick test of this shows that it works:

var N = Enumerable.Range(1,10).ToList().ConvertAll(x => new { x = x, factorial = factorial(x) });


Ok so now instead of building up a whole lot of factorial expressions, I’ll also define a lambda that handles the Combination function, like so:

Func<int, int, int> C = (n, r) => factorial(n) / factorial(r) / factorial(n - r);

var N = Enumerable.Range(3,10).ToList().ConvertAll(x => new { x = x, factorial = factorial(x), nC3 = C(x, 3) });
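(An aside of my own, not part of the original exploration: the factorial-based definition above computes enormous intermediate values even when the final binomial coefficient is small.  A multiplicative form keeps the intermediates close to the size of the answer:)

```csharp
using System;

class Program
{
    // Multiplicative form: C(n,r) = product over i=1..r of (n-r+i)/i.
    // Each partial product is itself a binomial coefficient, so the
    // integer division at every step is always exact.
    static long Choose(long n, long r)
    {
        if (r > n - r) r = n - r;              // symmetry: C(n,r) == C(n,n-r)
        long result = 1;
        for (long i = 1; i <= r; i++)
            result = result * (n - r + i) / i;
        return result;
    }

    static void Main()
    {
        Console.WriteLine(Choose(10, 3));  // 120
        Console.WriteLine(Choose(20, 5));  // 15504
    }
}
```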

Coolio!!!  Alright now we are talking, because now I can verify my answers.  In order to do this, I’ll just calculate each of the values from N=10 to N=20 (why not!), along with the corresponding percentage calculation:

var N = Enumerable.Range(10,11).ToList();
int r = 5;
var y = N.ConvertAll(x => new {
    M = x-r+1,
    mCr = C(x-r+1, r),
    N = x,
    nCr = C(x, r),
    Frac = ((decimal)C(x-r+1, r) / C(x, r)).ToString("P2")
});

Oops!!  Divide by zero???  How in the heck could that happen??  So I commented out the only place there could be a divide by zero, the Frac calculation:

var N = Enumerable.Range(10,11).ToList();
int r = 5;
var y = N.ConvertAll(x => new {
    M = x-r+1,
    mCr = C(x-r+1, r),
    N = x,
    nCr = C(x, r),
    //Frac = ((decimal)C(x-r+1, r) / C(x, r)).ToString("P2")
});

Sure enough – there are the zeros…   but WHY – there’s no way that a binomial coefficient can produce zero!  …And the first several answers are all correct.  Then I realized the problem you always have when dealing with factorials – the numbers get big really fast.  Maybe int wasn’t the best choice…  so I switched to long:

static Func<long, long> factorial = Extensions.YCombinator<long, long>(fact => n => n < 2 ? 1 : n * fact(n - 1));
static Func<long, long, long> C = (n, r) => factorial(n) / factorial(r) / factorial(n - r);
var N = Enumerable.Range(10,11).ToList();
int r = 5;
var y = N.ConvertAll(x => new {
    M = x-r+1,
    mCr = C(x-r+1, r),
    N = x,
    nCr = C(x, r),
    Frac = ((decimal)C(x-r+1, r) / C(x, r)).ToString("P2")
});

Now that is more like it…  and so now I had the answers to TWO of the questions…   With 10 items in the collection and choosing 5 at a time, you have only a 2.38% chance of not seeing one of the original items, and with 20 items you have a 28.17% chance of not seeing one of the original items.  So what would it take to get a 90% chance of not seeing one of the original items?  OK, no big deal, I’ll just keep going…  Uhhhh…  wait a minute.  The long type ALSO has a maximum value…   I wondered how “long” it would be before I started getting mangled numbers.   Not very long, it turns out.  In fact, the calculation goes bonkers on the very next value of N, N=21:

var N = Enumerable.Range(20,2).ToList();
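(The breakdown at exactly N=21 is no coincidence.  20! = 2,432,902,008,176,640,000 still fits in a long, but 21! is roughly 5.5 times larger than long.MaxValue, so factorial(21) silently wraps around.  A decimal can hold integers exactly up to about 7.9×10^28, which is enough to show the culprit:)

```csharp
using System;

class Program
{
    static void Main()
    {
        // decimal's 96-bit mantissa represents 21! exactly,
        // so we can compare it against long.MaxValue honestly.
        decimal f = 1;
        for (int n = 2; n <= 21; n++) f *= n;
        Console.WriteLine(f);             // 51090942171709440000  (21!)
        Console.WriteLine(long.MaxValue); //  9223372036854775807
        // 21! > long.MaxValue, so 64-bit arithmetic wraps and C(21,5)
        // comes out as garbage, while everything up through N=20 was fine.
    }
}
```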


So now what?  Double??  At first it appeared that it would work…   but RIGHT BEFORE I got to my answer of 90%, KABOOM:

static Func<double, double> factorial = Extensions.YCombinator<double, double>(fact => n => n < 2 ? 1 : n * fact(n - 1));
static Func<double, double, double> C = (n, r) => factorial(n) / factorial(r) / factorial(n - r);
var N = Enumerable.Range(160,14).ToList();
int r = 5;
var y = N.ConvertAll(x => new {
    M = x-r+1,
    mCr = C(x-r+1, r),
    N = x,
    nCr = C(x, r),
    Frac = ((double)C(x-r+1, r) / C(x, r)).ToString("P2")
});
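(The KABOOM this time is double overflow rather than wraparound: 170! ≈ 7.3×10^306 is still representable, but 171! exceeds double.MaxValue (≈1.8×10^308) and evaluates to Infinity, after which the Frac ratio becomes Infinity/Infinity, which is NaN.  A quick demonstration:)

```csharp
using System;

class Program
{
    // Same recursive factorial as the lambda above, in plain-method form.
    static double Factorial(double n)
    {
        return n < 2 ? 1 : n * Factorial(n - 1);
    }

    static void Main()
    {
        Console.WriteLine(Factorial(170));                  // ~7.26E+306, still finite
        Console.WriteLine(Factorial(171));                  // Infinity
        Console.WriteLine(Factorial(171) / Factorial(171)); // NaN
    }
}
```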


At this point, I want you to take notice of the fact that I’m already up to 160+ items in the collection…  even with this many, when choosing 5, and replacing 1 of those 5 with a new item, and then picking 5 more from the full collection, you still don’t quite have a 90% chance of NOT seeing one of the original 5.  REALLY?!?!  Out of 160+ items I’m only picking 5, trying to avoid 4 of them, and I don’t have a 90% chance of doing that!?!?!  No wonder I always lost at Battleship.  I could pick 5 items from the collection 10 times over and still not have picked one-third of the available items…   odd… but ok, I trust the math.

Anyway…  Still without an answer to the question, and having come SO close, I couldn’t drop it now.  So what is out there that can hold bigger numbers than a double and will operate on my measly 32-bit machine?  Ahhh yes…  BigInt!  So that started my quest for a BigInt class for C#, and I was surprised how hard it was to find one.  Apparently, there used to be a BigInt included in the System.Core library in the pre-release versions of .NET 3.5, but it was taken out before the official release.  Too bad – that would have been the easiest way to go.  I did eventually find a C# BigInt here.

So now I plugged in my BigInt, and after a few tweaks to handle the implementation of the BigInt class and some tweaks for efficiency and display, I arrived at my answer:

var N = Enumerable.Range(190,11).ToList();
int r = 5;
var y = N.ConvertAll(x => new {
    M = x-r+1,
    mCr = C(x-r+1, r),
    N = x,
    nCr = C(x, r)
});
var z = y.ConvertAll(yy => new {
    M = yy.M,
    mCr = yy.mCr.ToString(),
    N = yy.N,
    nCr = yy.nCr.ToString(),
    Frac = (double.Parse(yy.mCr.ToString()) / double.Parse(yy.nCr.ToString())).ToString("P2")
});


Thus…   I calculate that you would need a collection of 194 items from which to pull 5 items in order to have a >90% chance that you wouldn’t see one of the original 5 (really 4, since one is taken away).  That seems totally counter-intuitive to me, but this is apparently a good example of diminishing returns! 

My next quest…   to find Factorial/Combination classes (probably utilizing BigInt) that can efficiently handle the fact that calculations of this kind, although they involve HUGE numbers if you work them out directly as I have here, actually represent relatively small numbers, since factors often cancel out – I probably could have calculated the answer on paper as fast as it took me to write the code (assuming I knew what the answer was!).  The above query didn’t take that long to run on my machine (almost 25 seconds), but it took me a while to figure out where the right value was.  The point is, with specially coded Factorial and Combination classes, this could be much faster and much more efficient.
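(To illustrate just how much cancels: with r=5, the ratio C(n-4,5)/C(n,5) reduces to (n-5)(n-6)(n-7)(n-8) / (n(n-1)(n-2)(n-3)) once the common factors are struck out – four small ratios that plain double arithmetic handles instantly, with no factorials and no BigInt in sight.  This is just a sketch of the idea, not the general-purpose classes I’m hoping to find:)

```csharp
using System;

class Program
{
    // C(n-4,5)/C(n,5) after cancelling common factors:
    //   (n-5)(n-6)(n-7)(n-8) / (n(n-1)(n-2)(n-3))
    // Evaluated as a product of ratios so nothing ever gets large.
    static double Frac(int n)
    {
        return (double)(n - 5) / n
             * (n - 6) / (n - 1)
             * (n - 7) / (n - 2)
             * (n - 8) / (n - 3);
    }

    static void Main()
    {
        Console.WriteLine(Frac(193)); // ~0.8996 - just short of 90%
        Console.WriteLine(Frac(194)); // ~0.9001 - first n over 90%
    }
}
```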

Finally, along the course of this, I also found a nice LaTeX Equation Image Generator on the web that I used for the math images… actually I found 3 or 4, but this was the only one that seemed to be working, so thanks to Roger!


~ Michael D. Bray

Posted in .NET, Math, Snippets, Uncategorized | Leave a Comment »