Wednesday, March 9, 2011

InstallShield, .NET DLLs and DotNetCoCreateObject()

One of my projects requires that I call out to a DLL written in C#, which requires .NET 2.0 SP2 (or CLR20r3).  When I first went to implement this using InstallShield's DotNetCoCreateObject() method, it took a lot of futzing about before I could load the DLL successfully.  I also had horrible problems with handling any exceptions that occurred within the DLL.  And just yesterday, the developer made a change within the DLL and it broke the installer again.

(That, by the way, is the best part of being the "installer-guy".  Whenever anything breaks, they look at you.  Never mind if you haven't had a check-in for the past three days and it just broke last night, you are guilty until proven innocent.)

Anyway - the error we'll look at today is a delightfully informative one.  When caught, all you'll get is a number: -2147219705.

Gee, thanks... that's helpful.

The first thing you should know is to always wrap your DotNetCoCreateObject() call in a try/catch block; otherwise you will get burned down the road.  Here's how mine is loaded:

szDLLPath = SUPPORTDIR ^ "MyDotNet.dll";
szClassName = "InstallHelper.DoInstall";
try
    set oInstHelp = DotNetCoCreateObject(szDLLPath, szClassName, "");
catch
    SprintfBox (INFORMATION, "Error", "Error occurred: %i\n\n%s\n\n%s", Err.Number, Err.Description, Err.LastDllError);
    abort;
endcatch;

In the case above, my C# DLL contains a namespace called "InstallHelper" and a class within that namespace called "DoInstall".  For starters, the szClassName var must match what's in the DLL (namespace, then class name); otherwise you will get that -2147219705 right at the get-go.  Note also that if there's anything in the implementation that InstallShield is not a fan of, you will also catch the -2147219705.  (Yesterday's problem seemed to be just that - the developer had tried to merge two DLLs together, and they worked fine from a test console application, but when InstallShield loaded the DLL it barfed.)  In short, this seems to be the error that InstallShield will throw when it just plain doesn't like your DLL.
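A side note on that number: it is a signed 32-bit HRESULT, and it is usually easier to search for in hex.  The snippet below is a minimal sketch in Python (not InstallScript, and not part of the installer) that splits the value into the standard HRESULT bit fields.  The helper name decode_hresult is made up for illustration.

```python
def decode_hresult(value: int) -> dict:
    """Split a signed 32-bit HRESULT into its standard bit fields."""
    unsigned = value & 0xFFFFFFFF  # reinterpret the signed value as unsigned 32-bit
    return {
        "hex": f"0x{unsigned:08X}",
        "failure": bool(unsigned >> 31),        # severity bit: set means failure
        "facility": (unsigned >> 16) & 0x1FFF,  # same mask as the HRESULT_FACILITY macro
        "code": unsigned & 0xFFFF,              # low 16 bits: the error code itself
    }


print(decode_hresult(-2147219705))
# {'hex': '0x80040707', 'failure': True, 'facility': 4, 'code': 1799}
```

Facility 4 is FACILITY_ITF, which means the code is defined by whatever component raised it rather than by Windows itself.  That goes some way toward explaining why the number alone tells you so little.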

If this is the first time you're dealing with a C# DLL and DotNetCoCreateObject(), I recommend that you (or the DLL developer) begin as simply as possible.  The first test case should have a single public method that does something incredibly simple, like writing a text file or creating an event log entry.  A message box is not necessarily the simplest because, depending on when you invoke the DLL, any UI elements may be hidden, delayed, or simply fail silently.  Make sure that you have a working framework for your DLL communication before you or the dev spend hours on the project.  Once you know you can load the DLL and call a method successfully, you'll have a baseline to back out to in case any future changes break the implementation. (I hope it goes without saying that if you'll be calling such a DLL, you should have a prerequisite set for the specific .NET runtime that the DLL requires...)

Once you make it through that try/catch block, calling methods in the DLL is very simple. 

try
    nResult = oInstHelp.DoSomething(szSomeText);
catch
    SprintfBox (INFORMATION, "Error", "Error occurred: %i\n\n%s\n\n%s", Err.Number, Err.Description, Err.LastDllError);
    abort;
endcatch;

That's all there is to it.  There's no prototyping involved, just call your method.  Parameters passed by reference work transparently - if "DoSomething()" above just set the string to "Did It", I would find szSomeText equal to "Did It" on the next line.

Caveats

There are some potholes though.  There's a bug in IS2010 (I'm not sure whether it persisted into IS2011) where strings passed to objects created with DotNetCoCreateObject() arrive as an entire 4KB buffer, padded with space characters.  This can cause problems in your DLL if you don't handle it internally.  For example, say you pass SQL credentials to the .NET object.  If your username is 'dbadmin', and you don't trim the string in the DLL, you'll attempt to log in as 'dbadmin\0[space][space][space][space]...' out to 4KB.  We addressed this issue by passing all input strings to a "CleanUpNulls" function (in the DLL, that is), which trims the input string at the first NULL character.

Note too that this buffer handling can pose problems for the IS developer as well.  This was a fun one to fix!  Let's say you have a text input field whose contents you'll be sending to the DLL.  When the user gets to that field, they initially type an eight character string (e.g., "AcmeCorp").  Later, they return to the field and change it to a four-character string, "Acme".  When you pass the string to the DLL, it will look like this:

Acme\0orp\0[space][space][space]...

If the DLL trims the string at the first null, you're ok.  In our case, the dev had set up CleanUpNulls to trim the string at the first null character from the right, which meant the string they were working with was "Acme\0orp".  I ended up fixing that particular problem in InstallScript by copying the string to a new var character by character until I reached a null.  It would've been easier (in retrospect) to fix CleanUpNulls in the DLL...
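To make the difference concrete, here is a small sketch of the two trimming strategies.  It's written in Python purely so it can be run anywhere; the real CleanUpNulls lives in the C# DLL and I'm not reproducing it here.  The function names and the shortened 16-space padding are stand-ins for illustration.

```python
def trim_at_first_null(raw: str) -> str:
    """Keep only the text before the FIRST null character."""
    head, _, _ = raw.partition("\0")  # returns the whole string if no null is present
    return head


def trim_at_last_null(raw: str) -> str:
    """Keep everything before the LAST null character (the buggy variant)."""
    idx = raw.rfind("\0")
    return raw if idx == -1 else raw[:idx]


# Simulated buffer: "AcmeCorp" overwritten by "Acme", then space padding.
buffer = "Acme\0orp\0" + " " * 16

print(repr(trim_at_first_null(buffer)))  # 'Acme'
print(repr(trim_at_last_null(buffer)))   # 'Acme\x00orp'
```

Searching from the left gives the string the user actually typed; searching from the right keeps the leftover tail of the earlier, longer value.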

Final Note on Exception Handling

If you will be making a lot of calls to the DLL, you may want to make use of a .NET application domain to avoid a strange exception-bubbling mess.  I discovered in my case that if the DLL threw an exception on the first of three methods I called, the subsequent calls appeared to error immediately.  I spent a lot of time on this problem as well, and in truth only came up with a working hypothesis: the nested nature of most .NET exceptions was leaving an "exception queue" behind whenever an error was caught in InstallScript.  So your try/catch around a method call would clear the top-level exception, then you'd establish a new try/catch block and go to make another call - but there's still an unhandled exception bubbling up from the last method, so even if your second method has nothing to do with the first, it will still raise an exception.

I eventually got past this by using an app domain when I called DotNetCoCreateObject, and clearing and recreating the object before each method call.  I then used a nested try/catch, where each method call in its own try/catch block would, in the catch block, raise another exception to break out of the outer try/catch block.  An exception in one of the methods would then bypass any subsequent calls, prompt the user to fix the problem, and then (before restarting) unload the app domain.  This effectively flushed the "exception queue" so that I could give the user a chance to fix the problem and try the process over again.  A brief example is shown below:

:TryAgain
try         
    bTryAgain=FALSE;

    // Drop and recreate the DotNetObject - clear any as-yet-unhandled exceptions
    DotNetUnloadAppDomain("InstHelpDomain");
    try
        set oInstHelp = NOTHING;
        set oInstHelp = DotNetCoCreateObject(szDLLPath, szClassName, "InstHelpDomain");
    catch
        SprintfBox (INFORMATION, "Error", "Error occurred: %i\n\n%s\n\n%s", Err.Number, Err.Description, Err.LastDllError);
        abort;
    endcatch;
                        
    // Get the log file name in case of errors
    nRet = oInstHelp.GetLogFileName(szLogFilePath);
    if (nRet != 0) then
        Err.Raise(-1);
    endif;
           
    try
        set oInstHelp = NOTHING;
        set oInstHelp = DotNetCoCreateObject(szDLLPath, szClassName, "InstHelpDomain");
    catch
        SprintfBox (INFORMATION, "Error", "Error occurred: %i\n\n%s\n\n%s", Err.Number, Err.Description, Err.LastDllError);
        abort;
    endcatch;
           
    // call another method, etc...
    .
    .
    .
catch 
    MessageBoxEx("Blahblahproblemfixit", "UhOh", SEVERE);
    LaunchApplication(WINDIR ^ "notepad.exe", " " + szLogFilePath, WINDIR, SW_SHOW, 30000, LAAW_OPTION_WAIT);
    if (AskYesNo("Wanna try that again?", YES) == YES) then
        bTryAgain = TRUE;
    endif;
endcatch;         

if (bTryAgain) then
    goto TryAgain;
endif;

That convoluted mess was the only way I could catch an exception raised from a .NET DLL, give the user an opportunity to do something about it, and then try it all over again.  I can't help but imagine there's a better way, but damned if I found it.  When I came up with the hypothesis about the nested exceptions and the exception queue, this was the first thought I had for dealing with it, and it has worked out nicely.  It may lack elegance, but it certainly qualifies for "if it ain't broke, don't fix it".

So there we go.  Again, I hope I save some poor bastard from having to figure all of this out on their own, and I wish I could've found a page like this when I first had to do it.  InstallShield's forums can be helpful sometimes, but I found them seriously wanting when it came to the DotNetCoCreateObject stuff.  Hopefully this post gets sucked into the search engines of the world and saves someone a few hours of work and makes them look like a genius.  Good luck.

Friday, February 18, 2011

InstallShield debugging on a remote PC [SOLUTION]

This is the issue that prompted my starting this blog.  And as stated above, it may end up being the only post I make; only time will tell.

Through no fault of my own, my organization uses InstallShield (currently 2010 Premier) for package development.  Occasionally, an issue will arise that requires that I debug an installation on a test machine.  That is, I'll encounter an issue that I cannot explain via logs or even in-line debugging and am forced to run the installation in debug mode on the offending device.

Today, I lost nearly seven hours of what could have been highly productive time simply trying to get the debug mode started on a target.  When I began, I realized that I hadn't actually run a remote debug session since we upgraded our projects to IS 2010.  I wasn't terribly worried about that, and proceeded in the fashion described in all of the Flexera documentation (one example of which can be found here.)  I followed the instructions to the letter and yet the debugger simply would not come up.

I will refrain from describing my efforts in the registry editor, double-checking my Setup.dbg file, and running Filemon and Regmon to see if I could figure out why ISdbg.exe was not attaching to my installer.  Suffice it to say, I spent a ridiculous amount of time trying every command-line permutation, moving my installer source all over the filesystem, trying to run it off the network, and rubbing my temples until they turned into shiny red mirrors.

Finally I keyed in on one specific line in the various troubleshooting docs I'd been finding online and in the help:

Note that if your installation, script, and debug files are all in their original location on the development system, you do not need to specify the path to the debug file—use the following: setup.exe /d

I was at that troubleshooting stage where you stop checking if an idea is dumb, and so I changed the filesystem on the target so that I could place my installer source in the exact same location where it lived on my development machine.  My development directory lives on my D: drive, so that required changing the drive letter of the CD on the target, and creating a R/W filesystem on D: (this was a VM, so it wasn't terribly hard; you could probably do this with a mapped network drive or a USB stick, etc.)

I created all the subfolders to match my development system, then copied the entirety of the installer source to the target system.  I opened a command prompt and navigated to the release image in the source, then ran the installer with the  "/d" switch:

D:\ISDev\Trunk\ProjectXInstaller\Main\Full Installer\DiskImages\DISK1>projectx_setup.exe /d

Success.  Frustrating, infuriating, teeth-grinding success.

It has taken me so long to get the damn debugger up that I only have a vague idea of what I set out to test.  I reset the VM as part of the troubleshooting, so I need to reproduce the issue all over again, and it will probably take me another hour to get to where I should've been at 9:30 this morning.  Anticipating that something like this will happen to me again shortly after I've forgotten how I solved it this time, I decided to take a breather to make this post.  Hopefully someone else will find this while debating whether a new career in the custodial arts would be better than working with InstallShield and can avoid the horror that was my day.

At least it's Friday.  After I make up all the time I wasted on this tomorrow, I'll still have Sunday to wind down...

Inaugural post

I have a long and storied history of starting blogs, making one or two posts, and never coming back.  So let me make it abundantly clear that I may do the same thing here.

The goal of this blog is simply to give me a place to record my occasional victories working in the fields of build and packaging management.  I've discovered that this is one of the most challenging software disciplines, and I suspect that this is due to the ratios involved.  One build or packaging manager can support a development team of thirty or more, which right there means the ratio of general devs to those with real build/package experience is 30:1.  Further, there are many general devs who are pressed into build/package management, treat it as a peripheral task, and never become specialized in the field.  Finally, the very few of us who have been immersed in one or both of these disciplines for any length of time are (in my experience) incredibly pressed for time, so useful resources on the interwebs are almost non-existent.

So anyway -- I may just have these first two posts, or I may return again someday when I've lost an entire day to something which should've been dead simple had the tools we use been designed correctly.  If anything here is of assistance to you, I'm happy to have helped.