CodeBetter.Com

Steve Hebert's Development Blog

Steve's Blog - From .Net to dotMath and everything in between.
  • Moving to Microsoft

    During the past month, I accepted a new position at Microsoft.  I am now working on the Expression Web project, which is part of the Expression Studio product suite.  It’s an exciting time as I learn my way around the team, product and company.  Expression Web is an excellent web content design tool: standards compliance, cleaner CSS and the ability to tie it all in with ASP.NET and PHP make it a must-have.

    So what is it like to work at Microsoft? It’s awesome – getting dialed in to the team here and getting to know different groups around the company is great.  After spending a few weeks in Redmond at Building 42, I am now working in the Minneapolis, Minnesota development office with a growing team.

    What does this change for my blog and projects, namely dotMath?  Nothing really - dotMath licensing stays the same and I will continue blogging here at CodeBetter.   I am looking to resume blogging and to focus on coding issues once again with a whole new set of blog content!

     

  • CSLA and Telerik radGrid - a collection of posts

    I am busy finishing up some knowledge transfer and found the need to gather together various postings I have done here and elsewhere on getting CSLA and Telerik playing nice.  A few of these topics deal with generic binding issues (CSLA aside).  Here are the items...

     

    Conversations on CSLA and Telerik radGrid

    Conversation: Using the radGrid AddNew functionality with CSLA
    Code: Databound (IBindableList/IReportTotalRowCount) Paging and Sorting to a control 
    Step-by-Step: Hierarchical data binding with CSLA and Telerik's radGrid Control
    Code: Setting up CSLA databinding on a webpage using an @Register tag  

  • Creating a Horizontal GridSplitter in WPF - for real

    I ran into a number of articles on the web declaring how to create a horizontal grid splitter control in WPF - most of them wrong. There are a couple of "Walkthru" articles on MSDN that show the proper way to do it, but waste time poking around the Properties window (who codes like that anyway?). Besides, they are not all consistent - the first article I found calls for setting properties that are not listed in the second article I found and seem to be extraneous.   

    So here is a concise description focusing on the XAML alone - I haven't found a decent XAML code beautifier on the web yet, so bear with me.  I am not inheriting any styles, as you can see below.

    First,  create a Grid with an additional row to host the horizontal splitter.  Note that in this example I have two columns in my grid to show the spanning function:

    <Grid VerticalAlignment="Stretch" HorizontalAlignment="Stretch">
            <Grid.ColumnDefinitions >
                <ColumnDefinition Width="*" />
                <ColumnDefinition Width="*"  />       
            </Grid.ColumnDefinitions>
            <Grid.RowDefinitions>
                <RowDefinition Height="*" />
                <RowDefinition Height="Auto" />
                <RowDefinition Height="*" />
            </Grid.RowDefinitions>

     Next, create a GridSplitter on row #1 (remember that row indices are 0 based). 

    <GridSplitter 
                ResizeDirection="Rows"
                Grid.Column="0"
                Grid.ColumnSpan="2"
                Grid.Row="1" 
                Width="Auto"
                Height="3"
                HorizontalAlignment="Stretch"
                VerticalAlignment="Stretch"
                Margin="0"/>

    And there you have it.

  • Linq to Objects – measuring performance implications (part 1)

    After working with Linq-to-objects, I started thinking about how this tool could work in the wrong hands.  At its simplest, a seemingly elegant query could easily turn into a CPU hog if the underlying data structure isn’t organized well.  At its best, how will these queries make use of the underlying data structures?

    What to do?  Write some code and get some numbers to compare.

    What I found was interesting and eye-opening, and it raised more questions for another day. I set out to compare Linq performance over a reasonably well-optimized data structure by varying the composition of the where clause, and to compare that against code that goes directly after the data structure in typical pre-Linq fashion.

    What I learned was fascinating – let’s take a look at the code.

    Given a dictionary defined as:
    private Dictionary<int, string> _lookupList = new Dictionary<int, string>();

    I populate it using a loop where the upper limit is a const I can vary:
    for (int x = 0; x < _listSize; x++ )
                    _lookupList.Add(x, string.Format("item#{0}", x));

    I can then use Linq to query the item dictionary with something like this:

                string[] myResult = (from i in _lookupList
                                     where i.Key >= lowerLimit && i.Key <= upperLimit && i.Value.Contains("1")
                                     select i.Value).ToArray();

    Let’s take three variations as follows:

    1)      where i.Key >= lowerLimit && i.Key <= upperLimit && i.Value.Contains("1")

    2)      where i.Value.Contains("1") && i.Key >= lowerLimit && i.Key <= upperLimit

    3)      Hand-code the query to go against the data structure to compare performance.

    To make sure I varied the access across different areas of the dictionary, the query was performed against 10 discrete key values at the beginning, middle and end of the dictionary. Performance was consistent across each item, which I took to show that Linq was utilizing the data structure to some degree – as expected.  The .Contains method further reduced the return set to 1-2 items, depending on where the midpoint was calculated. These test sets were run 10 times and the averages taken.

    The results:

    Item#1 – averaged 2643 ms
    Item#2 – averaged 671 ms
    Item#3 – averaged <1 ms

    Observations: 

    The difference between #1 and #2 is a factor of 4.  I am not sure what actually happened under the covers, but I suspect that item #2 scanned the entire contents of the dictionary.  It is hard to tell, since a test that eliminates the where clause results in a very different memory allocation pattern and therefore inherently larger timing numbers.

    The difference between #3 and the other items is staggering.  Keep in mind, I tried to handicap this with a try/catch block around the indexer into the dictionary, stored the matching values in a List<string> and then performed a List<string>.ToArray() just for good measure. 

    That’s eye popping – it’s more than two orders of magnitude faster than the better-performing Linq query (#2) and more than three orders of magnitude faster than the worse one (#1).

    Bottom line - despite my best efforts I just couldn’t make my hand-written code perform as poorly as Linq. 
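
    A likely explanation for the gap, offered as a sketch rather than a profile of the original test: a Linq to Objects where clause over a Dictionary<int, string> enumerates every key/value pair and applies the predicate to each one, while the indexer performs one hash lookup per key in the requested range.  The class below sets the two approaches side by side and adds a small Stopwatch helper for timing them.  ListSize, the class and method names, and the use of TryGetValue in place of try/catch are assumptions for illustration; the post does not state the value of _listSize.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class LinqTimingSketch
{
    // Assumed size for illustration; the original _listSize is not stated.
    public const int ListSize = 100000;

    // Same population scheme as the post: key x maps to "item#x".
    public static Dictionary<int, string> BuildLookup()
    {
        var lookup = new Dictionary<int, string>();
        for (int x = 0; x < ListSize; x++)
            lookup.Add(x, string.Format("item#{0}", x));
        return lookup;
    }

    // Variation 1 from the post: the predicate is evaluated for every entry.
    public static string[] QueryWithLinq(Dictionary<int, string> lookup, int lowerLimit, int upperLimit)
    {
        return (from i in lookup
                where i.Key >= lowerLimit && i.Key <= upperLimit && i.Value.Contains("1")
                select i.Value).ToArray();
    }

    // Direct access: one hash lookup per key in the requested range.
    public static string[] QueryByKey(Dictionary<int, string> lookup, int lowerLimit, int upperLimit)
    {
        var result = new List<string>();
        for (int x = lowerLimit; x <= upperLimit; x++)
        {
            string value;
            if (lookup.TryGetValue(x, out value) && value.Contains("1"))
                result.Add(value);
        }
        return result.ToArray();
    }

    // Runs an action the given number of times and returns the elapsed milliseconds.
    public static long TimeMs(Action action, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
            action();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

    Passing each query to TimeMs as a lambda with the same limits and iteration count gives a like-for-like comparison; absolute numbers will vary by machine and runtime.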

    Summary:

    So what did I learn? 

    ·         Variations on where-clause construction have an implied order when considering performance.  What are these implied rules exactly?  I'd love to know.

    ·         An entirely different test would have to be constructed to understand join performance.

    ·         How does the [Indexable] attribute play into optimization choices and what are the performance considerations of that attribute?

    ·         When doing intensive object querying, Linq to Objects needs to be eyed very carefully in the context of the application.

    +++  Addition per Comment

    Here is the code that Jimmy Bogard requested.  As noted in my post, I added exception handling and chose to use a List<string> to gather the match values and finally convert to an array of strings just as the Linq function did. If anything, I am going for the most conservative path in this case.  The call to this function is responsible for time measurement.

    private void HandCodedAccessorFunction(int lowerLimit, int upperLimit)
    {
        List<string> result = new List<string>();
        int y = 0;  // number of matches found
        for (int x = lowerLimit; x <= upperLimit; x++)
        {
            try
            {
                // The indexer throws KeyNotFoundException for a missing key.
                if (_lookupList[x].Contains("1"))
                {
                    result.Add(_lookupList[x]);
                    y++;
                }
            }
            catch (Exception)
            {
                Console.WriteLine("Exception thrown");
            }
        }
        string[] ar = result.ToArray();  // mirrors the ToArray() call in the Linq version
        Console.WriteLine(y);
    }

    Posted Feb 06 2008, 05:02 AM by shebert with 26 comment(s)
  • Manually Mapping WSDL soap:address location on the fly

     .Net does a nice job of sensing the underlying protocol when generating WSDL. For instance, if I run a service as "myService" on SSL, the generated WSDL contains a <soap:address> location mapped to "https://..." instead of "http://...".  However, there are times when this sensing is not enough. A well-coordinated load-balancer/encryptor setup should limit the needed change to http->https, but this implementation offers other flexibility as well.  I just can't think of a good use-case right now.

     When running with a front-end encryption appliance for offloading the CPU cost of SSL, the webservers themselves typically run on the unencrypted http protocol on port 80.  Since .Net senses this, the WSDL is generated with "http://" prefixes in the soap:address, which is incorrect for consumers of the service.

     Handling this took a bunch of digging and feels extremely hack-ish.  But it's a necessity in this scenario.  To accomplish this task take the following steps:

    1. Derive a class from SoapExtensionReflector that overrides ReflectMethod().
    2. Handle the remapping in the ReflectMethod() of the class created above.
    3. Reference your new class under the webServices/soapExtensionReflectorTypes in web.config.

    Here is a sample of the derived class:

        using System.Web.Services.Description;

        namespace Reuters.DataScopeSelect.Web.Api
        {
            public class SoapAddressReflector : SoapExtensionReflector
            {
                public override void ReflectMethod()
                {
                    ServiceDescription sd = ReflectionContext.ServiceDescription;

                    foreach (Service service in sd.Services)
                    {
                        foreach (Port port in service.Ports)
                        {
                            foreach (object extension in port.Extensions)
                            {
                                // Skip any extension that is not a SOAP address binding.
                                SoapAddressBinding address = extension as SoapAddressBinding;
                                if (address != null)
                                    address.Location = RemapHttpReferencesToHttps(address.Location);
                            }
                        }
                    }
                }

                private string RemapHttpReferencesToHttps(string location)
                {
                    return location.Replace("http:", "https:");
                }
            }
        }
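
    One caveat with a plain string Replace is that it rewrites every occurrence of "http:" in the location, including any that might appear later in the URL.  A narrower variant, sketched below on the assumption that the encryption appliance listens on the default HTTPS port, changes only the scheme by way of UriBuilder.  The class and method names are hypothetical and are not part of the original reflector.

```csharp
using System;

public static class SoapAddressRemapSketch
{
    // Switches an http location to https; other schemes keep their scheme.
    public static string RemapToHttps(string location)
    {
        var builder = new UriBuilder(location);
        if (string.Equals(builder.Scheme, Uri.UriSchemeHttp, StringComparison.OrdinalIgnoreCase))
        {
            builder.Scheme = Uri.UriSchemeHttps;
            // -1 tells UriBuilder to use the default port for the scheme.
            // Assumption: the appliance answers on the default HTTPS port.
            builder.Port = -1;
        }
        return builder.Uri.AbsoluteUri;
    }
}
```

    Inside the reflector, a call to this method could stand in for RemapHttpReferencesToHttps.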

    Here is the code to add to the web.config file:

          <webServices>
            <soapExtensionReflectorTypes>
              <add type="Reuters.DataScopeSelect.Web.Api.SoapAddressReflector, Reuters.DataScopeSelect.Web.Api"/>
            </soapExtensionReflectorTypes>
          </webServices>

     

    For a bogus webservice named "myService" running server-native on http, the modified wsdl appears as follows:

    <wsdl:service name="myService">
      <wsdl:port name="myServiceSoap" binding="tns:myServiceSoap">
        <soap:address location="https://mypublicservice.com/myservice.asmx" />
      </wsdl:port>
      <wsdl:port name="myServiceSoap12" binding="tns:myServiceSoap12">
        <soap12:address location="https://mypublicservice.com/myservice.asmx" />
      </wsdl:port>
    </wsdl:service>
    </wsdl:definitions>
     
     
  • Optimizing Hierarchical data for Tree Controls

    On one hand, nothing can substitute for good data design when it comes to performance.  On the other hand, relational databases are terrible at describing hierarchical relationships.  Vendors have been including hierarchical query tools for years (e.g. Oracle and MS SQL Server 2005), but while the semantics are cleaned up, performance is lacking because tree traversals on relational structures are time consuming.  So where does that leave us?

     

    In putting together this post, I ended up solving this problem once again and realized just how little accessible information there is on the topic. By accessible, I mean not requiring a degree in tree theory (half joking).  Personally, I find I learn best by seeing a complete solution that works, understanding it and then applying new theory on top of it.  That raises the question – is this post more accessible?  To me it is - so I'll run with it.

     

    Let’s set a rough goal – when asked about an item, I should be able to answer questions about or return its related tree structure in one sql statement.  I should also be able to do so without using vendor-specific extensions.  The best books that deal with this process dive into trees and graphs and analyze the pros and cons of different structures.  However, I have found that (1) most of the times that I am looking at these situations I am looking at reference data that tends to be very static and (2) I am typically populating some type of tree control with this data.  In these situations, I would argue that some of these mechanisms are more than what I need.  This post looks at a problem of delivering a tree structure to a user interface and the process of refining the database tables.  For this post I am using the typical employee/boss relationship to illustrate the problem.  This approach also applies to other areas as well: manufacturing bill of materials, order operations in medical scenarios, etc. with some caveats.

     

    Digging In

     

    Let’s say we are given a table of the typical employee/boss relationship.  These tables are so boring it’s not even worth coming up with names so I’ll use letters:

      
    Employee  Boss
    G         C
    E         B
    B         A
    C         A
    D         B
    F         C
    A         null
    H         D

    Table 1.1 – Employee

     

    If I were to populate a tree control, I need to consider the minimum amount of data required to populate it.  I am going to assume that I will populate this control sequentially and that simple hash-tables will be sufficient for establishing parent/child relationships.  Therefore, I need to have the rows ordered with the parent-less node first and the child-less nodes last.  To process sequentially, I need them ordered by layer – that way each layer is fully loaded before the next one, so every node can reach its parent when populating the tree.  I need a result table that looks like this:

     
    Employee  Boss  Layer (optional)
    A         null  0
    B         A     1
    C         A     1
    D         B     2
    E         B     2
    F         C     2
    G         C     2
    H         D     3

    Table 1.2 - ProcessingResult
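
    As a rough sketch of the sequential processing described above, the snippet below walks rows ordered as in Table 1.2 and uses a Dictionary as the hash table that resolves each row's parent.  TreeNode and TreeBuilder are hypothetical stand-ins for whatever node type the actual tree control exposes; the layer ordering of the rows is the only thing the approach relies on.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type standing in for a tree control's node class.
public sealed class TreeNode
{
    public readonly string Name;
    public readonly List<TreeNode> Children = new List<TreeNode>();

    public TreeNode(string name)
    {
        Name = name;
    }
}

public static class TreeBuilder
{
    // Each row is { employee, boss }; boss is null for a parent-less node.
    // Rows must arrive ordered by layer so every boss is seen before its reports.
    public static List<TreeNode> Build(IEnumerable<string[]> rowsOrderedByLayer)
    {
        var roots = new List<TreeNode>();
        var byName = new Dictionary<string, TreeNode>();

        foreach (string[] row in rowsOrderedByLayer)
        {
            var node = new TreeNode(row[0]);
            byName.Add(node.Name, node);

            if (row[1] == null)
                roots.Add(node);                    // parent-less node
            else
                byName[row[1]].Children.Add(node);  // throws if the boss has not been seen yet
        }
        return roots;
    }
}
```

    Because the parent lookup is a single dictionary access, the whole tree is built in one pass over the result set.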

     

    This structure, when processed sequentially, can be added to a tree control quite easily. So what structure do I need to transform Table 1.1 into Table 1.2?  Before I answer that question, let's think about other operations that are useful.

    • If I know an employee, say ‘E’, I should be able to grab the entire hierarchy (or n levels up from the position of ‘E’ within the tree) in a single select statement. Selecting down the tree requires a different edge structure than what I am showing here.
    • Given two employees, I should also be able to determine whether they are part of the same reporting hierarchy.
    • Is one subordinate to the other, either directly or indirectly, or are they peers?
     

    As with any structure, these are a few of the questions that drive the design.  I then decide that all of these should be query-able using a single select statement and easily optimized via one or more indices.  We’ll come back to these questions with examples of SQL-SELECT statements in a bit.

     

    Breaking down the problem

     

    To accomplish this task, we’ll use basic tree theory – a tree being a special kind of directed graph.  Graphs are data structures where nodes (our Employee table) are connected by edges (defined below as EmployeeHierarchy).  Each edge represents a one-way relationship.  To satisfy our query requirements, we need to be able to identify the larger structure that each node participates in.  This could be a shared id, but in this case we’ll call it the RootBoss.

     
    RootBoss  Employee  Boss  Layer
    A         A         null  0
    A         B         A     1
    A         C         A     1
    A         D         B     2
    A         E         B     2
    A         F         C     2
    A         G         C     2
    A         H         D     3

    Table 1.3 – EmployeeHierarchy

      

    At this point, we have two solid questions: (1) how do we satisfy our query requirements given these two tables, and (2) how do we populate this thing?  First, let's get the fun part out of the way and look at how we query this table.

     

    Performing the stated queries

     

    First - given a single employee (@employee) – let's pull out the entire hierarchy:

     
    select a.RootBoss, b.Employee, c.employee, c.boss, b.Layer
      from EmployeeHierarchy a
      inner join EmployeeHierarchy b on a.RootBoss = b.RootBoss 
      inner join Employee c on c.employee = b.employee
     
      where a.employee = @employee
      order by b.Layer 



    Next, let’s pull up the hierarchy above and equal to the employee:


    select a.RootBoss, b.Employee, c.employee, c.boss, b.Layer
      from EmployeeHierarchy a
      inner join EmployeeHierarchy b on a.RootBoss = b.RootBoss
      inner join Employee c on c.employee = b.employee
      where a.employee = @employee
        and a.layer >= b.layer
     



    Given @employee1 and @employee2, are they part of the same reporting structure?


    select case when (select count(*)
                        from EmployeeHierarchy a
                        inner join EmployeeHierarchy b on b.rootboss = a.rootboss
                        where a.employee = @employee1
                          and b.employee = @employee2) > 0
                then 't' else 'f' end as is_true

    Finally, determine if employee1 is higher on the same reporting structure than employee2 (i.e. employee 2 reports to employee 1 either directly or indirectly).  Note that the query compares layers under a shared RootBoss, so it treats any employee on a deeper layer as subordinate, even one on a different branch.

    select case when (select count(*)
                        from EmployeeHierarchy a
                        inner join EmployeeHierarchy b on b.rootboss = a.rootboss
                        where a.employee = @employee1
                          and b.employee = @employee2
                          and a.layer < b.layer) > 0
                then 't' else 'f' end as is_true

    Populating the EmployeeHierarchy table (the edges)

     

    Because the number of layers is variable you can use an iteration process – or – this is a great place for using vendor specific hierarchy extensions.  In the spirit of keeping this as vendor-neutral as possible, I’ll ignore the extensions.  As such this is a simple, two-step process. 

     

    First, we must insert all of the parent-less nodes into the table using the following command.


    insert into EmployeeHierarchy  (RootBoss, Employee, Boss, Layer)
          (select employee, employee, boss, 0
                from Employee
                where boss is null )

    Next, we need to populate each level until no more levels exist.  Once again, the depth is not known so we must loop until the process is complete.  Note that you should provide an exit condition for cyclical references.  Cyclical references amongst three or more nodes are particularly hard to discern with raw sql – using a maximum depth count is one such mechanism.

    declare @rowsprocessed int
    set @rowsprocessed = 1
    declare @layer int
    set @layer = 1
    -- Fill one layer per pass until a pass inserts no rows.
    while( @rowsprocessed > 0 )
    begin
          insert into EmployeeHierarchy
                ( RootBoss, Employee, Boss, Layer )
              select  eh.RootBoss,  e.Employee, e.Boss, @layer
                from EmployeeHierarchy eh
                inner join Employee e on  eh.Employee = e.Boss
                where eh.Layer = @layer - 1

          set @rowsprocessed = @@ROWCOUNT
          set @layer = @layer + 1
    end
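
    The same layer-by-layer fill can be sketched in application code, which makes the maximum-depth exit condition easy to see.  This is an in-memory illustration only, not a replacement for the SQL; EmployeeRow, HierarchyRow, HierarchyBuilder and maxDepth are hypothetical names.  The input is a list of employee/boss pairs rather than a keyed lookup, so that an employee can appear under more than one boss, which is the situation in which a cycle can be reached from a root.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One row of the Employee table: an employee and that employee's boss (null at the top).
public sealed class EmployeeRow
{
    public readonly string Employee;
    public readonly string Boss;

    public EmployeeRow(string employee, string boss)
    {
        Employee = employee;
        Boss = boss;
    }
}

// One row of the edge table.
public sealed class HierarchyRow
{
    public readonly string RootBoss;
    public readonly string Employee;
    public readonly string Boss;
    public readonly int Layer;

    public HierarchyRow(string rootBoss, string employee, string boss, int layer)
    {
        RootBoss = rootBoss;
        Employee = employee;
        Boss = boss;
        Layer = layer;
    }
}

public static class HierarchyBuilder
{
    public static List<HierarchyRow> Build(IList<EmployeeRow> employees, int maxDepth)
    {
        // Step 1: parent-less nodes become layer 0 and act as their own RootBoss.
        List<HierarchyRow> previous = employees
            .Where(e => e.Boss == null)
            .Select(e => new HierarchyRow(e.Employee, e.Employee, null, 0))
            .ToList();
        var result = new List<HierarchyRow>(previous);

        // Step 2: add one layer per pass until a pass adds nothing.
        for (int layer = 1; previous.Count > 0; layer++)
        {
            var current = new List<HierarchyRow>();
            foreach (HierarchyRow parent in previous)
                foreach (EmployeeRow e in employees)
                    if (e.Boss == parent.Employee)
                        current.Add(new HierarchyRow(parent.RootBoss, e.Employee, e.Boss, layer));

            // Exit condition: rows still appearing past maxDepth suggest a cyclical reference.
            if (current.Count > 0 && layer > maxDepth)
                throw new InvalidOperationException("Maximum depth exceeded; the data may contain a cyclical reference.");

            result.AddRange(current);
            previous = current;
        }
        return result;
    }
}
```

    The nested loops keep the sketch short; for larger data sets, grouping the rows by boss first would avoid rescanning the full list on every pass.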

    Summary

     

    And there you have it: a reasonable approach to processing tree structures that is very performant for queries.  It's not one-size-fits-all by any means - keep in mind that if you have a constantly updating structure, you need to keep an eye on the cost of updating your edge table.  While the approach above could be made transactional, the sheer resource cost of that process is prohibitive – especially in real-time conditions.  But I would contend that this is a good start for understanding the problem and evaluating other approaches.

     

    Keep in mind that the "edge" table is really a many-to-many resolve table, and as such it can be made to accommodate tree structures that vary by external conditions.  This could be the case with certain types of bill of materials problems or medical order components.  One good example is a veterinary office where a dental prophy has differing structures based on species - the edge table can typically handle these scenarios through additional fields and multi-column key considerations.

     

    When looking for a performant, transactional hierarchy, there are better solutions that draw on other areas of graph theory.  Whether you need this extra performance or not, I highly recommend checking out Joe Celko’s SQL for Smarties books – he has one book dedicated to hierarchical structure processing.  Celko has been on the ANSI SQL committee just shy of forever and has been writing articles on SQL programming for years. His approach is fundamentally different from that of sequential-oriented programmers, focusing first on set-based theory and translating that approach to the SQL dialect.

       

     

  • digging out from under the Inbox

    It has been a while since my last post - we've shipped version 2.0 of our product and, with ever-increasing momentum, Outlook has become increasingly difficult to manage due to sheer volume.  I have never been particularly fond of Outlook; it leaves a lot of loose ends without a cohesive way to manage them.

    I've read the book "Getting Things Done" by David Allen, and I agree with his strategy of building what he terms a trusted system, where every outstanding action is captured so you don't waste cycles trying to juggle commitments in memory.  Part of this strategy is triaging every incoming message (phone/email/etc) until your 'inbox' is completely empty and everything is filed away with appropriate recording.  It makes sense, but implementing it is time consuming.

     A friend of mine pointed me to Clear Context.  I've tried some other Outlook plugins that never really worked well, or were so clunky they became annoying (rather than integrating with Outlook, they launch a separate window that contains a new interface).  Clear Context is different in that it integrates with Outlook 2003 and 2007 seamlessly (I'm currently only running with 2003).  All email processing can be handled with the new buttons they add for automatically creating tasks from email, creating schedule items from email and filing emails.  It also auto-files items in threads into folders along with my responses - no more digging through sent mail.  And the Action View is excellent.

     I now have my inbox down to 0 items - everything is filed and captured with minimal effort.  That's a great feeling. In my mind, this is what Outlook should have been in the first place.  Their personal edition is free and the professional edition is ~$80.  Below are some links to the items including a "Getting Things Done via Clear Context" pdf that is pretty useful.

     

    Clear Context - Outlook plugin: http://www.clearcontext.com/
     
     
     
    Here is a 16 page overview of how to implement David's system using Clear Context and Outlook: http://www.clearcontext.com/resources/attachments/Using_IMS_for_Outlook_with_GTD.pdf
     
    David Allen's Website is here: http://www.davidco.com/

     

     

  • XP Media Center lost my second core (once again)

    In the last 12 hours, Windows XP Media Center lost my second core again. I blogged about this a while back where suddenly my dual-core system was only utilizing a single core - apparent in task manager, device manager and general performance.  Even with 3GB of RAM, this sent things crawling in Virtual PC.

    I'm guessing that some patch was dispatched today (Sunday) and the system stopped recognizing the second core.  I wonder if anyone from MS and/or AMD is watching.  I reran the driver that I linked to in my blog entry and all was restored. That's annoying.

    I wonder how many dual core users are running 3 cans short of a 6 pack without noticing.

  • Hierarchical data binding with CSLA and Telerik's radGrid control.

    Following on the CSLA databinding theme, I ran into an interesting problem databinding a parent/child CSLA business object relationship with Telerik's radGrid control. The solution is certainly not obvious, but the implementation ends up being pretty straightforward.

    First, I set up two distinct CslaDataSource controls - one for each collection.  In reality, these are part of the same object hierarchy and I could do this with one control, but I feel it comes out a little cleaner with two.

    The key points to consider are these:

    1. Grid: Set the MasterTableView and DetailTable DataSourceIds to their respective CSLA data binding control (dbc).
    2. Grid: Set the MasterTableView's DataKeyNames property to reflect the unique identifier for the parent row.
    3. DBC Parent: Set up the DataSource_SelectObject to return the business object via the e.BusinessObject parameter.
    4. Grid: Map the ItemCommand event to grab the parent key using the code snippet below.
    5. DBC Child: Set up the DataSource_SelectObject to return the appropriate child object according to the key obtained in step #4.

    Below is the code snippet for the Grid_ItemCommand event mentioned above:

    private Guid ActiveParentId = Guid.Empty;  //data type may vary for actual implementation.

    protected void ..._ItemCommand(...)
    {
       if( e.CommandName == ExpandCollapseCommand )
          ActiveParentId = (Guid) e.Item.OwnerTableView.DataKeyValues[e.Item.Index][parentIdName];
    }

  • Setting up CSLA databinding on a webpage using an @Register tag

    Setting up databinding using a custom datasource within a page is not always obvious.  I'm putting this on my blog as I know I'll be referring back to it.

    CSLA is available on Rocky Lhotka's site with downloads, forums and latest version info.

    Because databinding is provided through the generic CslaDataSource object in the CSLA framework, there's a bit of indirection going on in the setup.  Rocky's sample code in ProjectTracker defines the page level references in the web.config file, but I'd rather explicitly define them on the page for now.

    First, register the Csla assembly in the @ Register tag at the top of your aspx file with:

     <%@ Register Assembly="Csla" Namespace="Csla.Web" TagPrefix="csla"  %>

    This allows for a reference to the datasource provider in the framework.

    Next, define the actual datasource object on your page that references your business object/collection using the following:

     <csla:CslaDataSource ID="MyDataSource" runat="server" 
         TypeName="[Company.Product.Layer.Class]" 
         TypeAssemblyName="[Assembly Name that implements TypeName]"
         TypeSupportsPaging="False"
         TypeSupportsSorting="False" >
    </csla:CslaDataSource> 

    You'll need to wire up the OnSelect and other handlers, but Rocky's book provides excellent examples of this.

  • XNA Game Studio Express going live on Monday

    The XNA Game Studio Express is launching on December 11th with an open house on the Redmond campus.  Too bad they only gave a week's notice; that would be a fun road trip. 

     I haven't heard yet if they are going to post video of the event on Channel 9. 

  • Release parties

    We are busy finalizing our 4th public release of this year and we've decided to go to a Minnesota Wild game (hockey for you non-sports fans).  Club level, dinner, and all that stuff.  I'm not a huge hockey fan, but live NHL is great to watch.  I actually learned the rules of hockey from playing NHL '98. 

    We've done go-karts and other activities in the past, but I'm wondering if anyone has other great ideas for release parties? 

  • First Minneapolis/St. Paul Code Camp complete

    The first Twin Cities CodeCamp went off very well this weekend.  There was a wide variety of topics with four presentations running every hour from 9-5 this past Saturday.  If you missed it, you missed out! 

    Keep an eye on the Twin Cities CodeCamp website for the next event. This would be a great place for an RSS feed - I wonder if anyone is organizing a space where all CodeCamps could host their information and provide such services?  That would be very cool.

    Congrats to Jason Bock for doing much of the organizing and pulling this together.