CodeBetter.Com

Greg Young [MVP]

  • Required Course

    I don't recommend very much stuff to people that actually costs anything more than time but ...

     

    Udi is coming over and doing his SOA course in Austin. I can't recommend highly enough that people take this if they want to learn SOA/Distributed Systems right.

     

    http://www.headspringsystems.com/soa/ 

     

     That and you can enjoy the esoteric discussions Udi and I end up in over a few beers late at night in a random conference hotel without thinking we disagree with each other :)

  • Fast Serialization

    We use a lot of serialization in the current system I work with. Serializing/deserializing 100,000,000 objects in a day is pretty common. For a long time we knew that the binary formatter was fat and slow but never rationalized writing something custom, as we were always fast enough. Unfortunately our data throughput has risen 400% in the last year (when you start with gigs and gigs of messages this is a huge gain) and our little three- or four-year-old dual Xeon 2.2 has turned into the little engine that could during peaks lately, so we finally bit the bullet and threw something together quickly.

    A toast ... to the little server that could!

    This solution is for a fairly niche condition and is heavily optimized so please read the explanations below to see if it will be good for your scenario before using it.

     

    This is the first of a series of posts dealing with this ... Let's start with introducing a new interface to our system

        public interface ICustomBinarySerializable
        {
            void WriteDataTo(BinaryWriter _Writer);
            void SetDataFrom(BinaryReader _Reader);
        }

    You would then implement this interface in your object like this. Only write out exactly what you need, and write it in the simplest way possible.

        class TestObject : ICustomBinarySerializable
        {
            public int Integer;
            public TestObject(){}
    
            public TestObject(int _Integer)
            {
                Integer = _Integer;
            }
    
            public virtual void WriteDataTo(BinaryWriter _Writer)
            {
                _Writer.Write((int) Integer);
            }
    
            public virtual void SetDataFrom(BinaryReader _Reader)
            {
                Integer = _Reader.ReadInt32();
            }
        }

    Then I wrote a custom formatter that operates on objects that are ICustomBinarySerializable. You may note that I use an integer for the index that represents the type. A short would probably be more appropriate than an integer here, and would save two bytes per message.

        public class CustomBinaryFormatter : IFormatter
        {
            private SerializationBinder m_Binder;
            private StreamingContext m_StreamingContext;
            private ISurrogateSelector m_SurrogateSelector;
            private readonly MemoryStream m_WriteStream;
            private readonly MemoryStream m_ReadStream;
            private readonly BinaryWriter m_Writer;
            private readonly BinaryReader m_Reader;
            private readonly Dictionary<Type, int> m_ByType = new Dictionary<Type, int>();
            private readonly Dictionary<int, Type> m_ById = new Dictionary<int, Type>();
            private readonly byte[] m_LengthBuffer = new byte[4];
            private readonly byte[] m_CopyBuffer;
    
            public CustomBinaryFormatter()
            {
                m_CopyBuffer = new byte[20000];
                m_WriteStream = new MemoryStream(10000);
                m_ReadStream = new MemoryStream(10000);
                m_Writer = new BinaryWriter(m_WriteStream);
                m_Reader = new BinaryReader(m_ReadStream);
            }
    
            public void Register<T>(int _TypeId) where T : ICustomBinarySerializable
            {
                m_ById.Add(_TypeId, typeof(T));
                m_ByType.Add(typeof (T), _TypeId);
            }
    
            public object Deserialize(Stream serializationStream)
            {
                if(serializationStream.Read(m_LengthBuffer, 0, 4) != 4)
                    throw new SerializationException("Could not read length from the stream.");
                IntToBytes length = new IntToBytes(m_LengthBuffer[0], m_LengthBuffer[1], m_LengthBuffer[2], m_LengthBuffer[3]);
                //TODO make this support partial reads from stream
                if(serializationStream.Read(m_CopyBuffer, 0, length.i32) != length.i32) 
                    throw new SerializationException("Could not read " + length.i32 + " bytes from the stream.");
                m_ReadStream.Seek(0L, SeekOrigin.Begin);
                m_ReadStream.Write(m_CopyBuffer, 0, length.i32);
                m_ReadStream.Seek(0L, SeekOrigin.Begin);
                int typeid = m_Reader.ReadInt32();
                Type t;
                if(!m_ById.TryGetValue(typeid, out t))
                    throw new SerializationException("TypeId " + typeid + " is not a registered type id");
                object obj = FormatterServices.GetUninitializedObject(t);
                ICustomBinarySerializable deserialize = (ICustomBinarySerializable) obj;
                deserialize.SetDataFrom(m_Reader);
                if(m_ReadStream.Position != length.i32) 
                    throw new SerializationException("object of type " + t + " did not read its entire buffer during deserialization. This is most likely an imbalance between the writes and the reads of the object.");
                return deserialize;
            }
    
            public void Serialize(Stream serializationStream, object graph)
            {
                int key;
                if (!m_ByType.TryGetValue(graph.GetType(), out key))
                    throw new SerializationException(graph.GetType() + " has not been registered with the serializer");
                ICustomBinarySerializable c = (ICustomBinarySerializable) graph; //this will always work due to generic constraint on the Register
                m_WriteStream.Seek(0L, SeekOrigin.Begin);
                m_Writer.Write((int) key);
                c.WriteDataTo(m_Writer);
                IntToBytes length = new IntToBytes((int) m_WriteStream.Position);
                serializationStream.WriteByte(length.b0);
                serializationStream.WriteByte(length.b1);
                serializationStream.WriteByte(length.b2);
                serializationStream.WriteByte(length.b3);
                serializationStream.Write(m_WriteStream.GetBuffer(), 0, (int) m_WriteStream.Position);
            }
    
            public ISurrogateSelector SurrogateSelector
            {
                get { return m_SurrogateSelector; }
                set { m_SurrogateSelector = value; }
            }
    
            public SerializationBinder Binder
            {
                get { return m_Binder; }
                set { m_Binder = value; }
            }
    
            public StreamingContext Context
            {
                get { return m_StreamingContext; }
                set { m_StreamingContext = value; }
            }
        }

    So that it is clear how this works ... when you instantiate a custom formatter, you associate types back to integer ids. Example from my tests:

    formatter.Register<TestObject>(1);

    This says when you get a type id of 1 it should be a TestObject and vice versa when you write a TestObject give it a type id of 1.

     

    When writing an object the format is

    <4 bytes length><4 bytes type id><object data>

     

    When we read the data we first read the 4 bytes of length (n), then read n bytes off the stream. We then copy that into our local buffer (see notes below). We then seek to the beginning of the buffer and tell the object to read its state using the binary reader we provide to it.
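
    To make the wire format concrete, here is a minimal sketch, separate from the formatter above, that writes one frame by hand and reads it back. The names FrameSketch, WriteFrame, ReadFrame and ReadFully are hypothetical and exist only for illustration, and the sketch assumes the length prefix is little-endian. ReadFully loops until the requested number of bytes has arrived, which is one possible way to handle the partial-read TODO in Deserialize.

```csharp
using System;
using System.IO;

public static class FrameSketch
{
    // Writes <4 bytes length><4 bytes type id><payload>.
    // As in the formatter, the length covers the type id plus the payload.
    public static void WriteFrame(Stream output, int typeId, byte[] payload)
    {
        var writer = new BinaryWriter(output); // BinaryWriter writes ints little-endian
        writer.Write(4 + payload.Length);
        writer.Write(typeId);
        writer.Write(payload);                 // raw bytes, no extra length prefix
        writer.Flush();
    }

    // Stream.Read may return fewer bytes than requested, so keep reading until done.
    public static void ReadFully(Stream input, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int read = input.Read(buffer, offset, count - offset);
            if (read == 0)
                throw new EndOfStreamException("Stream ended after " + offset + " of " + count + " bytes.");
            offset += read;
        }
    }

    // Reads one frame and returns the object data; the type id comes back via 'typeId'.
    public static byte[] ReadFrame(Stream input, out int typeId)
    {
        byte[] header = new byte[4];
        ReadFully(input, header, 4);
        int length = header[0] | (header[1] << 8) | (header[2] << 16) | (header[3] << 24);
        if (length < 4)
            throw new InvalidDataException("Frame length " + length + " is too small to hold a type id.");

        byte[] body = new byte[length];
        ReadFully(input, body, length);
        typeId = body[0] | (body[1] << 8) | (body[2] << 16) | (body[3] << 24);

        byte[] payload = new byte[length - 4];
        Buffer.BlockCopy(body, 4, payload, 0, payload.Length);
        return payload;
    }
}
```

    For a TestObject holding a single Int32, the frame would be 12 bytes: 4 of length, 4 of type id, and 4 of object data. Unlike the formatter, this sketch allocates a new buffer per frame, so it trades away the buffer reuse that the formatter relies on for speed.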

     

     

    Performance

    Before we look at all of the bad and evil things this is doing, let's try some basic performance tests. To run the tests I used the following simple object (the kind of thing this library was designed to be really fast with). I grabbed this object off the blog of someone who was also playing with serialization and added the interface, but I can't seem to find the link to give credit for saving me a good minute's worth of typing :).

     

    [Serializable]
    public class Customer : ICustomBinarySerializable {
         private String _lastname;
         private String _firstname;
         private String _address;
         private int _age;
         private int _code;
    
        public Customer()
        {
            
        }
        public Customer(String lastName, String firstName, String address, int age, int code)
        {
            _lastname = lastName;
            _firstname = firstName;
            _address = address;
            _age = age;
            _code = code;
        }
    
        public String LastName {
               get {return _lastname;}
               set {_lastname = value;}
         }
         public String FirstName
         {
               get {return _firstname;}
               set {_firstname = value;}
         }
         public String Address
         {
               get {return _address;}
               set {_address = value;}
         }
    
         public int Age
         {
               get {return _age;}
               set {_age = value;}
         }
    
         public int Code
         {
               get {return _code;}
               set {_code = value;}
         }
    
        public void WriteDataTo(BinaryWriter _Writer)
        {
            _Writer.Write((string)_lastname);
            _Writer.Write((string)_firstname);
            _Writer.Write((string)_address);
            _Writer.Write((Int32)_age);
            _Writer.Write((Int32)_code);
        }
    
        public void SetDataFrom(BinaryReader _Reader)
        {
            _lastname = _Reader.ReadString();
            _firstname = _Reader.ReadString();
            _address = _Reader.ReadString();
            _age = _Reader.ReadInt32();
            _code = _Reader.ReadInt32();
        }
    }

     

    Speed

    To test the speed of the serializer I chose to serialize / deserialize one of these objects 10,000,000 times to/from a MemoryStream.

    Test                    Time (lower is better)
    Serialize (Binary)      01:48.54
    Serialize (Custom)      00:06.73
    Deserialize (Binary)    02:01.29
    Deserialize (Custom)    00:08.55

    So on serializing the custom one is a whopping 16.1 times as fast (108.54 s vs. 6.73 s) and on deserializing it is 14.2 times as fast (121.29 s vs. 8.55 s). That's not too bad, as both are more than an order of magnitude.

     

    Size

    The other area I really wanted to optimize is the size of each message, as it is common for us to have 40+ GB transaction files for a day (disk IO is expensive). Because we are not writing the same kind of schema information that the binary formatter does, we can also be quite a bit smaller than its output. For the object given, the BinaryFormatter results in 232 bytes of output while the custom formatter results in 41. This message has quite a few strings, which add to the amount of serialized data (on our own messages, about 40 of them, we average about a 1/10 ratio between the two). Even so, it's still more than a fivefold reduction in the storage space required. Don't let this fool you though, there are some ....

     

    Problems

    There are a number of problems with this type of strategy. It is imperative that you know about the tradeoffs involved with this code before using it. This was written for a niche situation and it may really hurt you if you aren't careful!

     

    Versioning

    There is no versioning information provided by default in the data. One could easily provide this in their custom serialization implementation but the formatter does not provide it by default for you.
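
    As a minimal sketch of what "providing this in your own implementation" could look like, the hypothetical VersionedPoint class below writes a one-byte version number ahead of its fields and branches on it when reading. The class, its fields and the version numbers are invented for illustration; in the real system it would implement ICustomBinarySerializable, which is left out here only to keep the sketch self-contained.

```csharp
using System.IO;
using System.Runtime.Serialization;

public class VersionedPoint
{
    private const byte CurrentVersion = 2;

    public int X;
    public int Y; // hypothetical field added in version 2

    public void WriteDataTo(BinaryWriter writer)
    {
        writer.Write(CurrentVersion); // one byte of version information
        writer.Write(X);
        writer.Write(Y);
    }

    public void SetDataFrom(BinaryReader reader)
    {
        byte version = reader.ReadByte();
        if (version < 1 || version > CurrentVersion)
            throw new SerializationException("Unknown VersionedPoint version " + version);

        X = reader.ReadInt32();
        // Version 1 data has no Y. Assign it explicitly, because the formatter
        // creates objects with GetUninitializedObject and runs no constructor.
        Y = version >= 2 ? reader.ReadInt32() : 0;
    }
}
```

    The cost is one extra byte per message, which matters at these volumes, so whether it is worth it depends on how long the serialized data has to live.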

     

    Endianness

    One of the interesting things here is dealing with the length. I have done this using a quite unsafe (but faster) solution.

        [StructLayout(LayoutKind.Explicit)]
        public struct IntToBytes
        {
            public IntToBytes(Int32 _value) { b0 = b1 = b2 = b3 = 0; i32 = _value; }
            public IntToBytes(byte _b0, byte _b1, byte _b2, byte _b3) {
                i32 = 0;
                b0 = _b0;
                b1 = _b1;
                b2 = _b2;
                b3 = _b3;
            }
            [FieldOffset(0)]
            public Int32 i32;
            [FieldOffset(0)]
            public byte b0;
            [FieldOffset(1)]
            public byte b1;
            [FieldOffset(2)]
            public byte b2;
            [FieldOffset(3)]
            public byte b3;
        }

    This has endian problems if you use it on multiple machines that have different endianness, like say Mono on PPC vs. the CLR on x86. One could easily get around this by just using BitConverter instead (or doing some binary arithmetic if you miss having real reasons for doing so :)). For us, however, most of these objects are being serialized between processes on the same machine, so it's not an issue.
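
    A minimal sketch of the binary-arithmetic route follows. LengthPrefix is a hypothetical helper, not part of the formatter above, and it assumes the wire order is little-endian. Shifts operate on the value rather than on its memory layout, so the result is the same on any machine. Note that BitConverter.GetBytes follows the machine's own byte order (see BitConverter.IsLittleEndian), so on its own it would still need a conditional reversal.

```csharp
public static class LengthPrefix
{
    // Stores 'value' in buffer[0..3], least significant byte first,
    // regardless of the byte order of the machine running the code.
    public static void Write(byte[] buffer, int value)
    {
        unchecked
        {
            buffer[0] = (byte) value;
            buffer[1] = (byte) (value >> 8);
            buffer[2] = (byte) (value >> 16);
            buffer[3] = (byte) (value >> 24);
        }
    }

    // Rebuilds the Int32 from four little-endian bytes.
    public static int Read(byte[] buffer)
    {
        return buffer[0] | (buffer[1] << 8) | (buffer[2] << 16) | (buffer[3] << 24);
    }
}
```

    This is likely a little slower than the overlapping-field struct, which is the trade the author made deliberately for same-machine traffic.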

     

    Copying of Data

    Another problem (read: decision) has to do with how the formatter deals with the stream itself internally. It copies data off the stream into an internal memory buffer so that it can reuse the same BinaryReader/BinaryWriter every time. This makes it non-reentrant and forces the copy, but in testing with many very small messages the copying of the data turned out to be faster than creating a new reader/writer on the original stream on every iteration. This may turn out differently for you. I will leave it as an exercise for the reader to change this (I promise it won't take more than 5 minutes).

     

    Typing

    It's a lot of typing in your objects (we can work around this with some IL generation), but that's a whole other post, isn't it?

     

    Anyways I hope people enjoy this and can find a niche place of their own to use such a strategy.

  • DevTeach

     

    Looks like I will be at DevTeach in Montreal this December. I will be doing 4 talks (so much to keep in my head at once). There are many other great people showing up that I really look forward to seeing.

     

     

    In the Agile Track I will do a talk

     

    "TDD in a DbC World"

    Design by Contract is slowly moving its way into the mainstream. Many wrongfully find Test Driven Development and Design by Contract to be in conflict with each other.

    This session will familiarize the audience with some basic concepts of Design by Contract and the use of a theorem prover for the static checking of contracts. Discussion will then look in more depth at how we can maintain a Test First mentality in a Contract First world.

     

    This talk is the same as the one I am doing in the alt.net track at QCon

     

     

    The rest of the talks are all in the Architecture Track:

    Domain Driven Design Chalk Talk

    We as developers and designers face increasingly more difficult problem spaces. By creating models around these problems we can create better, more flexible, longer lasting, and further distilled solutions to these problems. Domain-Driven Design is a formalization of this process.

    This talk introduces many of the basic patterns in Domain-Driven Design but instead of focusing on the patterns themselves it focuses on the interactions and intentions of the patterns. In other words, we will talk about "entities" for about 30 seconds before we get down and dirty on some real life problems and handle the tough stuff like determining aggregate boundaries and the roles of application services.

    A novice should be able to take away something from this talk, but then again so should an expert.

     

    I am particularly looking forward to this one as I have done a few of these in the past but never actually "prepared" for one. After watching the video from alt.net this weekend there are definitely some places where having a list of what I want to talk about will come in handy.

     

    The Non-Functional Juggler

    This is not just me making failed attempts at keeping flaming knives in the air, although that would probably be more entertaining.

    Non-Functional specifications are at the core of any architecture. Learning to balance non-functional specifications with each other and align them with business needs is the most important skill an architect can possess.

    The presentation looks at some of the varying types of non-functional specifications, how they interact with each other, and how you as an architect can determine the level of success for your project by managing them.

     

    Command Query Separation

    Bertrand Meyer introduced the concept of Command and Query separation to Design by Contract nearly 30 years ago. Command and Query separation need not only apply at a micro-level to our code but should be a key architectural theme in our systems.

    This presentation, after defining Command and Query separation as a theme, looks at a few common architectures and how we can improve them through the strong use of separation.

  • TCP: Buffer Management

    So a long time ago I wrote some posts on buffer management in TCP servers (it might be worth going back and reading them as they explain why buffer management is important). There have been a few comments lately asking for more complete examples. Funnily enough, that's just what I have been working on lately. So here is the first of a set of code drops of some code that will be open sourced (consider it MIT/MSPL now) in its entirety (a nice little framework for writing scalable servers, TCP transports, etc). The project doesn't have an official name yet and is being run on our local svn, so I will just upload a zip file for now.

     

    You can download the source here: http://codebetter.com/files/folders/codebetter_downloads/entry181822.aspx

     

    A quick run down of what is there.

    BufferManager.cs - The main Buffer Manager class

    BufferPool.cs - A class that abstracts a set of buffers to allow common operations

    BufferPoolStream.cs - An adapter to the stream interface for a BufferPool

     

    There are associated tests for these classes <> 75.

     

    I know this seems like a lackluster post, but the code is worth going through. And if anyone has a good name for this library, let me know.

  • Impedance Mismatch Reframing

     

    This is a reply to Stephen Forte's post Impedance Mismatch from a ways back. I would have posted about it sooner but I sadly just saw it today when a co-worker, Stefan Moser, linked it over to me. I know that this debate has become quite heated through the community and as such I will refrain from personal attacks (such as those unfortunately experienced by Julia Lerman) and focus solely on the technical merits of the post.

     

    My first problem with ORMs in general is that they force you into a "objects first" box. Design your application and then click a button and magically all the data modeling and data access code will work itself out. This is wrong because it makes you very application centric and a lot of times a database model is going to support far more than your application.

     

    Well, I wouldn't say that this is a problem with ORMs per se but a problem with some tools. Those who are using Domain Driven Design are certainly not using this methodology. One of the main reasons I like to tell people to use DDD is that they can design their data storage mechanisms in parallel to their domain model, seeking an optimal solution to each. In other words, we should be embracing the impedance mismatch and doing what is best on both sides. The paragraph then continues with

     

    In addition an SOA environment will also conflict with ORM.

     

    I do not agree with this in any way, shape or form but am happy to leave it open to "the many definitions of SOA". I think it can quite easily be done if you follow solid command query separation. Udi Dahan gives a nice discussion of this on his blog.

    Later in the article (I am jumping around a bit to keep my own post coherent)

     

    One of the biggest hassles I see with LINQ to SQL is the typical many-to-many problem. If I have a table of Ocean Liners, vessels,  and ports, I’ll typically have a relational linking table to connect the vessels and ports via a sailing. (Can you tell I am working with Ocean Freight at the moment?) The last thing I want at the object layer is three tables! (And then another table to look up the Ocean Liner that operates the vessel.) Unfortunately, this is what most tools give me. Actually I don't even want one table, I want to hook object functionality to underlying stored procedures. I really want a port object with a vessel collection that also contains the ocean liner information.

     

    The author discusses his experiences with Linq2Sql and then applies them to "what most other tools give me". This is an unfortunate fallacy or a lack of research on available tooling. Linq2Sql is not a real "mapper", nor is what the author is referring to "mapping"; it is simply an Active Record implementation that is not using self-serving objects. This is what happens when mappers stay too close to the relational structure: they suck in terms of domain language and structure.

    If we were, however, to use a real mapper (let's say the one those notorious mafia guys are using) a quite different scenario would exist: a domain that sounds almost exactly like what is described as being wanted. This paragraph is also key in showing that research has not been done into Domain Driven Design by the author. I would bet that Stephen and Eric could have some really interesting discussions at the Advisory Council, as Eric uses this exact problem domain as a naive starting point for examples in about half of his book.

    A more serious problem is shown, though, in the author's propensity towards a relational bias when domain objects are called "tables". Why would anyone have a domain full of "tables"? These are behavioral objects. Unless this misunderstanding of what a domain model is gets corrected, the rest of what a domain model is or does will never make any sense.

    A further lack of understanding of Domain Driven Design is shown with the statement:

     

    ORM is real good for CRUD and real bad at other things.

     

    Again I believe the author has become confused between ORM and Active Record for some reason. I would never under any circumstances recommend that someone use Domain Driven Design for a CRUD app, as there are easier ways (like using Active Record). DDD is hard and often painful. It is costly up front and should only be used in domains that can justify its up-front costs in maintainability.

     

    Although it may be surprising, it is my belief that the author is actually a Domain Driven Design aficionado but has just not realized it yet.

     

    I prefer to build the application's object model and the data model at about the same time, with a "whiteboarding" approach that outlines the flows of data and functionality across the business process and problem set.

     

    It is quite common in an "object first" perspective to be doing database and code modeling either in small iterations or in parallel, where a team of object experts focuses on the domain model and the best way to model the data in order to support transactional behaviors while a team of database experts focuses on how best to store the data given their own set of requirements. These types of sessions would in fact be prescribed in an agile team, and the small "whiteboarding" sessions are absolutely prescribed by Domain Driven Design.

     

    Maybe it is the MBA talking but I tend to be "business and customers first" when I design a system. (Those of you that know me know that I have designed some very large and scalable systems in my day.)

     

    This is one of the core beliefs of Domain Driven Design. The primary example would be the creation of a Ubiquitous Language in order to ease communications between the "business and customers" and the team.

     

    What I am saying (and have been saying for a long time) is that we should accept, no, embrace the impedance mismatch!  While others are saying we should eradicate it, I say embrace it.

     

    Again we are back into agreement with Domain Driven Design. I like to look at Domain Driven Design as being an orthogonal architecture, my domain survives through anything that is moved around it as it is the core of my business and where the largest amount of my investment has gone...

     

     

    We come now to where the author is unfortunately not in line with DDD but perhaps can be moved. The only way that one can reach an orthogonal architecture is to ensure the purity of the domain model. The OLTP RDBMS will eventually decline in popularity. What happens when I want to move to, say, "the cloud" and just store my aggregate roots as XML? This is a perfectly valid and extremely effective architecture. If I favor the RDBMS side of the impedance mismatch too heavily then this change will not be orthogonal to my domain and will as such be extremely costly. The author may disagree with my reasoning, as he points out:

     

    ORM tools should evolve to get closer to the database, not further away.

    and

    Developers who write object oriented and procedural code like C# and Java have trouble learning the set-based mathematics theory that govern the SQL language. Developers are just plain old lazy and don't want to code SQL since it is too "hard." That is why you see bad T-SQL: developers try to solve it their way, not in a set-based way.

    and

    So ORMs are trying to solve the issue of data access in a way that C# and VB developers can understand: objects, procedural, etc.  That is why they are doomed to fail. The further you abstract the developer from thinking in a set-based way and have them write in a procedural way and have the computer (ORM) convert it to a set-based way, the worse we will be off over time.

     

    Well, I think I have already discussed the first of these points pretty well: by moving closer to the database we break our hopes of an orthogonal architecture. The second comment, albeit sounding like it came from a grand and mighty SQL wizard sent down by the gods to lift us heathen from our sinful ways, is actually a red herring, as is the third when framed properly.

    I do know relational algebra (yes, I can tell you what an anti-join is) and I challenge anyone to show me notation for an insert. While one could argue it can be involved with, say, a delete by PK/FK or an update by PK, it is for all intents and purposes useless in the process of writing to a properly normalized database; these items tend to be procedural regardless. I will admit there are times where it can come in handy but they are by far the minority. The relational algebra is focused on reading data and manipulating sets.

    As many who have had long post-conference talks over beer with me know I find any query that is of any amount of complexity close to thinking about the relational algebra to be a report. Reports are not expressed within my domain and may or may not be read from the same data source (I often times use an eventually consistent reporting model specifically for the purpose of running such queries). I take this often to extremes, my repositories in an ideal world have a single read method, FetchAggregateByUniqueId. Anything that is searching in a more complex nature is deemed a report and sits outside of this (usually as a small mapper that returns DTOs that match screen shapes, not domain shapes but provide the appropriate aggregate ids for writes to be possible). My "reports" all make very strong use of SQL and Relational Algebra, my domain has no need to know that it exists as it is essentially a write only model. I could go much more into this but it is another post.
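
    The split described above can be sketched roughly as follows. Apart from FetchAggregateByUniqueId, every name here (Order, IOrderRepository, OrderSummaryDto, IOrderReports, InMemoryOrderRepository) is hypothetical and stands in for whatever the real domain would contain; this is an illustration of the shape, not the author's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical aggregate root: behavior, not a bag of columns.
public class Order
{
    public Guid Id { get; private set; }
    public bool Shipped { get; private set; }
    public Order(Guid id) { Id = id; }
    public void Ship() { Shipped = true; }
}

// The write side: a single read method plus a way to store the aggregate.
public interface IOrderRepository
{
    Order FetchAggregateByUniqueId(Guid id);
    void Save(Order order);
}

// The "report" side: DTOs shaped like a screen, carrying the aggregate id
// so that a later write can find the aggregate again.
public class OrderSummaryDto
{
    public Guid OrderId;
    public string Status;
}

// Anything more complex than fetch-by-id lives here, outside the domain,
// and is free to use SQL or a separate reporting store.
public interface IOrderReports
{
    IList<OrderSummaryDto> OpenOrders();
}

// Toy in-memory repository, only to show that the domain needs nothing else.
public class InMemoryOrderRepository : IOrderRepository
{
    private readonly Dictionary<Guid, Order> m_Orders = new Dictionary<Guid, Order>();

    public Order FetchAggregateByUniqueId(Guid id)
    {
        return m_Orders[id];
    }

    public void Save(Order order)
    {
        m_Orders[order.Id] = order;
    }
}
```

    Because the repository exposes nothing but fetch-by-id and save, swapping the relational store for something else touches only the repository implementation and the reports, not the domain.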

    Getting back to the article, the author does however end off with a great quote from Ted Neward:

     

    "Developers [should] simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access (such as "raw" JDBC or ADO.NET) to carry them past those areas where an O/R-M would create problems."

     

    This is great advice ... just remember, if you do it, to hide it from your domain and to use it sparingly, as you may not always have an RDBMS sitting behind you, and if you don't these set-based operations may be quite difficult to implement.

  • Bellware Driven Design

    When I was down in Seattle last week Scott Bellware did a talk about BDD for a few people. It's not a very formal talk (which I prefer) and it's a bit slow to start but there are some gems in here. My camera died after the first hour but it is definitely worth checking out.

     

    Enjoy!

     

    Posted Jul 19 2008, 08:19 PM by Greg with 12 comment(s)
  • Alt.Net Canada

     

    So it's actually going to happen! August 15-17 in Calgary. Registration is now open! http://www.altnetconfcanada.com/

     

    Registration is now open to the first hundred people so forget your Canada Day celebrations and sign on up!

     

     

    btw: for those from the states, yes people in Calgary live in igloos, if you have never stayed in an igloo I would highly recommend the experience.

  • DDDD Moved

    After some conversations with Scott and others I have decided that I will be writing up a lot more on DDDD (I have already started). I will be releasing it under a Creative Commons license as opposed to going with a brick and mortar publisher. Over the next few weeks I will begin pushing stuff out to a small group for review. The completed work will be available for download on my blog. There will also be at least 1 reference application provided.

    My reasoning for releasing under Creative Commons is that I want to get this out to as many people as possible. I may offer one of those "get a printed copy of this" options or something, but I am more focused on trying to get the ideas to market quickly (my estimate with a brick and mortar publisher was almost 2 years ... I think I can do it quicker otherwise).

    I am looking for a small group of reviewers (<10) to read through things and provide feedback as they are written over the next few months. Drop me an email offline.

    Also as I am new to this if anyone has suggestions (as an example I was going to setup JIRA for reviewer comments) please let me know, or if you are a professional editor who wants to help my atrocious grammar that would be appreciated too :-)

  • Dynamic Languages vs Static Verification

    At alt.net Seattle, as some may remember, I was doing a bunch of interviews for infoq.com. One of those quick videos was a talk with Rustan Leino, Mike Barnett, John Lam, and Matt Podwysocki about dynamic languages and static verification. This came from the starting fish bowl on polyglot programming. I had to cut it a bit short in terms of time because John had to go but there are some interesting thoughts brought out (in particular the annealing of software over time).  Anyways ... here is the video, enjoy!

     

  • devTeach Talk

    Here is my devTeach talk ... I got a bit of a late start and had a lot of material to try to get in so I had to push away from a few good discussions but I will answer those discussions in a post here ... and be kind I only knew I was speaking 2 weeks in advance ;-)

     

    Enjoy!

     

  • EF Long Term Plans

    I was reading through what is actually a reasonable comparison of EF to other technologies on Dan Simmons' blog.

    Dave, Jeremy, and Jimmy have already discussed many issues but ...

     

    One bit caught my attention:

    Long-term we are working to build EDM awareness into a variety of other Microsoft products so that if you have an Entity Data Model, you should be able to automatically create REST-oriented web services over that model (ADO.Net Data Services aka Astoria), write reports against that model (Reporting Services), synchronize data between a server and an offline client store where the data is moved atomically as entities even if those entities draw from multiple database tables on the server, create workflows from entity-aware building blocks, etc. etc.  Not only does this increase the value of the data model by allowing it to be reused for many parts of your overall solution, but it also allows us to invest more heavily in common tools which will streamline the development process, make developer learning apply to more scenarios, etc.  So the differentiator is not that the EF supports more flexible mapping than nHibernate or something like that, it's that the EF is not just an ORM--it's the first step in a much larger vision of an entity-aware data platform.

     

    DDDD is something very similar to this but I think they have completely missed the boat. I have a single slide in my deck from devTeach that summarizes my objections quite succinctly.

     

    [Slide image: DDDD]

     

    I have since rewritten this slide to be more generic: "A single model cannot possibly be appropriate for all facets of your application, including transactional behaviors, searching, and reporting."

     

    In DDDD I deal with this by recognizing that the Entity is of limited importance and should be different in different places ... It is what happens to the entity that REALLY matters and it is the recognition and the making explicit of EVENTS in the domain that allows you to easily support multiple concurrent parallel models. These events should not be automatically generated object->field changed messages but should be DOMAIN CONCEPTS.
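
    As a rough illustration of the point above, here is a minimal sketch of one explicit domain event feeding two differently shaped models. Everything in it (PurchaseOrderPlaced, OrderBook, SalesByCustomerReport and their members) is a hypothetical name invented for this example and is not taken from any DDDD reference code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain event: a business concept, not a "field changed" message.
public sealed class PurchaseOrderPlaced
{
    public PurchaseOrderPlaced(Guid orderId, string customer, decimal total)
    {
        OrderId = orderId;
        Customer = customer;
        Total = total;
    }

    public Guid OrderId { get; }
    public string Customer { get; }
    public decimal Total { get; }
}

// Transactional model (assumed shape): tracks only what order processing needs.
public sealed class OrderBook
{
    private readonly HashSet<Guid> _openOrders = new HashSet<Guid>();

    public void Handle(PurchaseOrderPlaced e) => _openOrders.Add(e.OrderId);

    public bool IsOpen(Guid orderId) => _openOrders.Contains(orderId);
}

// Reporting model (assumed shape): a different structure built from the same event.
public sealed class SalesByCustomerReport
{
    private readonly Dictionary<string, decimal> _totals = new Dictionary<string, decimal>();

    public void Handle(PurchaseOrderPlaced e)
    {
        // TryGetValue leaves current at 0 when the customer has no total yet.
        _totals.TryGetValue(e.Customer, out decimal current);
        _totals[e.Customer] = current + e.Total;
    }

    public decimal TotalFor(string customer) =>
        _totals.TryGetValue(customer, out decimal total) ? total : 0m;
}
```

    Neither model knows about the other; each one only knows the event. That is the property that lets the two evolve (or fail) independently.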

     

    Let me say it for the 1000th time: if you are reporting off your transactional model you are seeking trouble!

    On the DDD list people often ask "How do I use my domain to report?" ... the answer is "You don't"; they are different models with different goals. It pains me that MS intends to push people into what is an anti-pattern, even for small systems.

     

     

    Jimmy Bogard was also right on the money when he mentioned that I should not expose my model outside of my Bounded Context. I highly doubt that a system like EF, used the way they suggest, would work beyond trivial cases; as proposed it is one small step up from using sprocs and linked servers as your integration model.

     

    I could say MUCH more about this but instead I will try to rework my talk a bit in Victoria Wednesday to try to include some of this.

  • Revenge of the Statically Typed Languages

     

    There have been some great posts recently going around about the merits of static vs dynamic languages ...

    Dynamic Languages Strike Back by Steve Yegge

    Return of the Statically Typed Languages by Cedric

     

    I have to admit that I was really waiting for the Java/C# "Attack of the Clones" reference...

     

     

    Which are then summarized and moved towards polyglot programming by Ola Bini in A New Hope: Polyglotism.

     

     

    Quoting Ola's summary:

    So let's see. Distilled, Steve thinks that static languages have reached the ceiling for what's possible to do, and that dynamic languages offer more flexibility and power without actually sacrificing performance and maintainability. He backs this up with several research papers that point to very interesting runtime performance improvement techniques that really can help dynamic languages perform exceptionally well.


    On the other hand Cedric believes that Scala is bad because of implicits and pattern matching, that it's common sense to not allow people to use the languages they like, that tools for dynamic languages will never be as good as the ones for static ones, that Java generics isn't really a problem, that dynamic language performance will improve but that this doesn't matter, that static languages really hasn't failed at all and that Java is still the best language of choice, and will continue to be for a long time.

     

    On a side note Cedric has a big point on the tooling side of dynamic languages: it will never be on par with tooling for static languages because logically it can't be. Since so many things are decided at runtime, a tool trying to figure out ahead of time what will happen at runtime eventually runs into the halting problem.

     

    I think however that this entire discussion of static vs dynamic languages is fundamentally flawed as shown later in Ola's post when he describes the debate as:

    So, we have three categories of languages here. The strongly statically checked ones, like Haskell. The weakly statically checked ones, like Java. And the dynamically checked ones, like Ruby.

    It seems to me to be rather silly to have this discussion of static vs dynamic languages without bringing in the concept of static verification of more than just types.

     

    Enter DbC

     

    E. W. Dijkstra, arguably one of the most influential computer scientists of our time, recognized the importance of static verification and the provability of computer programs. In "On the Cruelty of Really Teaching Computing Science" he even went so far as to make a proposal for an introductory programming course for freshmen that consisted of Hoare logic.

     

    There are languages out there that do more than verify just your types. Most of them are currently in the lineage of Bertrand Meyer's Eiffel programming language and can be considered DbC (Design by Contract).

     

    Sure DbC is pretty obscure now, but it is a hot area of research due to the ability to use theorem provers on the contracts of the system. These theorem provers are not looking solely at type safety in a statically verifiable way; they are also looking at your code and contracts to be sure that they cannot be broken at runtime! This is not a new idea (research in this area goes back to the beginnings of DbC) but the exponential growth of processor and memory resources has made this a much more attainable goal. There are now two fairly mainstream projects working on this: the JML (Java Modeling Language) project and the Spec# project from MSR.

     

    How they work is best seen in an example (although this is an extremely naive example; for more in-depth examples check out the samples in the Spec# install).

     

    using System;
    using Microsoft.Contracts;

    public class Program
    {
      static void Main(string![]! args) {
          Console.Write("Entry? : ");
          string response = Console.ReadLine();
          Console.WriteLine(response.Substring(response.Length - 6, 6));
      }
    }

     

    This code will give 3 compile-time warnings.

     

    C:\crap\Project1\Project1\Program.ssc(9,22): warning CS2663: Call of string.Substring(int startIndex, int length), unsatisfied precondition: 0 <= startIndex
    C:\crap\Project1\Project1\Program.ssc(9,41): warning CS2663: Possible null dereference
    C:\crap\Project1\Project1\Program.ssc(9,41): warning CS2614: Receiver might be null (of type 'string')

     

    Note that you can treat these as errors as well (and they show up in the IDE as normal errors).

     

    These warnings result from two basic problems with the code. The first problem is that Console.ReadLine() may return null, in which case the Substring call and the Length access would fail. The second problem is that the string returned from ReadLine may be fewer than 6 characters long, which makes response.Length - 6 negative. By adding some defensive code we can prove to the prover that we understand and are handling those cases.

     

    using System;
    using Microsoft.Contracts;

    public class Program
    {
      static void Main(string![]! args) {
          Console.Write("Entry? : ");
          string response = Console.ReadLine();
          if (response != null && response.Length >= 6) {
              Console.WriteLine(response.Substring(response.Length - 6, 6));
          }
      }
    }

     

    This version gives no warnings, as the Spec# compiler parses the code and understands the branching well enough to realize that I have now proven that those bad circumstances can no longer happen. The key thing to notice here is that even without unit tests it is finding problems in terms of verification of my code (I said my code runs within these rules and I am breaking my own rules). These tools will REVOLUTIONIZE the development industry.

     

    Note: These contracts do not get rid of things like unit testing or TDD; they complement them. There is a difference between verifying (do I play by the rules that I set up myself?) and validating (does this thing do anything of value?).

     

    There are many scenarios where this can benefit us but one of the main ones I see is in dealing with invariants that span multiple methods and/or objects (these are often very difficult to unit test). A simple example of this can be seen in List<T>; in particular there is an invariant between List<T>.Count and the number of items in a List<T>. These tools will allow you to ensure mathematically that this invariant is never broken, as opposed to sprinkling around state based tests and hoping that you actually covered all of the places where the invariant is being accessed. These problems are obviously magnified when we introduce an invariant that spans multiple objects ...
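
    To make the invariant idea concrete, here is a small sketch in plain C# (not Spec#). IntList and CheckInvariant are hypothetical names made up for this example. The important caveat is that this version only checks the invariant at runtime, on the code paths that happen to execute; a static verifier would instead try to prove that it holds for every method.

```csharp
using System;

// Hypothetical growable list of ints, used only to illustrate an invariant
// that spans several members (Add, RemoveLast, Count).
public sealed class IntList
{
    private int[] _items = new int[4];
    private int _count;

    public int Count => _count;

    public void Add(int value)
    {
        // Grow the backing array when it is full.
        if (_count == _items.Length)
            Array.Resize(ref _items, _items.Length * 2);
        _items[_count++] = value;
        CheckInvariant();
    }

    public int RemoveLast()
    {
        // Precondition: the list must not be empty.
        if (_count == 0)
            throw new InvalidOperationException("The list is empty.");
        int value = _items[--_count];
        CheckInvariant();
        return value;
    }

    // Invariant: 0 <= _count <= _items.Length.
    // Checked at runtime here; a theorem prover would check it at compile time.
    private void CheckInvariant()
    {
        if (_count < 0 || _count > _items.Length)
            throw new InvalidOperationException("Invariant broken: count out of range.");
    }
}
```

    Every new method added to the class has to remember to call CheckInvariant, which is exactly the kind of "hoping you covered all the places" problem that static verification is meant to remove.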

     

    Another scenario where theorem proving can help us greatly is in our refactoring. The compiler will not let us break our previously distributed contract, provided we don't change the contract itself. Providing the same level of support with unit tests tends to cost us many tests for complex contracts with many preconditions.

     

    Does DbC play well with others?

     

    Coming back to the concept of polyglot programming: mixing dynamic and static languages in general is reasonably easy, as illustrated by the existence of the DLR. There is however a much larger disconnect between the world of theorem proving and dynamic languages. Dynamic languages are by definition defined at runtime and static verification is by definition done at compile time, so the use of a dynamic language makes statically verifying your code at compile time impossible. To try to verify dynamic code at compile time would likely walk you straight into the halting problem, just like it would for many kinds of tooling.

     

    I took some time at alt.net Seattle to pull aside Mike Barnett and Rustan Leino from the Spec# team and John Lam from the IronRuby team to talk for a few minutes on video about how we can effectively use these ideas together. Many interesting ideas came out of this discussion, in particular recognizing the merits of dynamic languages for change and the possibility of starting in dynamic languages and annealing software into compile-time proven languages. I have submitted this video and am hoping it will be coming out soon!

  • DDDD 12 [How long?]

    One of the most difficult parts of becoming eventually consistent is getting people out of the mindset that everything is always consistent. Years ago, before they introduced the system that is now legacy (the one everyone can finally laugh about after more than a decade), they probably ran with paper (or maybe they ran with scissors, who knows). Paper was awesome because it never gave the impression of global consistency, and it is the thinking that went into the optimization of paper processes that can help us optimize our transactional systems.

    Then again getting to that point can be tough so we will focus more on that in the next post ... For now we will look at getting through the barrier of assumed global consistency.

     

    SME: So yeah this <data, let's call it a purchase order> is edited by the user and can be searched by others or viewed in our reports.

    Me: OK, how long a period is generally acceptable between the time that they enter the data and the time others can see it? Is it different for reports vs searches?

    SME: It should be available for all of those things immediately.

    Me: The "purchase orders" sound pretty important, what happens when part of the system goes down? How long an outage would be considered a catastrophic failure?

    SME: Well in our current system when things are down everything is down so I can't say that we have dealt with that but 30 minutes of the current system being down and not being able to enter purchase orders would cost our company $237,123 in lost productivity and possibly lost orders (of course they have spent the time of analysts to calculate this exact number but they have never really tried fixing the problem before). This is why we are rewriting the system.

    Me: Entering purchase orders sounds like it is the life blood of your business; is that correct?

    SME: Of course without purchase orders we have no business!

     

    What has happened here is that we have turned the conversation from everything always being consistent to discussions about the needs of various parts of the system in terms of risk management. Talking about what should happen when things go wrong in the system is a very effective way of turning the discussion away from global consistency as global consistency generally means global failures.

     

    Me: What if you could still enter new purchase orders but you couldn't run your daily sales report? How would that impact your business?

    SME: Well our managers might have to revert to the former manual process of taking all the printed copies of the "Purchase Orders" and adding them up in a spreadsheet. We could probably even deal with this for a few days but not more and it couldn't happen often.

    Me: How timely does the data need to be on the report? If the data is two minutes old but was correct as of two minutes ago, how would that affect your business?

    SME (laughing): Awww hell, not at all; those lazy managers generally print the document and stop at the water cooler for at least 10 minutes before they pick up the printed document.

     

    So we have very different architectural requirements for two parts of the system. Identifying architectural requirements upon the data is one way of putting things into Bounded Contexts. We have realized here that the thing that gets the data for the Daily Summary report should not be in the same Bounded Context that manages purchase orders because they have drastically different requirements. We could have also come to a similar conclusion by using CQS (command/query separation). Generally speaking you will find not one but many reasons to put data into a different Bounded Context or Parallel Model; when you find only one reason to do so you should question whether or not it should actually be segregated.
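
    One way to carry the "two minutes old is fine" answer into code is to have the reporting side state how old its data is and compare that with the staleness the business agreed to. The sketch below is only an assumption about how that might look; DailySalesSnapshot, AsOfUtc and IsFreshEnough are hypothetical names invented for this example.

```csharp
using System;

// Hypothetical read-side snapshot backing the daily sales report.
// It lives apart from the model that accepts purchase orders.
public sealed class DailySalesSnapshot
{
    public DailySalesSnapshot(decimal total, DateTime asOfUtc)
    {
        Total = total;
        AsOfUtc = asOfUtc;
    }

    public decimal Total { get; }

    // The moment at which Total was known to be correct.
    public DateTime AsOfUtc { get; }

    // True when the snapshot is no older than the agreed maximum age.
    public bool IsFreshEnough(DateTime nowUtc, TimeSpan maxAge) =>
        nowUtc - AsOfUtc <= maxAge;
}
```

    With a ten minute tolerance, a snapshot taken two minutes ago passes the check and one taken half an hour ago does not. The number itself comes out of the conversation with the SME, not out of the code.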

     

    Me: What about searching for customers to add to Purchase Orders, i.e. to look up Discount information? What if only that one part failed?

    SME: We would still need to be able to look up information like that in order to function.

    Me: OK, let's talk a bit about how that information gets entered and changed in the process. As an example, I am a new client; what happens?

    SME: Well your company would make arrangements between sales and accounts receivable. We would establish a credit line for you and set up any agreed upon discounts.

    Me: OK, does this generally happen in a few minutes? How long does it generally take from accounts receivable setting the person up with a credit line until that person makes a purchase?

    SME: Well that can vary, usually a while but sometimes right away.

    Me: Right away in a human sense or a computer sense?

    SME: Right away in the sense that after they get set up they may be transferred back to sales to make a purchase.

     

    Where is this going? What sorts of things are we learning about the domain and how things interact?

     

    Talking about how long things should take, in terms of SLAs between contexts/models, provides us with a lot of detailed information about the domain even if we do not intend to use it to provide SLAs within our system. As such we as DDD practitioners should be introducing this type of discussion regardless of whether or not we intend to move towards messaging.

  • devTeach: Strategic Domain Driven Design

    Here is Dave Laribee's session on Strategic Domain Driven Design from devTeach.

     

     

     

    I have always admired the simplicity and art of Dave's slides; this time I particularly enjoyed the Mr. Rogers slide!

     

    Of course Dave provides some awesome technical content as well as good artistic taste; enjoy!

  • devTeach after thoughts

    Just got home from the airport; what a trip. I flew out Tuesday night on a red eye, arriving at about 6:30 am in Toronto (on my way out I listened to the new altdotnet podcast; this is great stuff!). I spent all of Wednesday in a barely-able-to-walk-because-I-was-so-tired state, popping in and out of various sessions and hanging out with many of the great people at the conference.

    Today I checked out a few sessions including Owen's continuous integration session and Dave Laribee's Strategic Domain Driven Design (I took videos of my session and Dave's session and will be placing them up here in the next few days). I really wanted to see two of Oren's talks (DSLs and advanced DI) but was unable to make either. I also wanted to see JP's DDD session but it was unfortunately scheduled at the same time as my talk so I was unable to make it.

    On a side note I would really love it if at one of these conferences we could do a bunch of sessions like this one after another (or a pre/post con tutorial) to jump start people who are not familiar with the subject (drop me a line if something like a pre/post con tutorial would interest you, as I am curious about the level of interest). As an example, at devTeach we could have come close to this with some small schedule changes, as the sessions could almost be seen as a progression of JP->Dave->Me with each session adding to the knowledge learned in the session before. If we were to add in another 2-3 sessions there could be a huge amount of effective knowledge transfer. Maybe I will look at following up on this idea for devTeach Montreal, but there are actually 2 of these tracks coming up!

    Unfortunately the two tracks are both in the same week in November :( One is at Oredev in Sweden and the other at QCon SF. To be honest I don't know how Eric does it; it sounds insane to me to be presenting in Sweden and in SF within 2 days... You may notice that there is also an alt.net track listed for QCon ... cool stuff! QCon is the best/highest level conference I have ever been to, and it will be great to have a good showing from the .NET community.

    As always, the discussions outside of the sessions were great. When you get so many smart people in one place good things are bound to happen!

    JR and crew did a great job (as always) with the conference. Setup was good, especially the ability to find power for my laptop. I heard there were problems with the wireless early in the conference but my experience with it was great.

    Posted May 15 2008, 08:49 PM by Greg with 4 comment(s)