Thoughts and Plans: 2010

Tuesday, August 31, 2010

Linq.Union Problem

Linq has created a lot of excitement in the developer communities and is a great tool for mixing the UI, Business Logic and the Infrastructure into a seriously entangled mess of unmaintainable code, if of course used improperly. So I am working hard on learning the "proper" way to use it.

Meanwhile, only yesterday I spent quite some time googling/binging around for a way to apply aggregate filters dynamically to a query, to no avail. Maybe this is something that I should put down as "not knowing the proper way of using Linq"!

Anyways, here's the use case, judge for yourself:

I have a list of suppliers and their states of residence. I'd like to offer the user the ability to pull the list of suppliers who are based in a set of states of her own choice. For example, today she would like the list of suppliers in CO, TX and CA; tomorrow she may want the list of suppliers based in WA and CO.

If I were to create a fixed function to get the suppliers for CO, WA, TX and CA, then I would write something like this:
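The original listing is not reproduced on this page. A minimal sketch of such a fixed function, assuming a hypothetical Supplier class with Name and State properties (these names are illustrative, not taken from the original code), could look like this:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical supplier record; the class used in the original post is not shown.
public class Supplier
{
    public string Name { get; set; }
    public string State { get; set; }
}

public static class StaticUnionSketch
{
    // Hard-codes the four states and unions the four per-state queries.
    public static List<Supplier> GetSuppliers(IEnumerable<Supplier> suppliers)
    {
        return suppliers.Where(s => s.State == "CO")
            .Union(suppliers.Where(s => s.State == "TX"))
            .Union(suppliers.Where(s => s.State == "CA"))
            .Union(suppliers.Where(s => s.State == "WA"))
            .ToList();
    }
}
```

Each state is a literal inside its own lambda here, so nothing the filters depend on can change between building the query and running it.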

This works but it is not "Dynamic". Remember, the user's list of states changes every day. So I came up with the following solution:

This is "Dynamic" (iterates through a list of states) but doesn't work!!

Here's the output... the middle states are missing!!

DynamicUnion Result
===Bad Results========
Supplier0 is based in CO
Supplier1 is based in CO
Supplier2 is based in CO
Supplier3 is based in CO
Supplier4 is based in CO
Supplier16 is based in WA
Supplier17 is based in WA
Supplier18 is based in WA
Supplier19 is based in WA
Supplier20 is based in WA

The problem seems to be the following statement:

For whatever reason, this only remembers the suppliers for the first and the last states in the list. The ones in the middle are lost!! The syntax surely looks decent and "c-sharpish". It compiles too (but don't ship it yet). I am at a loss to explain this strange behavior. I shall tentatively dub it the Linq.Union.Bug for now.
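One plausible explanation, offered as an assumption because the original listing is not available here: Linq queries are deferred, so a lambda such as s => s.State == state is not evaluated until the result is enumerated. In C# compilers before version 5, the foreach loop variable was a single variable shared by all iterations, so every lambda built in the loop saw its final value. The sketch below imitates that by declaring one shared variable outside the loop; the Supplier class and the method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical supplier record; the class used in the original post is not shown.
public class Supplier
{
    public string Name { get; set; }
    public string State { get; set; }
}

public static class DynamicUnionSketch
{
    // Imitates the older foreach semantics: ONE variable shared by every lambda.
    // Assumes states has at least one entry.
    public static List<Supplier> SharedVariable(List<Supplier> suppliers, string[] states)
    {
        IEnumerable<Supplier> result = suppliers.Where(s => s.State == states[0]);
        string state; // shared by all lambdas created in the loop below
        for (int i = 1; i < states.Length; i++)
        {
            state = states[i];
            result = result.Union(suppliers.Where(s => s.State == state));
        }
        // The lambdas only run here, when state already holds the last entry,
        // so every query added in the loop filters on the last state.
        return result.ToList();
    }

    // Same loop, but each iteration captures its own copy of the state.
    public static List<Supplier> FreshVariable(List<Supplier> suppliers, string[] states)
    {
        IEnumerable<Supplier> result = Enumerable.Empty<Supplier>();
        foreach (var s in states)
        {
            string current = s; // fresh variable per iteration
            result = result.Union(suppliers.Where(x => x.State == current));
        }
        return result.ToList();
    }
}
```

With the states CO, TX, CA and WA, SharedVariable returns only the CO and WA suppliers, which matches the symptom above, while FreshVariable returns all four groups. From C# 5 onwards the foreach variable is already fresh per iteration, so the explicit copy is harmless but no longer required there.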

The workaround is rather simple: just persist every set of records in a list as it is pulled from the data source. This works, except that it does not perform a proper Union operation (as in set theory), i.e. it doesn't get rid of repetitions, so you need to do some cleaning up before you present the list to the user.
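A minimal sketch of that workaround, again assuming a hypothetical Supplier class, with Distinct() standing in for the cleaning-up step. The second method is an alternative I am adding for comparison, not the approach described above: it filters once with Contains and needs no union at all.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical supplier record; the class used in the original post is not shown.
public class Supplier
{
    public string Name { get; set; }
    public string State { get; set; }
}

public static class WorkaroundSketch
{
    // Runs each per-state query immediately and collects the rows in a list.
    public static List<Supplier> GetSuppliers(IEnumerable<Supplier> suppliers,
                                              IEnumerable<string> states)
    {
        var collected = new List<Supplier>();
        foreach (var state in states)
        {
            string current = state;
            // ToList() executes the query now, while current still has this value.
            collected.AddRange(suppliers.Where(s => s.State == current).ToList());
        }
        // A state listed twice would add its suppliers twice; Distinct() cleans that up.
        return collected.Distinct().ToList();
    }

    // Alternative: a single filter over the whole list of wanted states.
    public static List<Supplier> GetSuppliersWithContains(IEnumerable<Supplier> suppliers,
                                                          IEnumerable<string> states)
    {
        var wanted = new HashSet<string>(states);
        return suppliers.Where(s => wanted.Contains(s.State)).ToList();
    }
}
```

The two methods can order the rows differently: the first groups them in the order the states were requested, the second keeps the order of the source list.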

Anyways, the whole program follows with the run output.

Run Output

As you will note in the Dynamic Results, TX and CA (the states between the first and last entries in the list) are eaten up by the "Union Bug"!

StaticUnion Result
===Good Results======
Supplier0 is based in CO
Supplier1 is based in CO
Supplier2 is based in CO
Supplier3 is based in CO
Supplier4 is based in CO
Supplier5 is based in TX
Supplier6 is based in TX
Supplier7 is based in TX
Supplier9 is based in TX
Supplier10 is based in TX
Supplier11 is based in CA
Supplier12 is based in CA
Supplier13 is based in CA
Supplier14 is based in CA
Supplier15 is based in CA
Supplier16 is based in WA
Supplier17 is based in WA
Supplier18 is based in WA
Supplier19 is based in WA
Supplier20 is based in WA

DynamicUnion Result
===Bad Results========
Supplier0 is based in CO
Supplier1 is based in CO
Supplier2 is based in CO
Supplier3 is based in CO
Supplier4 is based in CO
Supplier16 is based in WA
Supplier17 is based in WA
Supplier18 is based in WA
Supplier19 is based in WA
Supplier20 is based in WA

Dynamic WorkAround
===Good Results======
Supplier0 is based in CO
Supplier1 is based in CO
Supplier2 is based in CO
Supplier3 is based in CO
Supplier4 is based in CO
Supplier5 is based in TX
Supplier6 is based in TX
Supplier7 is based in TX
Supplier9 is based in TX
Supplier10 is based in TX
Supplier11 is based in CA
Supplier12 is based in CA
Supplier13 is based in CA
Supplier14 is based in CA
Supplier15 is based in CA
Supplier16 is based in WA
Supplier17 is based in WA
Supplier18 is based in WA
Supplier19 is based in WA
Supplier20 is based in WA

Saturday, January 30, 2010

An XML File Editor in C#

Barely a week had passed since my first post, "A Recursive XML Parser in C#", when I was asked to create an XML editor in a Windows Forms application.

My first thought was to modify the code I wrote for the parser and get it going in less than an hour. So late on a Friday afternoon, I started working on my ingenious idea. Four hours later I was still struggling. Absolutely outraged and frustrated, I decided to throw away the code-change idea and go back to the drawing board, so to speak. The result of this approach was simply incredible.

Before I explain my new approach, I must emphasize a great lesson in development: if you are struggling to achieve a standard operation (like editing an XML file, or logging, etc.), you are most certainly using the wrong tools.

Writing to an XML File

To demonstrate this technique, I created a Windows Forms application similar to the screenshot below.

(Fig 1)

The "Browse" button displays a file open dialog box that helps the user navigate his/her computer or network file system to find a given XML file.

(Fig 2)

Once the user selects the XML file and clicks the "Open" button, the contents of the file are read and displayed in a Windows Forms DataGrid control as shown in Figure 1 above.
Clicking on the small '+' sign in the left-hand margin expands the node and reveals a small navigation bar on the right-hand side of the grid title. The "Arrow" button is the "Back" button, while the button to its right hides or reveals the parent nodes of the displayed node. Finally, you can change any string in any node and click "Save" to persist the changes.

The Program:

The solution is embarrassingly simple. There are only four steps, as follows:

Create a DataSet.
Populate it by calling its ReadXml method passing the XML file path to it.
Set it as the DataSource of a DataGrid control.
To persist the changes to file, get the DataSet of the dataGrid and call its WriteXml method.

GetXmlDataSet reads the XML file contents into a DataSet and is defined as below:

As mentioned before, we simply instantiate a DataSet and call its ReadXml method, passing the name of the XML file.

SaveXmlDataSet does just the opposite, saving the modified DataSet to the file.
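The original listings for these two methods are not reproduced on this page. A minimal sketch under stated assumptions: the method names come from the text, but the signatures (a FileInfo parameter, static methods) and the containing class name are my guesses, not the original code.

```csharp
using System.Data;
using System.IO;

public static class XmlDataSetSketch
{
    // Reads the XML file into a new DataSet; tables and columns are
    // inferred from the XML structure by ReadXml.
    public static DataSet GetXmlDataSet(FileInfo file)
    {
        if (!file.Exists)
        {
            throw new FileNotFoundException("XML file not found.", file.FullName);
        }

        var dataSet = new DataSet();
        dataSet.ReadXml(file.FullName);
        return dataSet;
    }

    // Writes the (possibly edited) DataSet back to the file as XML.
    public static void SaveXmlDataSet(DataSet dataSet, FileInfo file)
    {
        dataSet.WriteXml(file.FullName);
    }
}
```

In the form, the DataSet returned by GetXmlDataSet would be assigned to the grid's DataSource property, and the "Save" handler would pass that same DataSet to SaveXmlDataSet.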

Before diving into the complete code, I'd like to make two points. Firstly, I prefer to use the FileInfo class as opposed to the File class (with a path string). This is somewhat better as FileInfo hides the file name/path validation code and separates it from the GetXmlDataSet code.

Secondly, as you've noticed by now, we are not doing any raw xml manipulation. We are using the technology to do the dirty jobs for us.

The Complete Code:

The code for the main form follows:

Friday, January 1, 2010

A Recursive XML Parser in C#

Introduction:

In recent weeks I have been asked to write an XML Parser to parse XML files without knowing the XML schema on which the file is based. This is a basic problem and the solution is basically a recursive approach.

I did a tentative google and didn't get anything interesting. So I decided to use brute force (i.e. create my own parser) and 45 minutes later I had the solution. Two weeks ago I was asked for an almost identical solution by a different client. I had to spend another 45 minutes to reproduce the same thing again. It looks like I will be expected to reproduce the same thing again soon. So I might as well dump the solution here to save myself 45 minutes and maybe help some other folks who might need to solve the same problem.

The Basic Problem:

You are given an XML file and asked to parse the content and display it in the form of a tree structure. For instance if Fig 1 below is the content of the XML file, the solution should display an output like Fig 2.

Fig 1: The Content of the XML File

Fig 2: The Output

The Solution:

Due to the nature of the problem (i.e. the unknown XML Schema factor) the solution is conveniently a recursive one. The objective is to iterate through all the nodes, extract the name and attributes of each node, and somehow calculate the number of tabs to prepend to each node name in order to imitate the appearance of a tree structure.

In order to enable the parser to keep track of the information for each node, a class called VisualNode is created as shown in Fig 3. A VisualNode is a lightweight representation of an XML node. It is named "Visual" node simply because this is what the user will see on the output screen.

Fig 3: The VisualNode Class

To make life easier for ourselves, we have overridden the ToString() method of this class.
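Fig 3 is not reproduced on this page. A minimal sketch of what such a class might look like; only the class name and the ToString() override come from the text, while the property names (Name, Indent, Attributes) and the output format are assumptions.

```csharp
using System.Collections.Generic;
using System.Text;

// Lightweight, display-oriented representation of one XML node.
public class VisualNode
{
    public string Name { get; set; }

    // Number of tabs to print before the node name.
    public int Indent { get; set; }

    // Attribute names and values of the node.
    public Dictionary<string, string> Attributes { get; } = new Dictionary<string, string>();

    // Renders the node as tabs + name + attributes, e.g. "\tperson id="1"".
    public override string ToString()
    {
        var text = new StringBuilder();
        text.Append('\t', Indent).Append(Name);
        foreach (var pair in Attributes)
        {
            text.Append(' ').Append(pair.Key).Append("=\"").Append(pair.Value).Append('"');
        }
        return text.ToString();
    }
}
```

Because ToString() is overridden, a driver can simply pass each VisualNode to Console.WriteLine to print one line of the tree.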

The Parser:

This is the heart of the program. The class constructor accepts the fully qualified XML filename and loads it into an XmlDocument object called _XMLDoc. This object is used in the GetXMLNodeList() method to extract the XML node tree.

Fig 4: The Parser Class

GetVisualNodeList() is where the real work starts. It first declares and instantiates a generic list called visualNodesList. This is needed in order to help the node parser keep track of the parsed nodes. We will iterate through this list in order to print out the final output.

Next, we iterate through the available nodes in the node tree. At this stage there may be 0, 1, 2 or more nodes. If there are 0 nodes, 3 or more nodes, or just the XML Declaration node, then the XML file is not valid. If there is only one node, it must be the root node of the document. If there are 2 nodes, they should be the XML Declaration and the root node. If the file is missing or invalid, an exception will be thrown. If the file is a valid XML file and has an XML Declaration node, we will ignore that node as it is not required in the output.

Once the root node of the XML file (e.g. the people node) is read, it is passed to the ParseNode(...) method. ParseNode is a recursive function (i.e. it calls itself) and is called to parse every single XML node in the document. ParseNode takes 3 parameters: the XML node, the VisualNodeList and the number of tabs. The interesting parameter is the number of tabs (i.e. the indent parameter). We just know that for the first node, indent is equal to zero (hence Fig 4 line 22) and for each child the indent should be one more than the parent's indent value (hence Fig 4 line 49). Apart from these 2 facts, we leave it entirely to recursion to handle the number of tabs and it does a good job of it too. (Actually this is one of the few places where ++indent will not work. You must use indent + 1. This allows the recursion to control the value of the indent.)
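Fig 4 is not reproduced on this page. The recursion described above can be sketched as follows. This is a simplification under stated assumptions: it parses an XML string rather than a file, starts from DocumentElement instead of validating the top-level nodes, and collects plain strings instead of VisualNode objects. The class name and the Parse method are hypothetical; only ParseNode and its three parameters come from the text.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class RecursiveParserSketch
{
    // Returns one line per element, indented with tabs to mimic a tree.
    public static List<string> Parse(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml); // throws XmlException if the XML is not well formed

        var lines = new List<string>();
        ParseNode(document.DocumentElement, lines, 0); // the root starts at indent 0
        return lines;
    }

    private static void ParseNode(XmlNode node, List<string> lines, int indent)
    {
        // Skip text, comments and other non-element nodes.
        if (node.NodeType != XmlNodeType.Element)
        {
            return;
        }

        lines.Add(new string('\t', indent) + node.Name);

        foreach (XmlNode child in node.ChildNodes)
        {
            // indent + 1 leaves this call's indent unchanged, so every
            // sibling is parsed with the same value.
            ParseNode(child, lines, indent + 1);
        }
    }
}
```

Writing ++indent inside the loop would increment the local variable once per child, so each sibling would be printed one tab deeper than the previous one; indent + 1 passes the larger value down without modifying the caller's copy.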

The Driver:

The final piece of the puzzle is of course the driver (the main program). As shown in Fig 5, it's pretty straightforward.

Fig 5: The Driver

We have created an object of XMLParse type and used it to get a reference to its VisualNodeList object. Then we iterate through this list and use the overridden ToString() method of the VisualNodes to display each node on the console screen. We have wrapped this in a try-catch block to ensure that file access and XML validity errors are handled appropriately.

With season's greetings!
01/01/10
