Thoughts and Plans: January 2010

Barely a week had passed since my first post, "A Recursive XML Parser in C#", when I was asked to create an XML editor in a Windows Forms application.

My first thought was to modify the code I wrote for the parser and get it going in less than an hour. So late on a Friday afternoon, I started working on my ingenious idea. Four hours later I was still struggling. Outraged and frustrated, I decided to throw away the code change idea and go back to the drawing board, so to speak. The result of this approach was simply incredible.

Before I explain my new approach, I must emphasise a great lesson in development: if you are struggling to achieve a standard operation (like editing an XML file, logging, etc.), you are almost certainly using the wrong tools.

Writing to an XML File

In order to demonstrate this technique, a Windows Forms Application similar to the screen shot below is created.

(Fig 1)

The "Browse" button displays a file open dialog box that lets the user navigate the computer or network file system to find the XML file.

(Fig 2)

Once the user selects the XML file and clicks the "Open" button, the contents of the file are read and displayed in a Windows Forms DataGrid control, as shown in Figure 1 above.
Clicking the small '+' sign in the left-hand margin expands the node and reveals a small navigation bar on the right-hand side of the grid title. The "Arrow" button is the "Back" button, while the button to its right hides or reveals the parent nodes of the displayed node. Finally, you can change the string value of any node and click "Save" to persist the changes.

The Program:

The solution is embarrassingly simple. There are only 4 steps, as follows:

Create a DataSet.
Populate it by calling its ReadXml method passing the XML file path to it.
Set it as the DataSource of a DataGrid control.
To persist the changes to file, get the DataSet of the dataGrid and call its WriteXml method.

The GetXmlDataSet reads the XML file contents into a DataSet and is defined as below:

As mentioned before, we simply instantiate a DataSet and call its ReadXml method, passing the name of the XML file.
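Since the original listing is not reproduced here, the following is only a minimal sketch of what a GetXmlDataSet method along these lines might look like. The class name XmlDataSetReader and the existence check are my assumptions for illustration; DataSet.ReadXml and FileInfo are the standard .NET types.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical helper class; the name is illustrative, not from the original code.
public static class XmlDataSetReader
{
    // Reads the XML file into a new DataSet. ReadXml infers tables and
    // columns from the document when no schema is supplied.
    public static DataSet GetXmlDataSet(FileInfo xmlFile)
    {
        if (xmlFile == null)
            throw new ArgumentNullException("xmlFile");
        if (!xmlFile.Exists)
            throw new FileNotFoundException("XML file not found.", xmlFile.FullName);

        DataSet dataSet = new DataSet();
        dataSet.ReadXml(xmlFile.FullName);
        return dataSet;
    }
}
```

In the form, the returned DataSet would then be assigned to the DataGrid's DataSource property, which corresponds to step 3 in the list above.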

SaveXmlDataSet does just the opposite, saving the modified DataSet back to the file.
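Again as a rough sketch rather than the original code, a SaveXmlDataSet method could be as small as the one below. The class name XmlDataSetWriter and the parameter list are assumptions; DataSet.WriteXml is the standard call and writes the current data without a schema.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical helper class; the name and signature are illustrative.
public static class XmlDataSetWriter
{
    // Writes the current contents of the DataSet to the given file,
    // overwriting whatever was there before.
    public static void SaveXmlDataSet(DataSet dataSet, FileInfo xmlFile)
    {
        if (dataSet == null)
            throw new ArgumentNullException("dataSet");
        if (xmlFile == null)
            throw new ArgumentNullException("xmlFile");

        dataSet.WriteXml(xmlFile.FullName);
    }
}
```

In the form's "Save" handler, the DataSet would presumably be obtained by casting the DataGrid's DataSource back to a DataSet before passing it to this method.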

Before diving into the complete code, I'd like to make two points. Firstly, I prefer to use the FileInfo class as opposed to the File class (with its path strings). This is somewhat better, as FileInfo hides the file name/path validation code and separates it from the GetXmlDataSet code.

Secondly, as you've noticed by now, we are not doing any raw xml manipulation. We are using the technology to do the dirty jobs for us.

The Complete Code:

The code for the main form follows:

Introduction:

In recent weeks I have been asked to write an XML Parser to parse XML files without knowing the XML schema on which the file is based. This is a basic problem and the solution is basically a recursive approach.

I did a tentative Google search and didn't find anything interesting. So I decided to use brute force (i.e. create my own parser) and 45 minutes later I had the solution. Two weeks ago I was asked for an almost identical solution by a different client, and I had to spend another 45 minutes reproducing the same thing. It looks like I will be expected to reproduce it again soon, so I might as well dump the solution here to save myself 45 minutes and maybe help some other folks who need to solve the same problem.

The Basic Problem:

You are given an XML file and asked to parse the content and display it in the form of a tree structure. For instance if Fig 1 below is the content of the XML file, the solution should display an output like Fig 2.

Fig 1: The Content of the XML File

Fig 2: The Output

The Solution:

Due to the nature of the problem (i.e. the unknown XML Schema factor) the solution is conveniently a recursive one. The objective is to iterate through all the nodes and extract the node names, the attributes of each node and somehow calculate the number of the tabs prepended to each node name in order to imitate the appearance of a tree structure.

In order to enable the parser to keep track of the information for each node, a class called VisualNode is created as shown in Fig 3. A VisualNode is a lightweight representation of an XML node. It is named "Visual" node simply because this is what the user will see on the output screen.

Fig 3: The VisualNode Class

To make life easier for ourselves, we have overridden the ToString() method of this class.
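Fig 3 is not reproduced here, so the class below is only a guess at the shape of VisualNode based on the description above: a node name, its attributes, an indent level, and a ToString() override. The property names and the output format are my assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// A sketch of a lightweight node holder; member names are illustrative.
public class VisualNode
{
    public string Name { get; private set; }
    public int Indent { get; private set; }
    public List<KeyValuePair<string, string>> Attributes { get; private set; }

    public VisualNode(string name, int indent)
    {
        if (indent < 0)
            throw new ArgumentOutOfRangeException("indent");
        Name = name;
        Indent = indent;
        Attributes = new List<KeyValuePair<string, string>>();
    }

    // One tab per indent level, then the node name, then name="value" pairs.
    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();
        sb.Append('\t', Indent);
        sb.Append(Name);
        foreach (KeyValuePair<string, string> attribute in Attributes)
        {
            sb.Append(' ').Append(attribute.Key)
              .Append("=\"").Append(attribute.Value).Append('"');
        }
        return sb.ToString();
    }
}
```

Putting the formatting in ToString() means the code that prints the tree only has to call Console.WriteLine on each node.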

The Parser:

This is the heart of the program. The class constructor accepts the fully qualified XML filename and loads it into an XmlDocument object called _XMLDoc. This object is used in the GetXMLNodeList() method to extract the XML node tree.

Fig 4: The Parser Class

GetVisualNodeList() is where the real work starts. It first declares and instantiates a generic list called visualNodesList. This is needed to help the node parser keep track of the parsed nodes. We will iterate through this list in order to print out the final output.

Next, we iterate through the available top-level nodes in the node tree. At this stage there may be 0, 1, 2 or more nodes. If there are 0 nodes, 3 or more nodes, or just the XML Declaration node, then the XML file is not valid. If there is only one node, it must be the root node of the document. If there are 2 nodes, they should be the XML Declaration and the root node. If the file is missing or invalid, an exception will be thrown. If the file is a valid XML file and has an XML Declaration node, we ignore that node as it is not required in the output.
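The top-level scan described above might look roughly like the sketch below. The names RootFinder and GetRootNode are hypothetical. Note that this version selects the root by node type rather than by counting nodes, which is a deliberate simplification on my part so that top-level comments do not trip it up. XmlDocument.Load already throws an XmlException for malformed files, and the DocumentElement property returns the root element directly if the scan is not needed.

```csharp
using System;
using System.Xml;

// Hypothetical helper; not the original parser class.
public static class RootFinder
{
    // Walks the document's top-level nodes, skips the XML declaration,
    // and returns the root element.
    public static XmlNode GetRootNode(XmlDocument doc)
    {
        if (doc == null)
            throw new ArgumentNullException("doc");

        XmlNode root = null;
        foreach (XmlNode node in doc.ChildNodes)
        {
            if (node.NodeType == XmlNodeType.XmlDeclaration)
                continue; // not required in the output
            if (node.NodeType == XmlNodeType.Element)
                root = node;
        }

        if (root == null)
            throw new XmlException("The document has no root element.");
        return root;
    }
}
```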

Once the root node of the XML file (e.g. the people node) is read, it is passed to the ParseNode(...) method. ParseNode is a recursive function (i.e. it calls itself) and is called to parse every single XML node in the document. ParseNode takes 3 parameters: the XML node, the VisualNodeList and the number of tabs. The interesting parameter is the number of tabs (i.e. the indent parameter). We just know that for the first node, indent is equal to zero (hence Fig 4 line 22) and for each child the indent should be one more than the parent's indent value (hence Fig 4 line 49). Apart from these 2 facts, we leave it entirely to recursion to handle the number of tabs, and it does a good job of it too. (Actually this is one of the few places where ++indent will not work. You must use indent + 1, because ++indent changes the variable in the current call, so each later sibling would be indented one level deeper than the previous one. Passing indent + 1 leaves the current value untouched and lets the recursion control the indent.)
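To illustrate the indent handling, here is a stripped-down sketch of the recursion. For brevity it collects plain indented strings instead of VisualNode objects and ignores attributes, so it is not the ParseNode from Fig 4; the class name IndentDemo is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical, simplified stand-in for the recursive parser.
public static class IndentDemo
{
    // Adds one line per element, prefixed with one tab per nesting level.
    public static void ParseNode(XmlNode node, List<string> lines, int indent)
    {
        lines.Add(new string('\t', indent) + node.Name);

        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType != XmlNodeType.Element)
                continue; // skip text, comments, etc.

            // indent + 1 leaves this call's indent unchanged, so every
            // sibling gets the same depth. ++indent would modify it and
            // push each later sibling one tab further right.
            ParseNode(child, lines, indent + 1);
        }
    }
}
```

Calling ParseNode(root, lines, 0) on a document with a people root and two person children yields the root at depth 0 and both person elements at depth 1.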

The Driver:

The final piece of the puzzle is of course the driver (the main program). As shown in Fig 5, it's pretty straightforward.

Fig 5: The Driver

We have created an object of the XMLParse type and used it to get a reference to its VisualNodeList object. Then we iterate through this list and use the overridden ToString() method of the VisualNodes to display each node on the console. We have wrapped this in a try-catch block to ensure that file access and XML validity errors are handled appropriately.

With season's greetings!
01/01/10

Saturday, January 30, 2010

An XML File Editor in C#

Friday, January 1, 2010

A Recursive XML Parser in C#
