XML (.NET 1.1) Performance Guidelines - Design Considerations

From Guidance Share

- J.D. Meier, Srinath Vasireddy, Ashish Babbar, and Alex Mackman


Choose the Appropriate XML Class for the Job

To help you choose the appropriate .NET Framework class to process XML, consider the following guidelines:

  • Use XmlTextReader to process XML data quickly in a forward-only, read-only manner when you do not need validation, XPath, or XSLT services.
  • Use XmlValidatingReader to process and to validate XML data. Process and validate the XML data in a forward, read-only manner according to an XML schema or a DTD.
  • Use XPathNavigator to obtain read-only, random access to XML data and to use XPath queries. To create an XPathNavigator object over an XML document, call the CreateNavigator method of the document, for example XPathDocument.CreateNavigator.
  • Use XmlTextWriter to write XML documents. You cannot use XmlTextWriter to update XML documents.
  • Use the XmlTextReader and XmlTextWriter, in combination, for simple transformations rather than resorting to loading an XmlDocument or using XSLT. For example, updating all the price element values in a document can be achieved by reading with the XmlTextReader, updating the value and then writing to the XmlTextWriter, typically by using the WriteNode method.
  • Use the XmlDocument class to update existing XML documents, or to perform XPath queries and updates in combination. To use XPath queries on the XmlDocument, use the SelectNodes or SelectSingleNode method.
  • If possible, use client-side XML processing to improve performance and to reduce bandwidth.
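As a rough illustration of the two XPath-capable choices in the list above, the following sketch runs a read-only query through XPathNavigator and a query-plus-update through XmlDocument. The class name, method names, and parameters are hypothetical and exist only for this example.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class XPathQueryExamples
{
    // Read-only query: XPathDocument cannot be edited, only navigated.
    public static int CountWithNavigator(string xml, string xpath)
    {
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();
        XPathNodeIterator matches = nav.Select(xpath);
        int count = 0;
        while (matches.MoveNext())
        {
            count++;
        }
        return count;
    }

    // Query and update in combination: XmlDocument keeps an editable DOM in memory.
    public static string SetAllValues(string xml, string xpath, string newValue)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            node.InnerText = newValue;
        }
        return doc.OuterXml;
    }
}
```

For example, XPathQueryExamples.SetAllValues(xml, "/items/item/price", "9") would rewrite every matching price element, at the cost of loading the whole document into memory.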


Consider Validating Large Documents

When you use the XmlDocument class to load a large document that turns out to be malformed or invalid, you waste memory and CPU resources. Consider validating the input XML if there is a reasonable chance that it is invalid. In a closed environment, you might consider validation an unnecessary overhead, but whether to validate is a design decision that you need to make explicitly.

You can perform the validation process and other operations at the same time because the validation class derives from XmlReader. For example, you can use XmlValidatingReader with XmlSerializer to deserialize and validate XML at the same time. The following code fragment shows how to use XmlValidatingReader.


  // Requires: using System.IO; using System.Xml; using System.Xml.Schema;
  // payload is the XML data
  StringReader stringReader = new StringReader( payload );
  XmlReader xmlReader = new XmlTextReader( stringReader );
  XmlValidatingReader vreader = new XmlValidatingReader( xmlReader );
  // Load the schema to validate against
  vreader.Schemas.Add(XmlSchema.Read(
                        new XmlTextReader("xyz.xsd"), null ) );
  vreader.ValidationType = ValidationType.Schema;
  // ValidationCallBack is your handler for validation errors and warnings
  vreader.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);

You can also use a validating reader with XmlDocument by passing the validating reader instance to the XmlDocument.Load method, as shown in the following code fragment.


  XmlDocument doc = new XmlDocument();
  doc.Load( xmlValidatingReaderInstance );

Validation comes at a performance cost. The tradeoff is between catching invalid content early and the additional processing time that validation takes, even in a streaming scenario. Typically, using the XmlValidatingReader to validate an XML document is two to three times slower than using the XmlTextReader without validation. Whether to perform validation depends on your particular application scenario.
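The validating reader only reports problems as the document is read, so a handler and a read loop are needed to complete the picture. The following is a minimal sketch, assuming the schema and payload are available as strings; the ValidationCounter class and its CountErrors method are hypothetical names for this example. Later versions of the .NET Framework mark XmlValidatingReader as obsolete, so expect compiler warnings there.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

public class ValidationCounter
{
    private int errorCount;

    // Called by the validating reader for each validation error or warning.
    private void ValidationCallBack(object sender, ValidationEventArgs args)
    {
        // Record the problem instead of throwing, so that reading continues.
        errorCount++;
    }

    public static int CountErrors(string payload, string xsd)
    {
        ValidationCounter counter = new ValidationCounter();
        XmlValidatingReader vreader =
            new XmlValidatingReader(new XmlTextReader(new StringReader(payload)));
        vreader.Schemas.Add(
            XmlSchema.Read(new XmlTextReader(new StringReader(xsd)), null));
        vreader.ValidationType = ValidationType.Schema;
        vreader.ValidationEventHandler +=
            new ValidationEventHandler(counter.ValidationCallBack);
        try
        {
            // Validation happens as a side effect of reading each node.
            while (vreader.Read()) { }
        }
        finally
        {
            vreader.Close();
        }
        return counter.errorCount;
    }
}
```

Because the handler only counts problems, a caller can decide after a single streaming pass whether the document is worth loading into an XmlDocument.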

Process Large Documents in Chunks If Possible

If you have very large XML documents to process, evaluate whether you can divide the documents and then process them in chunks. Processing the smaller documents, for example by using XSLT, is more efficient.


Use Streaming Interfaces

Streaming interfaces, like the one provided by XmlTextReader, give better performance and scalability, compared to loading large XML documents into the XmlDocument or XPathDocument classes and then using DOM manipulation.

The DOM creates an in-memory representation of the entire XML document. The XmlTextReader is different from the DOM because XmlTextReader only loads 4-kilobyte (KB) buffers into memory. If you use the DOM to process large XML files, you can typically consume memory equivalent to three or four times the XML document size on disk.
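To make the contrast concrete, the following sketch counts elements with a given name by streaming through the input, so memory use does not grow with document size. The StreamingCount class and CountElements method are hypothetical names for this example.

```csharp
using System.IO;
using System.Xml;

public static class StreamingCount
{
    // Counts start tags with the given local name without building a DOM.
    public static int CountElements(TextReader input, string localName)
    {
        XmlTextReader reader = new XmlTextReader(input);
        int count = 0;
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element
                    && reader.LocalName == localName)
                {
                    count++;
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return count;
    }
}
```

The same loop works over a file by passing a StreamReader instead of a StringReader.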


Consider Hard-Coded Transformations

Using XSLT may be overly complicated for certain simple transformations such as changing a particular attribute value, replacing one node with another node, or appending or removing nodes from a document.

If XSLT seems overly complicated for a simple transformation, you can use XmlReader and XmlWriter together: copy the document from the XmlReader to the XmlWriter and modify it while copying. The XmlWriter.WriteNode and XmlWriter.WriteAttributes methods each take an XmlReader instance. WriteNode copies the current node and its child nodes to the XmlWriter, and WriteAttributes copies the attributes of the current element.

The disadvantage of hard coding the transformation with these classes is that any change requires you to recompile the code, whereas you can modify an XSLT style sheet without recompiling. However, in some situations, it might be better to hard code a transformation. A simple example of a hard-coded approach is shown in the following code fragment.


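  // Copies the document node by node and renames one element on the way.
  // WriteNode is avoided inside the loop because it advances the reader itself.
  while( reader.Read() )
  {
    switch( reader.NodeType )
    {
      case XmlNodeType.Element:
        bool isEmpty = reader.IsEmptyElement;
        if( reader.LocalName == "somethingToChange" )
          writer.WriteStartElement( "somethingChanged" );
        else
          writer.WriteStartElement( reader.Prefix, reader.LocalName, reader.NamespaceURI );
        writer.WriteAttributes( reader, false );
        if( isEmpty ) writer.WriteEndElement();
        break;
      case XmlNodeType.EndElement:
        writer.WriteFullEndElement();
        break;
      case XmlNodeType.Text:
        writer.WriteString( reader.Value );
        break;
      // Handle other node types (CDATA, comments, whitespace, and so on) as needed.
    }
  }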


Consider Element and Attribute Name Lengths

Consider the length of the element names and the length of the attribute names that you use. These names are included as metadata in your XML documents. Therefore, the length of an element or attribute name affects the document size. You need to balance size issues with ease of human interpretation and future maintenance. Try to use names that are short and meaningful.


Consider Sharing the XmlNameTable

Share the XmlNameTable class that is used to store element and attribute names across multiple XML documents of the same type to improve performance.

XML classes like XmlTextReader and XmlDocument use the XmlNameTable class to store element and attribute names. When element names, attribute names, or prefixes occur multiple times in the document, they are stored only once in the XmlNameTable and an atomized string is returned. When an element, attribute, or prefix is looked up, an object (reference) comparison of the strings is performed instead of a more expensive string comparison.

The following code shows how to obtain access to and store the XmlNameTable object.

  System.Xml.XmlTextReader reader = new System.Xml.XmlTextReader("small.xml");
  System.Xml.XmlNameTable nt = reader.NameTable;
  // Store XmlNameTable in Application scope and reuse it
  System.Xml.XmlTextReader reader2 = new System.Xml.XmlTextReader("Test.xml", nt);
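The object comparison described above can be sketched as follows. The SharedNameTableExample class, its fields, and the "price" element name are assumptions made for this example, and the sketch does not address concurrent access to the shared table.

```csharp
using System.IO;
using System.Xml;

public static class SharedNameTableExample
{
    // One table shared by every reader created through this class.
    private static readonly NameTable SharedNames = new NameTable();

    // Add returns the atomized string instance that readers will also return.
    private static readonly object PriceName = SharedNames.Add("price");

    public static int CountPriceElements(string xml)
    {
        XmlTextReader reader =
            new XmlTextReader(new StringReader(xml), SharedNames);
        int count = 0;
        try
        {
            while (reader.Read())
            {
                // Reference comparison: no character-by-character compare.
                if (reader.NodeType == XmlNodeType.Element
                    && (object)reader.LocalName == PriceName)
                {
                    count++;
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return count;
    }
}
```

The reference comparison is only valid because the reader and the PriceName field use the same name table; a string atomized in a different table would not compare equal this way.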


References

For more information about object comparisons, see the MSDN article "Object Comparison Using XmlNameTable" at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpconobjectcomparisonusingxmlnametable.asp.
