May 27, 2002


I recently started working for an open source project called Xbeans and there has been a debate on whether to use DOM or JDOM to process XML. I was asked to elaborate on the performance hit and this article is the result of my research.

Let me start off by giving a few differences and benefits of both DOM and JDOM. With DOM we have a standardized platform- and language-neutral interface for dynamically accessing and editing XML. Notice that the word "serializing" is not part of that definition: serialization was left out of the Level 2 spec. As a result, DOM parsers provide implementation-specific classes and features for serializing, which poses a problem for interoperability. However, the upcoming Level 3 spec, which is designed to extend and not replace Level 2, has an exciting list of new features. The most important is the Load and Save package, which defines a standard way to build a DOM tree from various input sources and provides an API for serializing a DOM document out as XML. Traversing with DOM can require a great deal of attention and is error-prone. DOM Level 3 will incorporate the ability to find nodes and traverse a document using XPath, which will make XML documents easier to handle. In addition, DOM Level 3 will add asynchronous loading with accompanying load and progress events, filtering on load and save, and incremental/concurrent parsing, which will make it possible to use a partial DOM document while it is still being parsed. I have covered most of the parts of the spec that deal with enhancing performance, but there are more features.
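To make the Load and Save idea concrete, here is a minimal sketch in Java. It assumes a DOM implementation that reports the "LS" 3.0 feature (the parser bundled with recent JDKs does), so it will not work against a Level 2 only parser. The class name LoadSaveSketch and the sample order document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of Xbeans or any DOM parser.
final class LoadSaveSketch {

    /** Parses XML text into a DOM tree with whatever parser JAXP selects. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Serializes a DOM tree through the standard Load and Save interfaces. */
    static String serialize(Document doc) {
        // Assumption: the implementation supports the "LS" 3.0 feature.
        Object feature = doc.getImplementation().getFeature("LS", "3.0");
        if (!(feature instanceof DOMImplementationLS)) {
            throw new IllegalStateException("DOM Level 3 Load and Save not available");
        }
        LSSerializer serializer = ((DOMImplementationLS) feature).createLSSerializer();
        // Leave out the <?xml ...?> declaration to keep the output short.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        Document doc = parse("<order id=\"7\"><item>widget</item></order>");
        doc.getDocumentElement().setAttribute("status", "shipped");
        System.out.println(serialize(doc));
    }
}
```

The point of the sketch is that nothing in serialize() names a parser-specific class, which is exactly what Level 2 could not offer.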

It is also worth noting that JAXP uses DOM, SAX, and XSL to give you the ability to plug in any compliant parser or processor. So with the upcoming features and the already widespread use of DOM, why should you even think about using JDOM? JDOM is also building a loyal following and promises great features as well. It is currently in beta but is a JSR and will probably be included in an upcoming version of J2SE. This simple fact should encourage a developer to glance at it. However, JDOM is so easy to use and its style fits into Java so well that your glance might lead you to stare a while. It is important to mention that JDOM is not a parser; it uses SAX or DOM to build the JDOM document. Using SAX to build the representation is encouraged because of its speed, and using DOM is encouraged when you already have a DOM tree available. From the JDOM website:

"JDOM documents can be built from XML files, DOM trees, SAX events, or any other source. JDOM documents can be converted to XML files, DOM trees, SAX events, or any other destination. This ability proves useful, for example, when integrating with a program that expects SAX events. JDOM can parse an XML file, let the programmer easily and efficiently manipulate the document, then fire SAX events to the second program directly - no conversion to a serialized format is necessary."

JDOM is very robust and includes checking for well-formedness, XPath support using jaxen, XSLT support via TRaX, and will eventually have in-memory validation. It uses Java collections and behaves like Java. Still not convinced? Neither am I. DOM allows you to move between multiple implementations to suit your needs and also between different languages without having to learn an entirely new XML API. JAXP works on top of a DOM implementation and an XSL processor that you choose and promises future support for all DOM Level 3 additions.
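Here is a rough sketch of what "pluggable" means on the JAXP side, and of why plain DOM traversal needs some care. The code never names a parser class; it asks JAXP for a factory and walks the resulting tree, checking node types because whitespace between elements shows up as text nodes. The class name JaxpDomSketch and the sample channel document are my own inventions for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class used only for illustration.
final class JaxpDomSketch {

    /** Builds a DOM tree using whichever parser the JAXP lookup selects. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Collects element names in document order, ignoring non-element nodes. */
    static List<String> elementNames(Node start) {
        List<String> names = new ArrayList<>();
        collect(start, names);
        return names;
    }

    private static void collect(Node node, List<String> names) {
        // Text, comment, and document nodes are visited too, so filter by type.
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, names);
        }
    }

    public static void main(String[] args) {
        // The concrete class printed here depends on the parser that is plugged in.
        System.out.println("DOM factory in use: "
                + DocumentBuilderFactory.newInstance().getClass().getName());
        Document doc = parse(
                "<channel>\n  <bean>parser</bean>\n  <bean>translator</bean>\n</channel>");
        System.out.println(elementNames(doc));
    }
}
```

Swapping in a different compliant parser should not require touching this code, only the JAXP configuration.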

JDOM has proven to provide more features sooner, while still in beta, and has a lot of potential to expand and improve even more in terms of features and performance. It already has a standard way to serialize XML, jumping ahead of DOM in that regard. JDOM performed poorly in early benchmarks, and that, along with its lack of standardization, makes developers hesitant to use it. However, most tests were done with JDOM beta 7, and the current release, beta 8, promises many speed optimizations and functionality improvements.

Developers tend not to rely on promises but instead tend to rely on results from performance tests. In a world of various implementations, developers seek a solution with a balance of intuitiveness and performance. DOM wins for this balance and JDOM comes second for sheer intuitiveness. The JDOM gang needs to address the performance issue with some kind of results and not the following lame answer they give on their FAQ page when asked, "Are there any performance numbers?":

"Not yet. JDOM is still really new! Only preliminary numbers have been taken, but they have been extremely promising. And JDOM hasn't been tuned yet!"

Two years is not "really new" and why not post these "extremely promising" numbers for developers to see? I'll tell you why: JDOM didn't beat DOM on performance tests. This can be easily concluded from their posted excuse stating that "JDOM hasn't been tuned yet". Luckily, some third-party performance tests have been conducted that compare various document object models. Two of them can be seen on IBM's and Sun's sites. Coincidentally, the articles on both sites work as a series in which they introduce various object models, show some benchmark results, and follow with some performance tips. Although some of the models have released newer versions since the publication of these articles, I still recommend you read them and familiarize yourself with the options you have.

Here are the links:

The implementations tested scored differently on various tests so it is difficult to pick a clear winner. We won't really be able to conclude which implementation is best until JDOM 1.0 is released and DOM Level 3 support can be tested. Xerces has a partial implementation of DOM Level 3 which is considered experimental until DOM Level 3 moves from a Working Draft to a Recommendation.

Surprisingly, the tests show that DOM and JDOM have a possible competitor in dom4j. But to save yourself a further headache, you should only consider JDOM and DOM, because DOM is already supported in J2SE 1.4 and JDOM has the potential to be part of the J2SE core in the future (JDOM is in the JCP). With JDOM, most of the performance complaints have been related to the frequent creation of Iterators and Java objects. JDOM promises support for JAXP but will most likely be used instead of JAXP. In a way, you can consider JDOM and JAXP competing technologies: both are pluggable with a compliant parser of your choice. Regardless of which implementation you choose, JDOM is really nice if you are working only in Java, because it can read from existing SAX/DOM/XML sources and can output to SAX/DOM/XML-receiving components. This ability makes JDOM a good choice as an intermediary to connect existing program components that are built against SAX, DOM, or XML.

There are probably two other issues to weigh when choosing an implementation: XML data binding, and results showing that (re)parsing the textual version of an XML document causes less overhead than the (de)serialization of DOM trees. However, many processors expect a DOM tree as input, so if you send a DOM tree they can work with it immediately, with no conversion necessary. So even if you use JDOM you will have to output to DOM or XML, because most processors do not support JDOM input. With JDOM, it is very easy to accept DOM input using the DOMBuilder class and easy to send DOM as output using the DOMOutputter class, but if you are working primarily in DOM and processors expect DOM, then this rules out passing around JDOM documents. Although passing around a DOM tree requires more memory, this method is highly effective for post-processing such as using XSL for transformations. This is similar to the XBean approach, where a DOM tree is fed to a translator bean.
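As a small illustration of a processor that takes a DOM tree directly, the sketch below hands a DOM Document to an XSL transformer through the JAXP transformation API and gets a DOM Document back, with no text serialization in between. The class name DomTransformSketch, the stylesheet, and the order/shipment vocabulary are hypothetical and exist only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; the stylesheet below is invented for the sketch.
final class DomTransformSketch {

    // Turns <order id="..."><item>...</item></order> into <shipment ref="...">...</shipment>.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"/order\">"
            + "<shipment ref=\"{@id}\"><xsl:value-of select=\"item\"/></shipment>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    private static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // XSLT processors expect a namespace-aware DOM tree.
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder();
    }

    static Document parse(String xml) {
        try {
            return newBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Feeds a DOM tree to the transformer and collects the result as a DOM tree. */
    static Document transform(Document input, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            Document output = newBuilder().newDocument();
            transformer.transform(new DOMSource(input), new DOMResult(output));
            return output;
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        Document in = parse("<order id=\"7\"><item>widget</item></order>");
        Document out = transform(in, STYLESHEET);
        System.out.println(out.getDocumentElement().getNodeName());
    }
}
```

A JDOM document could not be passed to transform() as it stands; it would first have to be converted to DOM, which is the conversion cost discussed above.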

The fact that processors expect DOM trees and XBeans pass around DOM trees makes the current Xbeans model one of the most efficient ways to pass data between components. If a developer wanted to use JDOM, they would need to use the DOMBuilder class to create a JDOM Document, perform the necessary changes, then use the DOMOutputter class to send DOM back out to the next XBean in the channel. The conversion would obviously be a performance hit, and its cost depends on the size of the document both coming in and going out. Also, there is no standard way to serialize a DOM tree, and JDOM uses a DOM parser to build its representation. JDOM therefore depends on a DOM parser with implementation-specific serialization classes to output the DOM tree, which means you MUST use the same DOM parser in the previous channel, to build the JDOM Document, and in the receiving channel. That is another reason to think again about using JDOM within an XBean.

We are seeing a lot of inter-application passing around of DOM trees because developers can't wait to get the XML syntax out of the way. This is fine as long as you have a model similar to the XBean model that uses the same implementation of DOM on both ends of the channel. Remember, there is currently no standard way to serialize DOM, which might limit interoperability between different DOM parsers. It is strange that passing around XML instead of DOM is faster, but expect that to change, as DOM Level 3 promises to define a standard way not only to serialize, but also to filter serialization and to concurrently parse and use documents. The argument over whether DOM, JDOM, or XML is the fastest internal data representation for an application will not end until we see some more tests. Early tests and suggestions in various articles around the Web state that XML is the fastest, and you also keep the advantage of an interoperable XML textual representation. Once you decide on a non-XML representation, your project is no longer loosely coupled. If you choose XML as the representation of choice, then you can use SAX, DOM, JDOM, dom4j, or any other implementation to work with the XML and, as long as you output XML, you have no problems with component interoperability.
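A rough sketch of that loosely coupled arrangement: two components that only ever exchange XML text, where one happens to use DOM internally and the other SAX. This is my own illustration and not how Xbeans works (Xbeans pass DOM trees); the class name XmlTextPipelineSketch, the method names, and the processedBy attribute are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: each stage takes XML text in and hides its object model.
final class XmlTextPipelineSketch {

    /** Stage 1 uses DOM internally but accepts and returns XML text. */
    static String stampWithDom(String xml, String stageName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            doc.getDocumentElement().setAttribute("processedBy", stageName);

            // An identity transform is the JAXP way to write a DOM tree as text.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("DOM stage failed", e);
        }
    }

    /** Stage 2 uses SAX internally and never sees a DOM node. */
    static int countElementsWithSax(String xml) {
        final int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attributes) {
                            count[0]++;
                        }
                    });
        } catch (Exception e) {
            throw new IllegalArgumentException("SAX stage failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String stamped = stampWithDom("<channel><bean/><bean/></channel>", "stage-1");
        System.out.println(stamped);
        System.out.println("elements seen by SAX stage: " + countElementsWithSax(stamped));
    }
}
```

Either stage could be rewritten with JDOM or dom4j without the other noticing, at the price of parsing the text again at every hop.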

One of the issues I mentioned that one should consider when choosing an object model is XML data binding. XML data binding is not a document object model but rather an alternative to one. As the name implies, XML data binding focuses on the actual data contained within the XML document as opposed to the document itself. It would be nice if an XML-based application had a standard way to work with XML as Java objects. This is exactly what you get with XML data binding: a schema compiler and a runtime framework to support mapping between XML documents and Java objects and vice versa. The schema compiler will have future support for XML Schema, but currently it translates XML DTDs into one or more Java classes without requiring the developer to write complex parsing code. The generated code performs error and validity checking of incoming and outgoing XML documents, which ensures that only valid messages are handled. There are a lot of advantages to data binding, depending on your Java and XML scenario. The Java Architecture for XML Binding (JAXB) will provide a standard interface, but in the meantime, Castor is a popular mapping framework between Java objects, XML documents, SQL & OQL databases and LDAP directories, as well as Java Data Objects (JDO), a specification from Sun dealing with Java-to-RDBMS (relational database management system) persistence. There is also the Zeus framework from the nice folks at Enhydra. XML data binding is a whole other topic, so here are some links to get you started if you are interested:

In conclusion, once you choose to work with XML, the next step is to find the best fit for how to deal with it. Whether it's SAX, DOM, JDOM, or XML data binding, developers have to understand the pros and cons of each implementation as well as which method is optimal for a project's requirements. If you are overwhelmed by the choice of implementation, just pick JAXP and let it take care of everything for you. It is a Java specification, it attempts to plug the holes left by other implementations, and it is still evolving, so you can expect it to meet all your demands.

Here are a few more links that might be of interest:

The author of the IBM articles on document object models and XML data binding provides the XML benchmark source so that you can run the tests yourself as well as other resources:

  • Java XML Models Benchmark

XML Stream (XMLS) - a format designed to eliminate most of the padding of XML text documents, allowing faster input and output of XML documents.

Posted by Nasseam Elkarra at May 27, 2002 01:39 AM