Parsing chunks of XML documents with JAXB

2010-02-09

JAXB is a great Java library for mapping XML documents to Java objects and vice versa. But how can JAXB be used to parse large XML documents?
The Unofficial JAXB Guide contains a small section that provides some useful information about this topic.

Assume we have an XML document similar to the following:

<Example id="10" date="1970-01-01" version="1.0">
<Properties>...</Properties>
<Summary>...</Summary>
<Document id="1">...</Document>
<Document id="2">...</Document>
<Document id="3">...</Document>
</Example>

Now I want to unmarshal the Example element into the corresponding Example object. If I do so, the whole XML document gets unmarshalled. If the XML document contains hundreds of thousands of Document elements, this consumes a huge amount of memory. But at this point I'm only interested in the Example element with its Properties and Summary elements. The Document elements can be parsed in chunks.

To reach that goal I use virtual infosets, as described in the JAXB Guide. I created a simple class named PartialXmlEventReader that implements XMLEventReader and delegates all method calls to a parent reader. Its constructor takes the QName of an element. When the reader encounters the first start element with that name, it closes the enclosing element by returning an EndElement event instead. So to the caller of the reader, the XML document above looks like this:

<Example id="10" date="1970-01-01" version="1.0">
<Properties>...</Properties>
<Summary>...</Summary>
</Example>

As the parent reader is still positioned at the first Document start element, we can use the same reader to parse the Document elements.
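The post only links to the source of PartialXmlEventReader, so here is a minimal sketch of how such a reader could work, using nothing but the standard StAX API (javax.xml.stream). It is not the author's code. Two design choices are assumptions: when the stop element appears, the sketch emits end events for every element that is still open (innermost first), which for the example above is just Example; and close() deliberately leaves the parent reader open so it can be reused for the chunks.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Sketch: a view of the parent reader that ends right before the first stopAt start element. */
public class PartialXmlEventReader implements XMLEventReader {
    private final XMLEventReader parent;
    private final QName stopAt;
    private final XMLEventFactory factory = XMLEventFactory.newInstance();
    private final Deque<QName> open = new ArrayDeque<>();       // open elements, innermost first
    private final Deque<XMLEvent> pending = new ArrayDeque<>(); // synthetic EndElements still to emit
    private boolean stopped = false;

    public PartialXmlEventReader(XMLEventReader parent, QName stopAt) {
        this.parent = parent;
        this.stopAt = stopAt;
    }

    @Override public XMLEvent peek() throws XMLStreamException {
        if (!pending.isEmpty()) return pending.peek();
        if (stopped || !parent.hasNext()) return null;
        XMLEvent e = parent.peek();
        if (e.isStartElement() && e.asStartElement().getName().equals(stopAt)) {
            stopped = true; // e is NOT consumed: the parent stays positioned at it
            for (QName n : open) { // innermost first, so the virtual view stays well-formed
                pending.add(factory.createEndElement(n.getPrefix(), n.getNamespaceURI(), n.getLocalPart()));
            }
            open.clear();
            return pending.peek(); // null if no element was open
        }
        return e;
    }

    @Override public XMLEvent nextEvent() throws XMLStreamException {
        if (peek() == null) throw new NoSuchElementException();
        if (!pending.isEmpty()) return pending.poll();
        XMLEvent e = parent.nextEvent();
        if (e.isStartElement()) open.push(e.asStartElement().getName());
        else if (e.isEndElement()) open.poll();
        return e;
    }

    @Override public boolean hasNext() {
        try { return peek() != null; } catch (XMLStreamException ex) { throw new IllegalStateException(ex); }
    }

    @Override public Object next() {
        try { return nextEvent(); } catch (XMLStreamException ex) { throw new IllegalStateException(ex); }
    }

    @Override public String getElementText() throws XMLStreamException {
        StringBuilder text = new StringBuilder(); // assumes the last event was a START_ELEMENT
        for (XMLEvent e = nextEvent(); !e.isEndElement(); e = nextEvent()) {
            if (e.isStartElement()) throw new XMLStreamException("element has child elements");
            if (e.isCharacters()) text.append(e.asCharacters().getData());
        }
        return text.toString();
    }

    @Override public XMLEvent nextTag() throws XMLStreamException {
        XMLEvent e = nextEvent();
        while ((e.isCharacters() && e.asCharacters().isWhiteSpace())
                || e.getEventType() == XMLStreamConstants.COMMENT
                || e.getEventType() == XMLStreamConstants.PROCESSING_INSTRUCTION) {
            e = nextEvent();
        }
        if (!e.isStartElement() && !e.isEndElement()) throw new XMLStreamException("expected a tag");
        return e;
    }

    @Override public Object getProperty(String name) { return parent.getProperty(name); }

    // Deliberately does not close the parent, which is reused for the chunks.
    @Override public void close() { }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Example><Properties>p</Properties><Summary>s</Summary>"
                + "<Document id=\"1\">a</Document><Document id=\"2\">b</Document></Example>";
        XMLEventReader parent = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        XMLEventReader partial = new PartialXmlEventReader(parent, new QName("Document"));
        StringBuilder view = new StringBuilder();
        while (partial.hasNext()) {
            XMLEvent e = partial.nextEvent();
            if (e.isStartElement()) view.append('<').append(e.asStartElement().getName().getLocalPart()).append('>');
            if (e.isEndElement()) view.append("</").append(e.asEndElement().getName().getLocalPart()).append('>');
        }
        System.out.println(view); // <Example><Properties></Properties><Summary></Summary></Example>
        System.out.println(parent.peek().asStartElement().getName().getLocalPart()); // Document
    }
}
```

Running main prints the virtual view of the document (Example with Properties and Summary, closed by a synthetic end tag) and then Document, which shows that the parent reader still sits at the first Document start element, ready for the chunked unmarshalling.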

The following code demonstrates the use of the PartialXmlEventReader:

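import static org.junit.Assert.*;

import java.io.InputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import org.junit.Test;

// Example, Document and PartialXmlEventReader are the classes described in this post.

@Test
public void testChunks() throws JAXBException, XMLStreamException {

    final QName qName = new QName("Document");

    InputStream in = getClass().getResourceAsStream("example.xml");
    assertNotNull("example.xml not found on the classpath", in);

    // create an XML event reader for the input stream
    XMLInputFactory xif = XMLInputFactory.newInstance();
    XMLEventReader reader = xif.createXMLEventReader(in);

    // initialize JAXB
    JAXBContext jaxbCtx = JAXBContext.newInstance(Example.class, Document.class);
    Unmarshaller um = jaxbCtx.createUnmarshaller();

    // unmarshal the Example element without parsing the Document elements
    Example example = um.unmarshal(new PartialXmlEventReader(reader, qName),
            Example.class).getValue();

    assertNotNull(example);
    assertEquals("My Properties", example.getProperties());
    assertEquals("My Summary", example.getSummary());
    assertNull(example.getDocument());

    Long docId = 0L;
    XMLEvent e = null;

    // loop through the XML stream
    while ((e = reader.peek()) != null) {

        // check whether the event is a Document start element
        if (e.isStartElement() && e.asStartElement().getName().equals(qName)) {

            // unmarshal one Document; this consumes its events up to the matching end element
            Document document = um.unmarshal(reader, Document.class).getValue();

            assertNotNull(document);
            assertEquals(++docId, document.getId());

        } else {
            // skip whitespace and any other events between the Document elements
            reader.nextEvent();
        }
    }

    // the test expects example.xml to contain ten Document elements
    assertEquals(Long.valueOf(10), docId);
}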

You can find the source code of the PartialXmlEventReader here.



Marco Rico Gomez is a passionate software developer located in Germany who likes to share his thoughts and experiences about software development and technologies with others.

