Processing legacy data formats with SAX and JAXB

2010-03-16

In real-world applications it is sometimes necessary to read information from legacy files. By legacy files, I mean comma-separated or fixed-length data files.

I have found it very useful to implement a custom SAX XMLReader that transforms the input stream of legacy data into an XML representation on the fly.

In my opinion, one of the major advantages is that you can use the usual XML tooling stack, such as StAX, JAXB and XML Schema validation, to process and validate the data.

Here is a short example:

The input data is a list of countries, one per line, each with a three-letter country code, name, capital and phone prefix, separated by semicolons.

GER;Germany;Berlin;+0049
ESP;Spain;Madrid;+0034
FRA;France;Paris;+0033
...
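
One detail worth keeping in mind when splitting such lines in Java: String.split drops trailing empty strings by default, so a record whose last fields are empty yields fewer than four entries. Below is a minimal sketch of a stricter splitter. It assumes that empty fields are legal and that exactly four fields are expected. The class name CountryLine, that policy and the "XXX;Nowhere;;" record are made up for illustration and are not part of the reader implementation in this post.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class CountryLine {

    static final int FIELD_COUNT = 4;

    // Splits one record. The limit -1 tells split() to keep trailing empty fields.
    static String[] split(String line) {
        String[] fields = line.split(";", -1);
        if (fields.length != FIELD_COUNT) {
            throw new IllegalArgumentException(
                    "expected " + FIELD_COUNT + " fields but got " + fields.length + ": " + line);
        }
        return fields;
    }

    public static void main(String[] args) {
        // A complete record from the sample data.
        System.out.println(Arrays.toString(split("GER;Germany;Berlin;+0049")));
        // A made-up record with two empty trailing fields still has four entries.
        System.out.println(Arrays.toString(split("XXX;Nowhere;;")));
        // Without the limit, the trailing empty fields are silently dropped.
        System.out.println("XXX;Nowhere;;".split(";").length);
    }
}
```

Whether an empty capital or phone prefix should be accepted at all depends on the data, so treat the exact-count check as one possible policy rather than a recommendation.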

First we implement a simple XMLReader that transforms the data into an XML representation.

In this example I'll use the following XML format:

<countries>
    <country code="GER" title="Germany">
        <capital>Berlin</capital>
        <phone-prefix>+0049</phone-prefix>
    </country>
    <country code="ESP" title="Spain">
        <capital>Madrid</capital>
        <phone-prefix>+0034</phone-prefix>
    </country>
    ...
</countries>

Here is the implementation of the XMLReader that transforms the legacy data into its XML representation. I've omitted some code here because it consists only of empty implementations of the remaining methods declared by XMLReader. Let your IDE generate those empty implementations.

The standard Java libraries provide no dedicated base implementation of the XMLReader interface (org.xml.sax.helpers.XMLFilterImpl comes closest, but it is designed to wrap a parent reader), so a cleaner approach would be to create an abstract base implementation of XMLReader first and inherit from it. For this simple example I take the straightforward route.

import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributesImpl;

public class SampleXmlReader implements XMLReader {

    // No namespace is used, so local name and qualified name are identical.
    private static final String URI = "";

    private ContentHandler contentHandler;
    private ErrorHandler errorHandler;

    public ContentHandler getContentHandler() {
        return contentHandler;
    }

    public ErrorHandler getErrorHandler() {
        return errorHandler;
    }

    public void setContentHandler(ContentHandler handler) {
        this.contentHandler = handler;
    }

    public void setErrorHandler(ErrorHandler handler) {
        this.errorHandler = handler;
    }

    public void parse(InputSource input) throws IOException, SAXException {

        // This example only supports input sources backed by a character stream.
        Reader in = input.getCharacterStream();
        if (in == null)
            throw new IOException("input source has no character stream");
        BufferedReader reader = new BufferedReader(in);

        contentHandler.startDocument();
        contentHandler.startElement(URI, "countries", "countries", new AttributesImpl());

        String line;
        while ((line = reader.readLine()) != null) {

            String[] fields = line.split(";");
            if (fields.length < 4)
                throw new IOException("illegal format of input source");

            // SAX expects a DTD attribute type here; "CDATA" is the generic one.
            AttributesImpl attrs = new AttributesImpl();
            attrs.addAttribute(URI, "code", "code", "CDATA", fields[0]);
            attrs.addAttribute(URI, "title", "title", "CDATA", fields[1]);

            contentHandler.startElement(URI, "country", "country", attrs);

            writeElement("capital", fields[2]);
            writeElement("phone-prefix", fields[3]);

            contentHandler.endElement(URI, "country", "country");
        }

        contentHandler.endElement(URI, "countries", "countries");
        contentHandler.endDocument();
    }

    public void parse(String systemId) throws IOException, SAXException {
        throw new UnsupportedOperationException();
    }

    // Emits <name>value</name>, passing the same name as local and qualified name.
    private void writeElement(String name, String value) throws SAXException {
        contentHandler.startElement(URI, name, name, new AttributesImpl());
        contentHandler.characters(value.toCharArray(), 0, value.length());
        contentHandler.endElement(URI, name, name);
    }

    // Uninteresting code omitted: empty implementations of getFeature/setFeature,
    // getProperty/setProperty, getEntityResolver/setEntityResolver and
    // getDTDHandler/setDTDHandler.
}
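
The abstract base implementation mentioned above could look roughly like the following. This is only a sketch under a few assumptions: the names AbstractXmlReader, CapitalsReader and AbstractXmlReaderDemo are made up for illustration, and features and properties are simply stored without being interpreted, which is a lenient simplification (a stricter reader would throw SAXNotRecognizedException for names it does not know). The small subclass shows that only the parsing logic is left to implement.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical base class: keeps the handlers and remembers features and
// properties, so subclasses only have to implement parse(InputSource).
abstract class AbstractXmlReader implements XMLReader {
    protected ContentHandler contentHandler;
    protected ErrorHandler errorHandler;
    protected DTDHandler dtdHandler;
    protected EntityResolver entityResolver;
    private final Map<String, Boolean> features = new HashMap<>();
    private final Map<String, Object> properties = new HashMap<>();

    public ContentHandler getContentHandler() { return contentHandler; }
    public void setContentHandler(ContentHandler handler) { contentHandler = handler; }
    public ErrorHandler getErrorHandler() { return errorHandler; }
    public void setErrorHandler(ErrorHandler handler) { errorHandler = handler; }
    public DTDHandler getDTDHandler() { return dtdHandler; }
    public void setDTDHandler(DTDHandler handler) { dtdHandler = handler; }
    public EntityResolver getEntityResolver() { return entityResolver; }
    public void setEntityResolver(EntityResolver resolver) { entityResolver = resolver; }

    // Lenient simplification: values are stored, never interpreted.
    public boolean getFeature(String name) { return Boolean.TRUE.equals(features.get(name)); }
    public void setFeature(String name, boolean value) { features.put(name, value); }
    public Object getProperty(String name) { return properties.get(name); }
    public void setProperty(String name, Object value) { properties.put(name, value); }

    public void parse(String systemId) throws IOException, SAXException {
        parse(new InputSource(systemId));
    }
}

// Example subclass that emits only the capital of each record.
final class CapitalsReader extends AbstractXmlReader {
    private static final Attributes NO_ATTRS = new AttributesImpl();

    public void parse(InputSource input) throws IOException, SAXException {
        Reader in = input.getCharacterStream();
        if (in == null) throw new IOException("input source has no character stream");
        BufferedReader reader = new BufferedReader(in);

        contentHandler.startDocument();
        contentHandler.startElement("", "capitals", "capitals", NO_ATTRS);
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split(";");
            if (fields.length < 4) throw new SAXException("illegal record: " + line);
            String capital = fields[2];
            contentHandler.startElement("", "capital", "capital", NO_ATTRS);
            contentHandler.characters(capital.toCharArray(), 0, capital.length());
            contentHandler.endElement("", "capital", "capital");
        }
        contentHandler.endElement("", "capitals", "capitals");
        contentHandler.endDocument();
    }
}

final class AbstractXmlReaderDemo {
    // Serializes the generated SAX events to a string via an identity transformation.
    static String toXml(String data) {
        SAXSource source = new SAXSource(new CapitalsReader(), new InputSource(new StringReader(data)));
        StringWriter out = new StringWriter();
        try {
            TransformerFactory.newInstance().newTransformer().transform(source, new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml("GER;Germany;Berlin;+0049\nFRA;France;Paris;+0033\n"));
    }
}
```

With such a base class in place, SampleXmlReader would shrink to its parse(InputSource) and writeElement methods.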

With this XMLReader implementation we are now able to use all the great libraries from the javax.xml.* packages.

For example:

XML Transformation:

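// Needs java.io.StringReader, org.xml.sax.InputSource, javax.xml.transform.*,
// javax.xml.transform.sax.SAXSource and javax.xml.transform.stream.StreamResult.
final String s = "GER;Germany;Berlin;+0049\n" +
        "ESP;Spain;Madrid;+0034\n" +
        "FRA;France;Paris;+0033\n";

InputSource in = new InputSource(new StringReader(s));
SAXSource source = new SAXSource(new SampleXmlReader(), in);

// newTransformer() without a stylesheet performs an identity transformation,
// so this simply serializes the generated SAX events as XML.
TransformerFactory factory = TransformerFactory.newInstance();
Transformer t = factory.newTransformer();
t.transform(source, new StreamResult(System.out));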

JAXB Mappings:

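// Needs javax.xml.bind.JAXBContext and javax.xml.bind.Unmarshaller.
// A Reader can only be consumed once, so build a fresh source here instead of
// reusing the one that the Transformer above has already read to the end.
SAXSource jaxbSource = new SAXSource(new SampleXmlReader(), new InputSource(new StringReader(s)));

JAXBContext jbx = JAXBContext.newInstance(Countries.class);
Unmarshaller um = jbx.createUnmarshaller();

// Countries is a JAXB-annotated class (not shown) mapped to the <countries> element.
Countries countries = um.unmarshal(jaxbSource, Countries.class).getValue();
System.out.println(countries.getCountry());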



Marco Rico Gomez is a passionate software developer located in Germany who likes to share his thoughts and experiences about software development and technologies with others.

