A Crowd-sourced Cookbook on Writing Great Android® Apps

Parsing an XML document using an XmlPullParser

Published? false
FormatLanguage: WikiFormat

Problem:

You have data in XML, and you want to transform it into something useful in your application.

Solution:

Apart from processing XML using DOM or SAX, the Android framework also provides an implementation of the XmlPullParser interface defined in the XmlPull v1 API.

Discussion:


Introduction

The XmlPull v1 API is an easy-to-use XML pull parsing API designed for simplicity and good performance, both in constrained environments such as J2ME and on the server side in J2EE application servers. XML pull parsing allows incremental (sometimes called streaming) parsing of XML where the application is in control: parsing can be interrupted at any moment and resumed when the application is ready to consume more input.
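
To make the "application is in control" idea concrete, here is a minimal sketch that can be run on a desktop JVM. It deliberately uses StAX (javax.xml.stream), the standard JDK's pull parser, as a stand-in, because XmlPullParser ships with Android rather than the JDK and StAX is not part of the Android framework. The class and method names (PullStyleDemo, open, nextTitle) are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Desktop-JVM stand-in (StAX), not the Android XmlPullParser API.
final class PullStyleDemo {

    // Opens a pull-style reader over an in-memory document.
    static XMLStreamReader open(String xml) {
        try {
            return XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Pulls events only until the next <title> has been read, then returns.
    // The reader keeps its position, so a later call resumes from there.
    // Returns null once the document is exhausted.
    static String nextTitle(XMLStreamReader reader) {
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    return reader.getElementText();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The caller decides when to ask for more: it can call nextTitle() once, do unrelated work, and call it again later to continue where the parser stopped. With a push parser such as SAX, the parser would instead drive the callbacks until the document ends.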

Parsing XML with the XmlPullParser

The code below parses the XML document containing the list of recipes in the Android Cookbook, as discussed in the recipes Using a RESTful Web Service and Parsing an XML document using the DOM API. The input file has a single recipes root element containing a sequence of recipe elements, each with an id and a title element with textual content.

First we get an instance of XmlPullParserFactory by calling its static newInstance() method. Basically this looks for available implementations of XmlPullParserFactory and XmlPullParser. If it cannot find any, this method throws an XmlPullParserException. We get an instance of an XmlPullParser by calling the newPullParser() factory method. We then open a stream on the recipe list URL and pass it to the setInput(InputStream inputStream, String inputEncoding) method. Passing null as the encoding asks the parser to determine the encoding from the document itself. The call to setInput resets the parser state and sets the event type to the initial value START_DOCUMENT. Also note that we don't need to first retrieve the URL's content with the converse method, as was done in the Using a RESTful Web Service and Parsing an XML document using the DOM API recipes.

Parsing XML input with an XmlPullParser means we are processing parser events. Simple events can be of the following types: START_DOCUMENT, END_DOCUMENT, START_TAG, END_TAG and TEXT. (You might notice that these closely mimic the SAX callback event handler methods.) Once we have passed our input stream to the setInput() method we are ready to process these events.

The first event is of type START_DOCUMENT. We process the input until we encounter the END_DOCUMENT event. We advance to the next event by calling the next() method. (Note: You can process additional event types, such as comments and processing instructions, by calling the nextToken() method instead, but that is out of scope here.)

The code simply keeps advancing to the next event until it encounters a START_TAG. In that case we retrieve the element's local name by calling the getName() method. (When namespace processing is disabled, the raw name is returned.) We store the tag name in a local variable, currentTag, as a bread crumb. (Note: When a start element contains attributes you can extract them via the getAttributeValue(String namespace, String name) method, which is again out of scope here.) Now we simply fall through the loop and advance to the next event.

When we encounter a TEXT event we check whether currentTag is "id" or "title". If so, we retrieve the text content by calling the getText() method and assign it to the appropriate local variable. We keep doing this until we encounter the END_TAG event of a recipe element. At that point we create a new Datum object from the id and title values collected so far.

    public static ArrayList<Datum> parse(String url) throws IOException, XmlPullParserException {
        final ArrayList<Datum> results = new ArrayList<Datum>(1000);

        XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XmlPullParser xpp = factory.newPullParser();

        // Open the stream ourselves so that it is closed when parsing ends.
        try (InputStream in = new URL(url).openStream()) {
            // A null encoding lets the parser detect it from the document.
            xpp.setInput(in, null);

            int eventType = xpp.getEventType();
            String currentTag = null; // bread crumb: the element we are inside
            Integer id = null;
            String title = null;
            while (eventType != XmlPullParser.END_DOCUMENT) {
                if (eventType == XmlPullParser.START_TAG) {
                    currentTag = xpp.getName();
                } else if (eventType == XmlPullParser.TEXT) {
                    if ("id".equals(currentTag)) {
                        id = Integer.valueOf(xpp.getText());
                    } else if ("title".equals(currentTag)) {
                        title = xpp.getText();
                    }
                } else if (eventType == XmlPullParser.END_TAG) {
                    if ("recipe".equals(xpp.getName())) {
                        results.add(new Datum(id, title));
                    }
                    // Clear the bread crumb: whitespace between elements is also
                    // reported as TEXT and must not overwrite id or title.
                    currentTag = null;
                }
                eventType = xpp.next();
            }
        }
        return results;
    }

Making it more strict

We can rewrite the parse method to make it a bit more strict. In this case we use the require() method to verify the expected XML structure; it throws an XmlPullParserException when the current event does not match. Once we are on the id or title START_TAG event we call nextText() to retrieve the element's text content and advance to the matching END_TAG event.

    public static ArrayList<Datum> parse(String url) throws IOException, XmlPullParserException {
        final ArrayList<Datum> results = new ArrayList<Datum>(1000);

        XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XmlPullParser xpp = factory.newPullParser();

        // Open the stream ourselves so that it is closed when parsing ends.
        try (InputStream in = new URL(url).openStream()) {
            xpp.setInput(in, null);

            // nextTag() skips whitespace and fails on anything but a start or end tag.
            xpp.nextTag();
            // A null namespace in require() matches any namespace.
            xpp.require(XmlPullParser.START_TAG, null, "recipes");
            while (xpp.nextTag() == XmlPullParser.START_TAG) {
                xpp.require(XmlPullParser.START_TAG, null, "recipe");

                xpp.nextTag();
                xpp.require(XmlPullParser.START_TAG, null, "id");
                Integer id = Integer.valueOf(xpp.nextText());
                xpp.require(XmlPullParser.END_TAG, null, "id");

                xpp.nextTag();
                xpp.require(XmlPullParser.START_TAG, null, "title");
                String title = xpp.nextText();
                xpp.require(XmlPullParser.END_TAG, null, "title");

                xpp.nextTag();
                xpp.require(XmlPullParser.END_TAG, null, "recipe");

                results.add(new Datum(id, title));
            }
            // The loop ends on the first END_TAG, which must close the root element.
            xpp.require(XmlPullParser.END_TAG, null, "recipes");
        }
        return results;
    }

Both methods return the same results. The recipe's downloadable source code uses the retrieved list of Datum objects to fill a ListActivity. When you click on a list item you are redirected to the corresponding recipe's web page.
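
The Datum class itself is not shown in this recipe. A minimal sketch, assuming it only needs to carry the id and title passed to its constructor, could look like the following; the getter names and the toString() behavior are assumptions, and the class in the downloadable source may differ.

```java
// Hypothetical value class matching the new Datum(id, title) calls above.
final class Datum {
    private final Integer id;
    private final String title;

    Datum(Integer id, String title) {
        this.id = id;
        this.title = title;
    }

    Integer getId() {
        return id;
    }

    String getTitle() {
        return title;
    }

    // An ArrayAdapter displays toString() by default, so returning the
    // title makes the object directly usable as a list row.
    @Override
    public String toString() {
        return title;
    }
}
```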

Processing static XML resources

You can easily process static XML resources with an XmlPullParser. Call getXml() with the resource ID on the Resources object returned by your context's getResources() method and you will receive an instance of XmlResourceParser. This is basically an XmlPullParser with an extra close() method to release the input resource, so you can use the techniques described above to process your static XML resources as well!
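
As a rough sketch of this, the helper below collects the title texts from an XML resource. The class name StaticRecipeTitles is made up for this illustration, and it assumes a hypothetical file res/xml/recipes.xml with the same structure as the recipe list used earlier. It needs the Android framework classes, so it only runs inside an Android project.

```java
import android.content.Context;
import android.content.res.XmlResourceParser;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;

// Hypothetical helper; not part of the recipe's downloadable source.
final class StaticRecipeTitles {

    // xmlResId is the ID of a file under res/xml, e.g. R.xml.recipes.
    static List<String> load(Context context, int xmlResId)
            throws IOException, XmlPullParserException {
        List<String> titles = new ArrayList<String>();
        XmlResourceParser xrp = context.getResources().getXml(xmlResId);
        try {
            int eventType = xrp.getEventType();
            while (eventType != XmlPullParser.END_DOCUMENT) {
                if (eventType == XmlPullParser.START_TAG
                        && "title".equals(xrp.getName())) {
                    // nextText() leaves the parser on the title END_TAG.
                    titles.add(xrp.nextText());
                }
                eventType = xrp.next();
            }
        } finally {
            // The extra method XmlResourceParser adds to XmlPullParser.
            xrp.close();
        }
        return titles;
    }
}
```

From an Activity this could be called as StaticRecipeTitles.load(this, R.xml.recipes), assuming that resource exists.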

Conclusion

The XmlPullParser is the parser of choice for many developers, mainly because of its simplicity. If you want speed you should pick SAX. DOM is about twice as slow as SAX. Parsing XML with the XmlPullParser falls somewhere between SAX and DOM.

Note

Don't forget to add the android.permission.INTERNET permission to your AndroidManifest.xml, or your application will not be able to open any network connections.
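
For reference, the permission is declared as a direct child of the manifest element. The fragment below is only a sketch; the rest of your manifest stays as it is.

```xml
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    <!-- Needed so that URL.openStream() can reach the network -->
    <uses-permission android:name="android.permission.INTERNET" />
    <!-- application element and other entries unchanged -->
</manifest>
```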

See Also:

Using a RESTful Web Service
Parsing an XML document using the DOM API
Using AsyncTask to do background processing
http://developer.android.com/reference/org/xmlpull/v1/XmlPullParser.html
http://developer.android.com/reference/org/xmlpull/v1/XmlPullParserFactory.html
http://developer.android.com/reference/android/content/res/XmlResourceParser.html

Download:

The source code for this project can be downloaded from https://github.com/downloads/jpelgrim/androidcookbook/RecipeList.zip.