Thoughts on XSLT: January 2008

Thursday, January 31, 2008

Kernow 1.6 beta

I've uploaded a new version of Kernow (1.6) which contains the rather nice "XSLT Sandbox" tab. This tab has the XML pane on the left, the XSLT pane on the right and a transform button... and that's it. It's intended for anyone who wants to quickly try something out without the hassle of files, the command line or starting up a proper IDE. It does error checking as you type and highlights any problems.

It's available as the usual download from Sourceforge, or through Java Web Start. If you already run the JWS version it should automatically update itself (any problems just re-install it). I've finally figured out the temperamental errors with the JWS version - it turns out the ant jars included with Kernow were already signed by a previous version and so weren't being signed again, but because they were marked as "lazy" in the jnlp the JWS version would start anyway. (You can tell if a jar has been signed by looking for *.SF and *.DSA in the META-INF directory.)

The other improvement I'm pleased to have sorted out is that kernow.config (where all of the settings and combobox history are saved) is now stored in a directory called .kernow in your user.home (which is one up from My Documents in XP). Previously it would've been stored on the deskop for the JWS version which is really annoying - sorry about that. As usual it was a 10 minute job, but just took a while to get around to.

I've also separated out the SOAP and eXists extension functions into a separate package, so there's no longer the need for the largish eXist.jar, xmldb.jar and log4j.jar jars to be part of the download.

I'll release a non-beta version in a few weeks if no bugs are reported, and I've got around to updating all of the documentation.

Parsing XML into Java 5 Enums

Often when parsing XML into pojos I just resort to writing my own SAX based parser. It can be long winded but I think gives you the greatest flexibility and control over how you get from the XML to objects you can process.

One example is with Java 5's Enums, which are great. Given the kind of XML fragment where currencies are represented using internal codes:

<currency refid="001"/> <!-- 001 is Sterling -->
<currency refid="002"/> <!-- 002 is Euros -->
<currency refid="003"/> <!-- 003 is United States Dollars -->

You can represent each currency element with an Enum, which contains extra fields for the additional information:

public enum Currency {

    GBP ("001", "GBP", "Sterling"),
    USD ("002", "EUR", "Euros"),
    USD ("003", "USD", "United States Dollar");    

    private final String refId;
    private final String code;
    private final String desc;

    Currency(String refId, String code, String desc) {
        this.refId = refId;
        this.code = code;
        this.desc = desc;
    }

    public String refId() {
        return refId;
    }

    public String code() {
        return code;
    }

    public String desc() {
        return desc;
    }

    // Returns the enum based on it's property rather than its name
    // (This loop could possibly be replaced with a static map, but be aware
    //  that static member variables are initialized *after* the enum and therefore
    //  aren't available to the constructor, so you'd need a static block.  
    public static Currency getTypeByRefId(String refId) {
        for (Currency type : Currency.values()) {
            if (type.refId().equals(refId)) {
                return type;
            }
        }

        throw new IllegalArgumentException("Don't have enum for: " + refId);
    }    
}

Notice how each enum calls its own contructor with the 3 parameters - the refId, the code, and the description.

You parse the XML into the enum by calling Currency.getTypeByRefId(String refId) passing in the @refid from the XML. The benefit of using the Enum is that you can then do things like:

if (currency.equals(Currency.GBP))

which is nice and clear, while at the same time being able to call currency.refId() and currency.desc() to get to the other values.

The drawback is that because static member variables are initialized after the enum, you can't create a HashMap and fill it for a faster lookup later (unless you use a static block). Instead you have to loop through all known values() for the enum given a refId. Although it feels wrong to loop, the worst case is only the size the of enum so I don't think it's too bad.

Tuesday, January 22, 2008

Portability of a stylesheet across schema-aware and non-schema-aware processors

I came across this today, which I thought was really cool and worth a post. It basically allows you to code a transform that is only schema-aware if a schema-aware processor is running it, otherwise it's just a standard transform.

In this case I want to do input and output validation, so first I sort out the schemas:

<xsl:import-schema schema-location="input.xsd"
    namespace="http://www.foo.com"
    use-when="system-property('xsl:is-schema-aware')='yes'"/>

<xsl:import-schema schema-location="output.xsd"
    use-when="system-property('xsl:is-schema-aware')='yes'"/>

Note the use-when...

Next define two root matching templates, one for schema-aware, one for basic:

<xsl:template match="/"
    use-when="system-property('xsl:is-schema-aware')='yes'"
    priority="2">

    <xsl:variable name="input" as="document-node()">
        <xsl:document validation="strict">
            <xsl:copy-of select="/"/>
        </xsl:document>
    </xsl:variable>

    <xsl:result-document validation="strict">
        <xsl:apply-templates select="$input/the-root-elem"/>
    </xsl:result-document>

</xsl:template>

<xsl:template match="/">
    <xsl:apply-templates select="the-root-elem"/>
</xsl:template>

<xsl:template match="the-root-elem">
    ...
</xsl:template>

The root matching template for schema-aware processing uses xsl:document to validate the input, and xsl:result-document to validate the output. Validation can also be controlled from outside the transform, but this way forces it on.

I think this is great :)

The identity transform for XSLT 2.0

I was looking at the standard identity transform the other day and realised that for nodes other than elements, the call to apply-templates is redundant.

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

Also, although it might be intuitive to think that attributes have separate nodes for their name and value, they are in fact a single node that's copied in it's entirety by xsl:copy.

I raised this on xsl-list and suggested seperating out the attribute into a template of its own with just xsl:copy for its body:

<xsl:template match="node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="@*">
  <xsl:copy/>
</xsl:template>

Mike Kay suggested a more logical version would be:

<xsl:template match="element()">
  <xsl:copy>
    <xsl:apply-templates select="@*,node()"/>
   </xsl:copy>
</xsl:template>

<xsl:template match="attribute()|text()|comment()|processing-instruction()">
  <xsl:copy/>
</xsl:template>

This turned out to be ideal for three reasons:

- the comma between @* and node() will mean the selected nodes will be processed in that order, removing the sorting and deduplication that takes place with union |
- apply-templates is only called when it will have an effect
- it's clearer that attributes are leaf nodes

So there it is... the identity transform for XSLT 2.0