Friday, August 22, 2008

Some sample templates for use with LexEv

If your XML has been parsed using LexEv, here are some sample templates for handling the LexEv markup.

To output an entity reference:



<xsl:template match="lexev:entity">
<xsl:value-of disable-output-escaping="yes" select="concat('&amp;', @name, ';')"/>
</xsl:template>
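
To try the entity template outside a full pipeline, here is a minimal Java sketch that runs it with the JDK's built-in XSLT 1.0 processor. The input is hand-written to look like LexEv output, so LexEv itself isn't needed. The namespace URI urn:example:lexev, the class name and the added identity template are assumptions for illustration only; bind the lexev prefix to whatever namespace your LexEv output actually uses.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class EntityTemplateDemo {

    // Hypothetical namespace URI, used only so the example is well-formed.
    static final String NS = "urn:example:lexev";

    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:lexev='" + NS + "'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Identity template: copy anything that has no more specific rule.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // The entity template from this post.
        + "<xsl:template match='lexev:entity'>"
        + "<xsl:value-of disable-output-escaping='yes'"
        + " select=\"concat('&amp;', @name, ';')\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML and returns the serialized result. */
    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hand-written stand-in for what LexEv would produce for hello&mdash;world.
        String input = "<p xmlns:lexev='" + NS + "'>hello"
            + "<lexev:entity name='mdash'>\u2014</lexev:entity>world</p>";
        System.out.println(transform(input));
    }
}
```

The result should contain hello&mdash;world rather than the escaped form. Bear in mind that disable-output-escaping only takes effect when the XSLT processor serializes the output itself; it is ignored if the result goes to a DOM or on to another SAX stage.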



To process a CDATA section as markup:


<xsl:template match="lexev:cdata">
<xsl:apply-templates/>
</xsl:template>


To output a DOCTYPE from the processing instructions:

In XSLT 1.0 the doctype-public and doctype-system attributes on xsl:output are static and must be known at compile time, so I'm afraid you have to write the DOCTYPE out by hand:


<xsl:template match="/">
<xsl:value-of disable-output-escaping="yes"
select="concat('&lt;!DOCTYPE ', name(/*), '&#xa; PUBLIC &quot;',
processing-instruction('doctype-public'), '&quot; &quot;',
processing-instruction('doctype-system'), '&quot;&gt;')"/>
<xsl:apply-templates/>
</xsl:template>


In XSLT 2.0 you can use xsl:result-document, where doctype-public and doctype-system are AVTs (attribute value templates), which means their values can be determined at runtime:


<xsl:template match="/">
<xsl:result-document
doctype-public="{processing-instruction('doctype-public')}"
doctype-system="{processing-instruction('doctype-system')}">
<xsl:apply-templates/>
</xsl:result-document>
</xsl:template>

Thursday, August 21, 2008

LexEv XMLReader - converts lexical events into markup

It's often a requirement to preserve entity references (which are usually lost during parsing) through to the output, or to process the contents of CDATA sections as markup. The Lexical Event XMLReader wraps the standard XMLReader and converts lexical events into markup so that they can be processed. Typical uses are:

  • Converting CDATA sections into markup:


    <![CDATA[ <p> a para </p> ]]>

    to:

    <lexev:cdata> <p> a para </p> </lexev:cdata>



  • Preserving entity references:


    hello&mdash;world

    is converted to:

    hello<lexev:entity name="mdash">—</lexev:entity>world


  • Preserving the doctype declaration:


    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    is converted to processing instructions:

    <?doctype-public -//W3C//DTD XHTML 1.0 Transitional//EN?>
    <?doctype-system http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd?>


  • Marking up comments:


    <!-- a comment -->

    is converted to:

    <lexev:comment> a comment </lexev:comment>
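
For background, "lexical events" are the callbacks a SAX parser sends to an optional LexicalHandler: comments, CDATA section boundaries, entity boundaries and the DOCTYPE. The following is a minimal sketch of listening for two of them with the JDK's standard parser. It is not LexEv's implementation, and the class and method names are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class LexicalEventsDemo {

    /** Parses the XML and returns the lexical events in the order they were reported. */
    static List<String> collect(String xml) throws Exception {
        List<String> events = new ArrayList<>();

        // DefaultHandler2 implements both ContentHandler and LexicalHandler.
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void startCDATA() { events.add("startCDATA"); }
            @Override public void endCDATA() { events.add("endCDATA"); }
            @Override public void comment(char[] ch, int start, int length) {
                events.add("comment:" + new String(ch, start, length));
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Lexical events are only delivered if a handler is registered via this property.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><!-- a comment --><![CDATA[ <p> a para </p> ]]></doc>";
        System.out.println(collect(xml));
    }
}
```

A plain ContentHandler never sees these events, which is why a stylesheet normally cannot tell that a comment, CDATA section or entity reference was there. LexEv turns them into ordinary elements and processing instructions that it can match.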


To use LexEvXMLReader with Saxon:


java -cp saxon9.jar;LexEvXMLReader.jar net.sf.saxon.Transform -x:com.andrewjwelch.lexev.LexEvXMLReader input.xml stylesheet.xslt


Make sure LexEvXMLReader.jar is on the classpath, then tell Saxon to use it with the -x switch (copy and paste this: -x:com.andrewjwelch.lexev.LexEvXMLReader). The classpath above uses the Windows separator (;); on Linux or macOS use a colon (:) instead.


To use LexEvXMLReader from Java:

XMLReader xmlReader = new LexEvXMLReader();
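
Once you have the reader, one way to use it is to hand it to a JAXP transformation through a SAXSource. The sketch below shows that wiring. So that it runs without LexEvXMLReader.jar, it uses the JDK's default XMLReader as a stand-in; the class, the helper names and the tiny stylesheet are all made up for the example. With the jar on the classpath you would pass new LexEvXMLReader() instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class ReaderTransformDemo {

    // Trivial stylesheet that prints the name of the root element.
    static final String NAME_XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='name(/*)'/></xsl:template>"
        + "</xsl:stylesheet>";

    /** Stand-in for new LexEvXMLReader(): the JDK's default namespace-aware reader. */
    static XMLReader standInReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    /** Parses xml with the given reader and transforms it with the given stylesheet. */
    static String transform(XMLReader reader, String xml, String xslt) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        // SAXSource tells the transformer which XMLReader to parse the input with.
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));
        StringWriter out = new StringWriter();
        t.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(standInReader(), "<doc>hello</doc>", NAME_XSLT));
    }
}
```

Because the transformer only ever sees the SAX events the reader emits, swapping the reader is all it takes for the stylesheet to receive the LexEv markup.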


You can control the following features of LexEv:


  • enable/disable the marking up of entity references

  • enable/disable the marking up of CDATA sections

  • set the default namespace for the CDATA section markup

  • enable/disable the reporting of the DOCTYPE

  • enable/disable the marking up of comments


You can set these through the API (if you are including LexEv in an application), or from the command line using the following system properties:


  • com.andrewjwelch.lexev.inline-entities

  • com.andrewjwelch.lexev.cdata

  • com.andrewjwelch.doctype.cdataNamespace

  • com.andrewjwelch.lexev.doctype

  • com.andrewjwelch.lexev.comments


For example, to set a system property from the command line you would use: -Dcom.andrewjwelch.lexev.comments=false
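
A -D switch simply sets a JVM system property at startup, so an embedding application can get the same effect with System.setProperty. The sketch below shows one way to scope such a setting; the helper and class names are hypothetical. It assumes LexEv reads the property when the reader is created, which this post doesn't confirm, so check that against your version.

```java
class LexEvPropertyDemo {

    // Property name taken from the list above.
    static final String COMMENTS = "com.andrewjwelch.lexev.comments";

    /** Sets a system property, runs the task, then restores the previous value. */
    static void withProperty(String name, String value, Runnable task) {
        String previous = System.getProperty(name);
        System.setProperty(name, value);
        try {
            task.run();
        } finally {
            if (previous == null) {
                System.clearProperty(name);
            } else {
                System.setProperty(name, previous);
            }
        }
    }

    public static void main(String[] args) {
        withProperty(COMMENTS, "false", () -> {
            // Assumption: this is where new LexEvXMLReader() would be created,
            // so that it picks up the property while it is set.
            System.out.println(COMMENTS + "=" + System.getProperty(COMMENTS));
        });
    }
}
```

Restoring the old value afterwards keeps the setting from leaking into other code that shares the JVM.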


For support, suggestions and licensing, email lexev@andrewjwelch.com