Thursday, November 09, 2006

Using collection() and saxon:discard-document() to create reports

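You can process whole directories of XML files with the collection() function, and keep memory usage roughly constant by passing each document through the Saxon extension function saxon:discard-document(), which lets Saxon release a document once you have finished with it: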


<xsl:for-each select="for $x in collection('file:///c:/xmlDir?select=*.xml;recurse=yes;on-error=ignore') return saxon:discard-document($x)">


Be careful that Saxon doesn't optimize away the call to saxon:discard-document(). The basic outer xsl:for-each above works well, and it has become boilerplate code whenever I start a new report.
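To show where that boilerplate sits, here is a minimal sketch of a complete stylesheet built around it. Note that the saxon prefix must be bound to the namespace http://saxon.sf.net/. It assumes the stylesheet is started from a named template (here called main, a hypothetical name), for example with Saxon's -it option, since there is no single source document. The per-file summary (URI, root element name, element count) and the report/file element names are illustrative only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical entry point: invoke as the initial named template -->
  <xsl:template name="main">
    <report>
      <!-- Boilerplate outer loop: each document is passed through
           saxon:discard-document() so it can be released after use -->
      <xsl:for-each select="for $x in collection('file:///c:/xmlDir?select=*.xml;recurse=yes;on-error=ignore') return saxon:discard-document($x)">
        <!-- Illustrative per-file summary: source URI, root element, element count -->
        <file uri="{base-uri(.)}" root="{name(*)}" elements="{count(//*)}"/>
      </xsl:for-each>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Everything specific to a given report goes inside the xsl:for-each; the outer loop itself stays the same from report to report.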

This technique lets you do things with XSLT that would otherwise not be feasible, and that would take longer to write in another language: for example, finding, grouping and sorting all the links across your collection of XML files. Coding the XSLT takes minutes and the run time is proportional to the size of your dataset, but system memory is no longer the limiting factor.
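A sketch of such a link report follows, under a few assumptions: a "link" is taken to be any href attribute (adjust the path to your own vocabulary), the stylesheet is started from a named template called main, and the report, link and file element names are hypothetical. It works in two passes. The first pass copies only the href value and the source file URI into a small temporary tree, so nothing keeps a reference to the source documents; grouping directly over nodes from the collection would likely keep every document alive and undo the benefit of saxon:discard-document(). The second pass groups and sorts that small tree with xsl:for-each-group.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <xsl:output method="xml" indent="yes"/>

  <!-- Collection URI from the post; override with a stylesheet parameter if needed -->
  <xsl:param name="dir"
      select="'file:///c:/xmlDir?select=*.xml;recurse=yes;on-error=ignore'"/>

  <!-- Hypothetical entry point: invoke as the initial named template -->
  <xsl:template name="main">

    <!-- Pass 1: extract lightweight copies (href + source file) per document,
         so each source document can be discarded once it has been read -->
    <xsl:variable name="links">
      <xsl:for-each select="for $x in collection($dir) return saxon:discard-document($x)">
        <xsl:variable name="file" select="base-uri(.)"/>
        <!-- Assumption: every href attribute counts as a link -->
        <xsl:for-each select=".//@href">
          <link href="{.}" file="{$file}"/>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:variable>

    <!-- Pass 2: group the small temporary tree by href, sorted alphabetically -->
    <report>
      <xsl:for-each-group select="$links/link" group-by="@href">
        <xsl:sort select="current-grouping-key()"/>
        <link href="{current-grouping-key()}" count="{count(current-group())}">
          <!-- List each file containing this link once, in sorted order -->
          <xsl:for-each select="distinct-values(current-group()/@file)">
            <xsl:sort select="."/>
            <file><xsl:value-of select="."/></file>
          </xsl:for-each>
        </link>
      </xsl:for-each-group>
    </report>

  </xsl:template>

</xsl:stylesheet>
```

The temporary tree still grows with the number of links, but it holds only two short strings per link rather than whole documents, which is what keeps the run within memory on large collections.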