Please add HTML support for http_xslt_module (there's an nginx fork which has it already)
Peter Halasz
list at pengo.org
Fri Mar 9 06:20:10 UTC 2012
Hi devs,
I work for an environmental not-for-profit organisation where we use
XSLT to theme our website. (The XSLT is generated by Diazo, and the
site largely runs on Plone).
Currently we use Nginx to do the XSLT transformation. There's a
problem, though: our un-themed site doesn't come out as well-formed
XML, so we need an XSLT processor that can parse HTML (not just
XML). Nginx's http_xslt_module does NOT currently support HTML
parsing, and I'd really like to see this feature added.
The problem isn't specific to Diazo, but the Diazo manual explains the
need for HTML parsing:
> In theory, any XSLT processor will do. In practice, however, most websites do not produce 100% well-formed XML (i.e. they do not conform to the XHTML “strict” doctype). For this reason, it is normally necessary to use an XSLT processor that will parse the content using a more lenient parser with some knowledge of HTML. libxml2, the most popular XML processing library on Linux and similar operating systems, contains such a parser.
Fortunately there's a fork of nginx whose XSLT module can use
libxml2's HTML parser: the html-xslt project <http://code.google.com/p/html-xslt/>.
Unfortunately, the project is not maintained, so it ties us to a
patched version of nginx 0.7.67 (circa June 2010). I'd like to upgrade
nginx -- I've hit nginx bugs that were fixed long ago. I'm sure there
are many other nginx users with the same needs, so I'm requesting that
the fork's changes be merged into mainline. I'm assuming they've
just been forgotten.
The Diazo documentation also explains deploying with this patched Nginx:
> To deploy a Diazo theme to the Nginx web server, you will need to compile Nginx with a special version of the XSLT module that can (optionally) use the HTML parser from libxml2.
> In the future, the necessary patches to enable HTML mode parsing will hopefully be part of the standard Nginx distribution. In the meantime, they are maintained in the html-xslt project.
We're using this html-xslt fork of nginx at my organisation, but it's
unmaintained and its functionality still hasn't made it into the
standard Nginx distribution. Can we please include it?
The fork adds the directive "xslt_html_parser on;", which makes
http_xslt_module parse responses in HTML mode.
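For illustration, a location block using the fork's directive might look
roughly like the sketch below. It assumes an nginx built from the html-xslt
fork; the stylesheet path and the backend address are hypothetical
placeholders. Only xslt_html_parser comes from the fork; the other
directives are stock nginx.

```nginx
# Sketch only: assumes nginx built from the html-xslt fork.
# /etc/nginx/theme.xsl and 127.0.0.1:8080 are hypothetical placeholders.
location / {
    proxy_pass http://127.0.0.1:8080;

    # The XSLT filter parses the raw response body, so ask the
    # backend not to compress it.
    proxy_set_header Accept-Encoding "";

    # Stock module: load this stylesheet and also transform text/html
    # responses (text/xml is always included).
    xslt_stylesheet /etc/nginx/theme.xsl;
    xslt_types text/html;

    # Fork-only directive: parse the response with libxml2's HTML
    # parser instead of the strict XML parser.
    xslt_html_parser on;
}
```

Without the last directive, the stock module would fail to parse any page
that isn't well-formed XML, which is the problem described above.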
I've just made a diff <http://pastebin.com/CP1P8Gzj> to see what the
fork changes, and it's 755 lines long. (That's a bit longer than I
expected.)
The files modified by the html-xslt fork are:
  src/http/modules/ngx_http_xslt_filter_module.c
  src/http/ngx_http_variables.c
  auto/options
  auto/lib/libxslt/conf
The diff is against nginx 0.7.67. Since then,
ngx_http_xslt_filter_module.c has had about 300 lines removed and 20
lines added or changed, so the diff won't apply cleanly as a patch
against the current version of nginx.
Hopefully that's more than enough info to get started if developers
are interested in folding the fork into nginx.
I know another solution to our problem is to move the XSLT
transformation to another layer of the stack (such as Varnish or
Apache), but I want to make sure nginx devs know about the feature
they're missing first.
Thanks for listening, and I hope HTML parsing for XSLT can make it into
the nginx mainline,
Peter Halasz.
More information about the nginx-devel mailing list