mod_strip newline substitution problem

Raul Rivero rivero at soitu.es
Sun May 4 00:29:51 MSD 2008


Hi all,

We (soitu.es) have a module that cleans HTML and inserts blocks of HTML.

This is our first released module (called 
ngx_http_html_clean_filter_module.c). You can read the description 
(in Spanish) at:

     http://www.soitu.es/soitu/2008/04/28/met/1209378259_661127.html

A usage example:

  location / {
     html_clean_active on;

     # Pixel counter and common JS to all pages.
     html_clean_pxlcntrblock '<div style="display:none">\
<img src="/pxlctl.gif" width="1" height="1" alt="" />\
<script language="Javascript" type="text/javascript" src="/comun.js">\
</script></div>';

     # Only common JS, no pixel counter
     html_clean_xcldpxlcntrblock '<div style="display:none">\
<script language="Javascript" type="text/javascript" src="/comun.js">\
</script></div>';

   }


A few lines about what it does:

   * Removes all whitespace at the beginning of lines, and all empty lines.

   * Removes all HTML comments with a pattern like:

       <!-- =what ever[...]= -->

     So the following comment is deleted:

       <!-- ===== Begin of menu ===== -->

     But this one is not:

       <!-- Begin of menu -->

   * Blocks surrounded by #IB and #EB are served without modifications:

       <!-- = #IB[...] = -->
       this will be
           served without
               being modified
       <!-- = #EB[...] = -->

   * The text defined in "html_clean_pxlcntrblock" is inserted after the
     <body> tag (we use it for our pixel counter and common JS includes).
     Some of our pages must not be counted (for example iframes or
     frames); for those, put a <!-- = #XP = --> comment before <body> and
     the text defined in "html_clean_xcldpxlcntrblock" is inserted instead.
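
The rules above can be sketched in a few lines of Python. This is only an
illustration of the behaviour described here, not the module's C code. The
regular expressions, the function name clean_html, the line-by-line
processing, and the choice to drop the #IB/#EB marker lines themselves are
all assumptions made for the sketch.

```python
import re

# Assumed reading of "<!-- =what ever[...]= -->": the comment body starts
# and ends with "=", and the match never runs past the closing "-->".
MARKER = re.compile(r"<!--\s*=(?:(?!-->).)*?=\s*-->")
BLOCK_START = re.compile(r"<!--\s*=\s*#IB(?:(?!-->).)*?=\s*-->")
BLOCK_END = re.compile(r"<!--\s*=\s*#EB(?:(?!-->).)*?=\s*-->")
EXCLUDE = re.compile(r"<!--\s*=\s*#XP\s*=\s*-->")
BODY_TAG = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def clean_html(html, counter_block, excluded_block):
    """Apply the four rules to a whole page held in one string."""
    out = []
    verbatim = False   # inside an #IB ... #EB block
    excluded = False   # a #XP marker was seen before <body>
    inserted = False   # the block has already been placed after <body>
    for line in html.splitlines():
        if verbatim:
            if BLOCK_END.search(line):
                verbatim = False      # marker line is dropped (assumption)
            else:
                out.append(line)      # served without modification
            continue
        if BLOCK_START.search(line):
            verbatim = True
            continue
        if EXCLUDE.search(line):
            excluded = True
        # Delete "=...=" comments, then leading whitespace; skip empty lines.
        line = MARKER.sub("", line).lstrip()
        if not line:
            continue
        if not inserted:
            m = BODY_TAG.search(line)
            if m:
                block = excluded_block if excluded else counter_block
                line = line[:m.end()] + block + line[m.end():]
                inserted = True
        out.append(line)
    return "\n".join(out)
```

For example, clean_html(page, pixel_block, js_only_block) would play the
role of the two directives in the location block above, with pixel_block
and js_only_block holding the strings given to html_clean_pxlcntrblock and
html_clean_xcldpxlcntrblock. A real nginx filter works on buffer chains
rather than on a complete string, so this sketch ignores markers that are
split across buffers.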

We are using it on our production NGINX servers.

Cheers,


Evan Miller wrote:
> Rt Ibmer <rtibmx at ...> writes:
> 
>>
>> Hi Evan
>>
>>> The problem is that I can't replace newlines before end tags with a space:
>> Is that going to make it even more challenging to add support for removing 
>> unnecessary spaces from JavaScript files? Would love to see support for that 
>> one day. Currently we are doing this with a filter at the web server level and 
>> I bet doing it at the nginx level would be much more efficient.
> 
> Yes. The module will need to be rewritten to parse other languages. I'd like to 
> use Ragel, a state-machine generator that operates on data buffers.
> 
> http://www.cs.queensu.ca/~thurston/ragel/
> 
> Evan
> 
> 


-- 
Raul Rivero      | Micromedios Digitales                      |
Director Tecnico | Cochabamba, 11, bajo. Madrid, 28016. Spain |
soitu.es         | Email: rivero at soitu.es  Tel: +34 911420000 |




