Why u_char* not char*

Igor Sysoev is at rambler-co.ru
Wed Jul 15 11:06:12 MSD 2009


On Wed, Jul 15, 2009 at 05:41:14AM +0400, Maxim Dounin wrote:

> Hello!
> 
> On Tue, Jul 14, 2009 at 11:56:27PM +0300, Marcus Clyne wrote:
> 
> > Hi,
> >
> > Why are strings in Nginx stored as u_char* and not char* pointers?
> > What's the advantage?
> 
> I'm not sure why Igor chose it, but there are several reasons
> to use 'unsigned char' (aka u_char) instead of 'char'
> (which may be either signed or unsigned):
> 
> - Constructs like
> 
>     u_char   map[] = { 0, 0, 0, 1, 1, ... };
>     u_char  *p;
> 
>     ...
> 
>     if (map[*p]) { ... }
> 
>   work as expected for all possible character values without any 
>   extra typecasting.
> 
> - Comparison works in a predictable way.  And you will get (mostly)
>   reasonable sorting on any arbitrary data even without collation
>   support.
> 
> - Overflow behaviour is undefined for signed types, and bitwise
>   shifts of negative values are undefined (left shift) or
>   implementation-defined (right shift).
> 
> So basically if you deal with arbitrary byte streams in some
> arbitrary way, as nginx does, 'unsigned char' is the better choice.

Yes. I prefer to think of 'char *' as a stream of characters (which are
unsigned by nature), not as a stream of small-range signed integers.
This makes comparisons, bitwise operations, etc. behave in the evident way.
However, it requires typecasts in trivial cases, e.g.:

      u_char *p = (u_char *) "text";

But these cases are really trivial (as opposed to comparisons,
bitwise operations, etc.), are easily caught by compilers, and these
typecasts are just syntactic sugar.


-- 
Igor Sysoev
http://sysoev.ru/en/




