Why u_char* not char*

Marcus Clyne maccaday at gmail.com
Wed Jul 15 14:13:04 MSD 2009


Igor Sysoev wrote:
> On Wed, Jul 15, 2009 at 05:41:14AM +0400, Maxim Dounin wrote:
>
>   
>> Hello!
>>
>> On Tue, Jul 14, 2009 at 11:56:27PM +0300, Marcus Clyne wrote:
>>
>>     
>>> Hi,
>>>
>>> Why are strings in Nginx stored as u_char * pointers and not char *
>>> pointers?  What's the advantage?
>>>       
>> I'm not sure why Igor chose it, but there are at least several
>> reasons to use 'unsigned char' (aka u_char) instead of 'char' 
>> (which may be either signed or unsigned):
>>
>> - Constructs like
>>
>>     u_char   map[] = { 0, 0, 0, 1, 1, ... };
>>     u_char  *p;
>>
>>     ...
>>
>>     if (map[*p]) { ... }
>>
>>   work as expected for all possible character values without any 
>>   extra typecasting.
>>
>> - Comparison works in a predictable way.  And you will get (mostly)
>>   reasonable sorting on any arbitrary data even without collation
>>   support.
>>
>> - Overflow behaviour is undefined for signed types, and shifting
>>   negative values is undefined (left shift) or implementation-defined
>>   (right shift).
>>
>> So basically, if you deal with arbitrary byte streams in some
>> arbitrary way, as nginx does, 'unsigned char' is the better choice.
>>     
>
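
To make the table lookup and comparison points above concrete, here is a minimal sketch (not nginx code). The table high_bit_map and every function name are made up for illustration, and u_char is typedef'd locally only so the example compiles on its own. It assumes a platform where converting a byte >= 0x80 to signed char yields a negative value, which is the usual case.

```c
#include <assert.h>
#include <stddef.h>

/* nginx picks up u_char from system headers; it is defined here only so
   the sketch compiles on its own. */
typedef unsigned char u_char;

/* Hypothetical 256-entry table: nonzero for bytes with the high bit set. */
static u_char high_bit_map[256];

static void init_high_bit_map(void)
{
    size_t  i;

    for (i = 128; i < 256; i++) {
        high_bit_map[i] = 1;
    }
}

/* Safe for every byte value: *p is always in the range 0..255. */
static int is_high_byte(const u_char *p)
{
    assert(p != NULL);
    return high_bit_map[*p];
}

/* The index a plain 'char *' would produce where char is signed.
   Bytes >= 0x80 typically become negative, so using this value as a
   table index would read before the start of the array. */
static int index_if_char_is_signed(const u_char *p)
{
    assert(p != NULL);
    return (signed char) *p;
}

/* Byte-wise ordering: 0x7a ('z' in ASCII) sorts before 0xe9. */
static int byte_less(u_char a, u_char b)
{
    return a < b;
}

/* The same comparison on signed values: 0xe9 is negative there,
   so it sorts before 0x7a. */
static int signed_less(signed char a, signed char b)
{
    return a < b;
}
```

After calling init_high_bit_map(), is_high_byte() can be applied to any byte of a buffer without a cast, while the value returned by index_if_char_is_signed() shows why the same lookup through a signed char would need one. The two comparison helpers show the ordering of 0x7a and 0xe9 flipping between the unsigned and signed views.
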
> Yes. I prefer to think of 'char *' as a character stream (characters are
> unsigned by nature), not as a stream of small-range signed integers.
> This makes comparisons, bitwise operations, etc. behave in an evident way.
> However, this approach requires typecasts in trivial cases, e.g.:
>
>       u_char *p = (u_char *) "text";
>
> But these cases are really trivial (as opposed to comparisons,
> bitwise operations, etc.), a missing cast is easily caught by the
> compiler, and these typecasts are just syntactic sugar.
>
>   
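
One way to keep that cast out of the way is to confine it to a single macro. The sketch below is only an illustration under that assumption: byte_str_t, BYTE_STR and byte_str_equals are hypothetical names, not taken from the nginx sources, and u_char is again typedef'd locally so the example stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char u_char;

/* Hypothetical length-plus-pointer string type for this sketch. */
typedef struct {
    size_t   len;
    u_char  *data;
} byte_str_t;

/* Hypothetical initializer macro: the (u_char *) cast lives here only.
   sizeof(literal) counts the trailing NUL, hence the "- 1". */
#define BYTE_STR(literal)  { sizeof(literal) - 1, (u_char *) literal }

/* Compare the bytes of s with the bytes of a NUL-terminated C string. */
static int byte_str_equals(const byte_str_t *s, const char *text)
{
    size_t  n;

    assert(s != NULL && text != NULL);

    n = strlen(text);
    return s->len == n && memcmp(s->data, text, n) == 0;
}
```

A declaration such as `byte_str_t s = BYTE_STR("text");` then gives s.len == 4 and s.data pointing at the literal's bytes, with no cast at the call site. The data must still be treated as read-only, since it points into a string literal.
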

Thanks for the info (both of you).

Marcus.




