u_char vs char (was: [PATCH] Removed the unsafe ngx_memcmp() wrapper for memcmp(3))
Alejandro Colomar
alx.manpages at gmail.com
Tue Nov 8 11:15:51 UTC 2022
Hello!
On 11/8/22 10:50, Maxim Dounin wrote:
>> Even if it's a bit off-topic, I'm very curious about the reason for using
>> u_char. It definitely requires a lot of extra work compared to 'char *': casts,
>> type-safety, reviewing that code just works when working around/disabling the
>> compiler warnings. I'm guessing it was also some workaround for broken old
>> implementations and it has just continued like that for consistency, but am
>> curious if there are other better reasons. Certainly, ASCII characters behave
>> well (at least nowadays) independently of the signedness of char, and usually
>> one doesn't do arithmetic with characters in strings.
>
> Using signed chars for strings simply does not work as long as you
> consider 8-bit strings. It results in wrong sorting unless you do
> care to compare characters as unsigned, requires careful handling
> of all range comparisons such as "ch <= 0x20", does not permit
> things like "ch < 0x80" or "c >= 0xc0", makes it impossible to use
> table lookups such as "basis64[s[0]]" (all snippets are from nginx
> code).
>
> The fact that signedness of "char" is not known adds even more
> fun: you can't really do anything without casting it to either
> unsigned char or signed char.
>
> In general, using "char" for strings is a well known source of
> troubles at least in the Cyrillic world. Writing the code which
> works with arbitrary chars is tricky and error-prone as long as
> you are doing anything more complex than just calling libc
> functions. On the other hand, casts for external functions can be
> easily abstracted in most cases, and are always trivial.
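To make the points quoted above concrete, here is a minimal sketch of how the same byte behaves when read as signed versus unsigned. The helper names are hypothetical (they are not nginx code), and the signed results assume the usual 8-bit, two's complement char.

```c
/* Illustrative helpers only; the names are hypothetical, not nginx code. */

/* Read the first byte of a buffer as unsigned: always in 0..255. */
static int
first_as_unsigned(const void *p)
{
    return *(const unsigned char *) p;
}

/* Read the same byte as signed: bytes >= 0x80 come out negative
 * (assumes 8-bit, two's complement char, as on common platforms). */
static int
first_as_signed(const void *p)
{
    return *(const signed char *) p;
}

/* Three-way comparison, as a sorting routine would use it. */
static int
cmp(int a, int b)
{
    return (a > b) - (a < b);
}
```

For the byte 0xD0 (the first byte of many UTF-8 encoded Cyrillic letters), first_as_unsigned() yields 208 while first_as_signed() yields -48. So the signed value passes a "ch < 0x80" test, sorts before 'a', and would be a negative index into a lookup table such as basis64[].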
Hmm, yeah, it makes sense. The libc API built around char instead of u_char is
broken by design, and the requirement that the <ctype.h> functions be called
with a cast to unsigned char (e.g., toupper(3)) shows that.
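As a small sketch of that cast, assuming a hypothetical wrapper named upper_byte() (it is not part of libc or nginx):

```c
#include <ctype.h>

/* Hypothetical wrapper: upper-case a single byte taken from a char string. */
static unsigned char
upper_byte(char c)
{
    /*
     * toupper(3) takes an int whose value must be representable as
     * unsigned char or be equal to EOF.  Where char is signed, a byte
     * >= 0x80 is negative, and passing it directly is undefined behavior,
     * hence the cast.
     */
    return (unsigned char) toupper((unsigned char) c);
}
```

Code that already stores its strings as u_char only needs to hide the int conversion in one place like this, rather than repeat the cast at every call site.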
If nginx does things with chars other than calling libc, it makes a lot of sense
to also use u_char.
Thanks for the rationale! It certainly helps to understand why it was done that
way.
Cheers,
Alex
--
<http://www.alejandro-colomar.es/>