Message-ID: <87vei3jvut.fsf@javad.com>
Date: Thu, 15 Feb 2007 18:15:06 +0300
From: Sergei Organov <osv@...ad.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: "J.A. Magallón" <jamagallon@....com>,
Jan Engelhardt <jengelh@...ux01.gwdg.de>,
Jeff Garzik <jeff@...zik.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: somebody dropped a (warning) bomb
Linus Torvalds <torvalds@...ux-foundation.org> writes:
> On Tue, 13 Feb 2007, Sergei Organov wrote:
[...]
> BUT (and this is a big but) within the discussion of "strlen()", that is
> no longer true. "strlen()" exists _outside_ of a single particular
> implementation. As such, "implementation-defined" is no longer something
> that "strlen()" can depend on.
>
> As an example of this argument, try this:
>
> #include <string.h>
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
>     char a1[] = { -1, 0 }, a2[] = { 1, 0 };
>
>     printf("%d %d\n", a1[0] < a2[0], strcmp(a1, a2) < 0);
>     return 0;
> }
>
> and *before* you compile it, try to guess what the output is.
Well, I'll try to play fair, so I haven't compiled it yet. Now, strcmp()
is defined in the C standard so that its behavior doesn't depend on the
signedness of char:
"The sign of a nonzero value returned by the comparison functions
memcmp, strcmp, and strncmp is determined by the sign of the
difference between the values of the first pair of characters (both
interpreted as unsigned char) that differ in the objects being
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
compared."
[Therefore, at least I don't need to build GCC multilib'ed on
-fsigned-char/-funsigned-char to get consistent results even if strcmp()
in fact lives in a library; that was my first thought before I referred
to the standard ;)]
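
To make the quoted wording concrete, here is a minimal sketch of a
comparison function written to that rule. The name sketch_strcmp() is
hypothetical and this is not how any particular libc implements
strcmp(); it only shows that both strings are read through
"unsigned char" pointers, so the sign of plain char never enters the
comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only; not the glibc implementation. */
int sketch_strcmp(const char *s1, const char *s2)
{
    /* Per the standard's wording, compare the bytes as unsigned char. */
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    assert(s1 != NULL && s2 != NULL);

    /* Advance while the strings agree and s1 has not ended. */
    while (*p1 != 0 && *p1 == *p2) {
        p1++;
        p2++;
    }

    /* Yields -1, 0 or 1; only the sign is meaningful. */
    return (*p1 > *p2) - (*p1 < *p2);
}
```

Called with the a1 and a2 arrays from the test program, such a function
should return a positive value whether char is signed or unsigned.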
Suppose char is signed. Then a1[0] < a2[0] (i.e. -1 < 1) should be
true. On a 2's-complement implementation with 8-bit char, -1 interpreted
by strcmp() as unsigned char should be 0xFF, and 1 should remain 1. So
strcmp(a1, a2) < 0 should be equivalent to (0xFF < 1), which is false.
So I'd expect
1 0
as the result on an implementation with signed char.
Now suppose char is unsigned. Then on a 2's-complement implementation
with an 8-bit-byte CPU, a1[0] should be 0xFF, and a2[0] should be 1. The
result from strcmp() won't change. So I'd expect
0 0
as the result on an implementation with unsigned char.
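
The two cases above can be checked without touching compiler flags by
spelling out the signedness explicitly. The sketch below is only an
illustration: predict() and struct prediction are made-up names, and the
strcmp() part is modelled as a plain unsigned char comparison, which
matches the reinterpretation described above on a 2's-complement
machine.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper type: the two values the test program prints. */
struct prediction {
    int less;            /* a1[0] < a2[0]        */
    int strcmp_negative; /* strcmp(a1, a2) < 0   */
};

struct prediction predict(int char_is_signed)
{
    struct prediction p;

    /* Converting -1 to unsigned char yields UCHAR_MAX (0xFF for 8 bits). */
    unsigned char u1 = (unsigned char)-1;
    unsigned char u2 = 1;

    assert(u1 == UCHAR_MAX);

    if (char_is_signed) {
        signed char s1 = -1, s2 = 1;
        p.less = s1 < s2;  /* -1 < 1 */
    } else {
        p.less = u1 < u2;  /* UCHAR_MAX < 1 */
    }

    /* strcmp() always compares as unsigned char, whatever char is. */
    p.strcmp_negative = u1 < u2;
    return p;
}
```

predict(1) gives { 1, 0 } and predict(0) gives { 0, 0 }, which are the
two outputs predicted above.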
Now I'm going to compile it (I must admit I'm slightly afraid of getting
surprising results, so I've re-read my reasoning above before
compiling):
osv@osv tmp$ cat strcmp.c
#include <stdio.h>
#include <string.h>
int main()
{
    char a1[] = { -1, 0 }, a2[] = { 1, 0 };
    printf("%d %d\n", a1[0] < a2[0], strcmp(a1, a2) < 0);
    return 0;
}
osv@osv tmp$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
...
gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)
osv@osv tmp$ gcc -O2 strcmp.c -o strcmp && ./strcmp
1 0
> And when that confuses you,
It didn't, or did I miss something? Is char unsigned by default?
> try to compile it using gcc with the
> "-funsigned-char" flag (or "-fsigned-char" if you started out on an
> architecture where char was unsigned by default)
osv@osv tmp$ gcc -O2 -fsigned-char strcmp.c -o strcmp && ./strcmp
1 0
osv@osv tmp$ gcc -O2 -funsigned-char strcmp.c -o strcmp && ./strcmp
0 0
osv@osv tmp$
Given the above, char is apparently signed by default here, so what?
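
For what it's worth, the default can also be asked for directly rather
than inferred from the output. A minimal sketch, assuming only
<limits.h>; the function name plain_char_is_signed() is made up for
illustration:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns 1 if plain char is signed, else 0. */
int plain_char_is_signed(void)
{
    /* CHAR_MIN is 0 when char is unsigned and negative when signed. */
    int by_limits = CHAR_MIN < 0;

    /* The same question asked via a conversion of -1 to char. */
    int by_cast = (char)-1 < 0;

    /* Both ways of asking should agree on any implementation. */
    assert(by_limits == by_cast);
    return by_limits;
}
```

Built with -fsigned-char this should return 1, and with -funsigned-char
it should return 0, while the strcmp() result stays the same.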
> And when you really *really* think about it afterwards, I think you'll go
> "Ahh.. Linus is right". It's more than "implementation-defined": it really
> is totally indeterminate for code like this.
The fact is that strcmp() is explicitly defined in the C standard so
that it yields the same result regardless of the signedness of the
"char" type. Therefore, it obviously can't be used to determine the sign
of "char", so from the POV of strcmp() usage, the sign of its argument
is indeed "indeterminate". What I fail to see is how this fact could
help prove that the sign of char is indeterminate for any function
taking a "char*" argument.
Anyway, it seems that you still miss (or ignore) my point that it's not
(only) the sign of "char" that makes it suspect to call functions
expecting a "char*" argument with an "unsigned char*" value.
-- Sergei.