linux-kernel - Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <4B41FED9.1060601@suse.cz>
Date:	Mon, 04 Jan 2010 15:44:41 +0100
From:	Michal Marek <mmarek@...e.cz>
To:	"H. Peter Anvin" <hpa@...or.com>
Cc:	Roland Dreier <rdreier@...co.com>,
	Sergei Trofimovich <slyich@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, Sergei Trofimovich <slyfox@...ox.ru>
Subject: Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)

On 26.12.2009 21:04, H. Peter Anvin wrote:
> On 12/25/2009 05:17 PM, Roland Dreier wrote:
>>
>>  > The whole reason with only setting some LC_* to C was to be able to
>>  > leave LC_MESSAGES intact, but it seems it breaks on too many real-life
>>  > systems.
>>
>>  > As such, I suggest we should set LC_ALL=C and get rid of the rest of it:
>>
>> Seems unfortunate to lose localized error messages.  (Although in my
>> en_US.UTF-8 case, all I get is non-ASCII quote characters)
>>
>> This all started because of the awk invocation in arch/x86/lib.  Maybe
>> the best idea would be to confine the locale monkeying to that one
>> place?
>>
> 
> It is also possible that setting only LC_COLLATE will solve the most
> fundamental problem, which is the one of character ranges.  LC_COLLATE
> probably will interfere less with LC_MESSAGES than the setting of LC_CTYPE.

We need LC_COLLATE=C so that [a-z] really means lowercase ASCII letters
and nothing else (most importantly not uppercase letters) in awk, sed
and the shell. If we stay with LC_CTYPE=$userdefined, the meaning of
[[:classes:]] becomes indeterministic and so does the mapping of
lowercase and uppercase characters:

$ echo iI | LC_CTYPE=tr_TR.UTF-8 awk '{ print $0 " " toupper($0) " "
tolower($0) }'
iI İI iı

Character classes are probably not a big issue (modulo the fact that
mawk doesn't seem to support them), because the input is ascii text
anyway. Regarding the tolower()/toupper() functions, I found one
potential troublemaker:

$ git grep -E 'to(lower|upper)' | grep -v '\.[ch]:'
arch/sh/tools/gen-mach-types:            tolower(mach[i]), mach[i]);

Maybe this awk script should be run with LC_ALL=C, people mostly care
about (localized) messages from gcc, not from awk.

Michal
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/