linux-kernel - Re: [PATCH] console UTF-8 fixes


Message-ID: <4617DBF7.5060009@zytor.com>
Date:	Sat, 07 Apr 2007 10:59:19 -0700
From:	"H. Peter Anvin" <hpa@...or.com>
To:	Egmont Koblinger <egmont@...linux.hu>
CC:	Jan Engelhardt <jengelh@...ux01.gwdg.de>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] console UTF-8 fixes

Egmont Koblinger wrote:
> On Sat, Apr 07, 2007 at 01:00:48PM +0200, Jan Engelhardt wrote:
> 
> Hi,
> 
>> Please, no dot, and no inverse color.
>> Imagine someone had the following bitmap for <unknown glyph/illegal sequence>:
> 
> No dot, I'm already convinced. To clarify the inverse thingy:
> 
> This is what the current kernel does:
>   1) tries to display the desired symbol
>   2) if it fails, tries to display U+FFFD (which usually looks similar to an
>      inverted question mark)
>   3) if this fails again then displays a normal '?'
>      (or a different symbol due to a bug discussed below)
> 
> Here's my proposal. This only alters the 3rd step, not the first two:
>   1) tries to display the desired symbol
>   2) if it fails, tries to display U+FFFD, still with _normal_ attributes
>   3) if this fails then display an ascii '?' with inverted attributes
> 
> So you won't get "double" inversion. If you do have U+FFFD in your font then
> this will introduce no change. If you don't have U+FFFD, you'll see inverse
> question marks instead of normal ones.
> 

This seems fine.
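
For illustration only, the proposed three-step fallback could be sketched 
roughly as below.  This is not the actual console code: struct font_map, 
struct cell, font_has_glyph() and pick_glyph() are hypothetical names made 
up for this sketch, and the font is modelled as a flat list of code points.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical font model: the code points that have a glyph. */
struct font_map {
	const uint32_t *codepoints;
	size_t count;
};

/* What to draw in one screen cell: a code point plus an inverse flag. */
struct cell {
	uint32_t codepoint;
	bool inverse;
};

static bool font_has_glyph(const struct font_map *font, uint32_t cp)
{
	for (size_t i = 0; i < font->count; i++)
		if (font->codepoints[i] == cp)
			return true;
	return false;
}

static struct cell pick_glyph(const struct font_map *font, uint32_t cp)
{
	/* 1) the desired symbol, normal attributes */
	if (font_has_glyph(font, cp))
		return (struct cell){ cp, false };
	/* 2) U+FFFD, still with normal attributes */
	if (font_has_glyph(font, 0xFFFD))
		return (struct cell){ 0xFFFD, false };
	/* 3) ASCII '?' with inverted attributes */
	return (struct cell){ '?', true };
}
```

With a font that has U+FFFD, an unmapped character comes back as U+FFFD 
with normal attributes; only a font lacking U+FFFD ever yields the 
inverse '?', so there is no double inversion.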

> 
>> I blame your latin2 unicode map. (See above about 'Û'.)
> 
> There's nothing wrong with my latin2 unicode map, and I've located and
> changed the part _in the kernel_ that displays a false glyph using the
> algorithm I've outlined. It just uses "the glyph at that code position
> within the glyph table" as a fallback, which might be okay in 8-bit mode
> (and I haven't modified the behavior in that case), but I got rid of this
> behavior in UTF-8 mode since it's definitely a fault in the world of
> Unicode.
> 
>> It should perhaps display a regular 'u' if it cannot display 'û',
> 
> I rather think it should display U+FFFD but YMMV.

That's a policy decision for the maker of the Unicode map.  The kernel 
cannot by default know that a pre-composed ű is a modified u; obviously, 
if the ű is sent in decomposed form the kernel probably will display it 
as u? or some such.
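
A toy sketch of why that happens, assuming a renderer that maps every 
code point to one cell independently and has no composition knowledge. 
ascii_font_has() and render_cells() are hypothetical helpers, and the 
"font" here is assumed to be plain printable ASCII:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical font: only printable ASCII has a glyph. */
static int ascii_font_has(uint32_t cp)
{
	return cp >= 0x20 && cp < 0x7F;
}

/*
 * Map each code point to one cell on its own; anything without a
 * glyph becomes '?'.  out must have room for n + 1 bytes.
 */
static void render_cells(const uint32_t *cps, size_t n, char *out)
{
	for (size_t i = 0; i < n; i++)
		out[i] = ascii_font_has(cps[i]) ? (char)cps[i] : '?';
	out[n] = '\0';
}
```

Feeding it the decomposed sequence U+0075 U+030B gives "u?", while the 
pre-composed U+0171 gives a single "?"; in neither case does the renderer 
know that the character is a modified u.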

>> but definitely not 'ü' (which is not called a double accent, btw).
> 
> This is not the character I've been talking about, I actually _did_ talk
> about u with double acute accent (ű - you might not have seen this character
> so far, AFAIK it's only used in Hungarian, no other languages). But we agree
> that the kernel definitely shouldn't display a character with a different
> accent on it. This is one of the bugs my patch addresses.

As far as width handling goes -- in order to make all the text line up 
under all circumstances you need more than width handling.  The wcwidth() 
stuff is specific to CJK -- a character set which is totally implausible 
to display on the builtin console.  You also need bidi support (in case 
you encounter Hebrew or Arabic), you need Indic shape handling (Indic 
languages have some *very* odd composing rules), etc, and this is just 
to know how much space to take up on the screen.

This is ridiculous.  It's much better to draw a line in the sand and say 
that this is beyond the scope of the in-kernel Linux console.

	-hpa
