[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4617DBF7.5060009@zytor.com>
Date: Sat, 07 Apr 2007 10:59:19 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Egmont Koblinger <egmont@...linux.hu>
CC: Jan Engelhardt <jengelh@...ux01.gwdg.de>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] console UTF-8 fixes
Egmont Koblinger wrote:
> On Sat, Apr 07, 2007 at 01:00:48PM +0200, Jan Engelhardt wrote:
>
> Hi,
>
>> Please, no dot, and no inverse color.
>> Imagine someone had the following bitmap for <unknown glyph/illegal sequence>:
>
> No dot, I'm already convinced. To clarify the inverse thingy:
>
> This is what the current kernel does:
> 1) tries to display the desired symbol
> 2) if it fails, tries to display U+FFFD (which usually looks similar to an
> inverted question mark)
> 3) if this fails again then displays a normal '?'
> (or a different symbol due to a bug discussed below)
>
> Here's my proposal. This only alters the 3rd step, not the first two:
> 1) tries to display the desired symbol
> 2) if it fails, tries to display U+FFFD, still with _normal_ attributes
> 3) if this fails then display an ascii '?' with inverted attributes
>
> So you won't get "double" inversion. If you do have U+FFFD in your font then
> this will introduce no chance. If you don't have U+FFFD, you'll see inverse
> question marks instead of normal ones.
>
This seems fine.
>
>> I blame your latin2 unicode map. (See above about 'Û'.)
>
> There's nothing wrong with my latin2 unicode map, and I've located and
> changed the part _in the kernel_ that displays a false glyph using the
> algorithm I've outlined. It just uses "the glyph at that code position
> within the glyph table" as a fallback, which might be okay in 8-bit mode
> (and I haven't modified the behavior in that case), but I got rid of this
> behavior in UTF-8 mode since it's definitely a fault in the world of
> Unicode.
>
>> It should perhaps display a regular 'u' if it cannot display 'û',
>
> I rather think it should display U+FFFD but YMMV.
That's a policy decision for the maker of the Unicode map. The kernel
cannot by default know that a pre-composed ű is a modified u; obviously,
if the ű is send in decomposed form the kernel probably will display it
as u? or some such.
>> but definitely not 'ü' (which is not called a double accent, btw).
>
> This is not the character I've been talking about, I actually _did_ talk
> about u with double acute accent (ű - you might not have seen this character
> so far, AFAIK it's only used in Hungarian, no other languages). But we agree
> that the kernel definitely shouldn't display a character with a different
> accent on it. This is one of the bugs my patch addresses.
As far as width handling -- in order to make all the text line up under
all circumstances you need more than width handling. The wcwidth()
stuff is specific to CJK -- a character set which is totally implausible
to display on the builtin console. You also need bidir support (in case
you encounter Hebrew or Arabic), you need Indic shape handling (Indic
langauges have some *very* odd composing rules), etc, and this is just
to know how much space to take up on the screen.
is is ridiculous. It's much better to draw a line in the sand and say
that this is beyond the scope of the in-kernel Linux console.
-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists