linux-kernel - Re: [PATCH] console UTF-8 fixes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.64.0704121714280.14457@scrub.home>
Date:	Thu, 12 Apr 2007 17:52:49 +0200 (CEST)
From:	Roman Zippel <zippel@...ux-m68k.org>
To:	Egmont Koblinger <egmont@...linux.hu>
cc:	"H. Peter Anvin" <hpa@...or.com>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Jan Engelhardt <jengelh@...ux01.gwdg.de>,
	Pavel Machek <pavel@....cz>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] console UTF-8 fixes

Hi,

On Thu, 12 Apr 2007, Egmont Koblinger wrote:

> > Considering this possible volatility I'm not certain we really need this 
> > in the kernel.
> > The other point is that I have problems imagining, that this should be 
> > enough to edit random text files with a random editor without problems. 
> 
> No, this would not be enough for all special corner cases. It would just be
> better than currently it is. That's my goal.

Well, it often doesn't end there, other users may report these as bugs and 
want to get them fixed, so we have to look ahead a little for possible 
problems.

> > OTOH if the editor has to all this parsing anyway, the whole thing could 
> > be pushed to userspace and the Linux terminal could be marked as handling 
> > all characters equally (a good hint would be if the terminal doesn't even 
> > support wide characters). The terminfo database exists for a good reason 
> 
> Currently not even applications and terminal emulators agree on the width. I
> don't think terminfo has anything to do with it.

It has everything to do with this. It describes the terminal capabilities, 
so why should variable width support not one part of it?

> So, there are several (maybe a few hundred or few thousands) characters that
> are handled differently by different applications/libraries. On the other
> hand, there are approximately 42.000 characters in BMP that are double-width
> according to every width specification or implementation I've seen so far.
> That's roughly the 2/3 of all the code points in BMP. Why is a problem if
> the kernel knows to jump 2 character cells on them?

Wrong question: why is it a problem the kernel doesn't know this?
It's usually easy to "fix" this in the kernel, because it's last in line, 
but we have to look at the whole picture and the kernel isn't solely 
responsible for unicode handling, so why can't this be fixed at 
application level?

bye, Roman
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/