Date:	Tue, 29 Apr 2008 13:06:38 +0200
From:	Willy Tarreau <w@....eu>
To:	Adrian Bunk <bunk@...nel.org>
Cc:	Helge Hafting <helge.hafting@...el.hist.no>,
	"H. Peter Anvin" <hpa@...nel.org>, linux-kernel@...r.kernel.org,
	trivial@...nel.org
Subject: Re: [2.6 patch] UTF-8 fixes in comments

On Tue, Apr 29, 2008 at 01:42:16PM +0300, Adrian Bunk wrote:
> On Tue, Apr 29, 2008 at 12:09:34PM +0200, Willy Tarreau wrote:
> > On Tue, Apr 29, 2008 at 11:06:05AM +0200, Helge Hafting wrote:
> > > >Well, I accidentally used a freshly installed laptop running mandriva 2008.
> > > >I was typing in a terminal inside KDE (I don't know the program name, sort
> > > >of an xterm, but with huge borders all around). I made a typo in a word and
> > > >typed in a "é" (e acute). Pressing backspace to fix it showed me that I
> > > >remove more chars than typed. I tried again. Pressing this letter 5 times,
> > > >then 10 times backspace. I removed 5 chars from the prompt. I suspect that
> > > >if I had used some chars with wider encoding (eg 4 bytes), I could have
> > > >removed as many... Clearly those tools are not ready.
> > > >  
> > > So don't use that particular tool
> > 
> > It was not my machine, and had you been there, you would have heard me call
> > it names !
> > 
> > > and/or file a bug with the maintainer. :-)
> > 
> > It's too easy to impose crappy designs on end-users and tell them that if
> > that does not work they have to file a bug. There is a minimal set of
> > things that must be tested before shipping. Seeing that the default
> > terminal emulator in KDE on Mandriva 2008 is configured in UTF-8 and does
> > not properly render it simply makes me sick. This is broken by design and
> > even distros trying to get it working for years still can't cope with it.
> > There must be a reason.
> 
> I can reproduce your problem in a plain xterm when setting LANG=en_US
> (most likely the same problem can occur with other non UTF-8 settings).

Possibly they broke it when forcing support for variable-length encodings?

> In this case I'm actually more surprised that the character is displayed 
> correctly than that you have to type backspace twice.

It's not that I *had* to type it twice. But I *could* type it twice, and
the first one removed the character, the second one the prompt.
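
One plausible way to get exactly this behaviour (an assumption, not something
verified on that machine): the line editing erases one byte per backspace
while the terminal draws one column per character. A minimal Python sketch of
that arithmetic, with hypothetical function names:

```python
def columns_erased(typed: str, backspaces: int, encoding: str = "utf-8") -> int:
    """Columns the cursor moves back if every backspace erases one byte.

    Assumption: the line editor counts bytes, not characters, and moves the
    cursor one column to the left for each byte it erases.
    """
    pending_bytes = len(typed.encode(encoding))
    return min(backspaces, pending_bytes)


def prompt_columns_eaten(typed: str, backspaces: int, encoding: str = "utf-8") -> int:
    """Columns erased beyond what was typed (assumes one column per character)."""
    return max(0, columns_erased(typed, backspaces, encoding) - len(typed))


if __name__ == "__main__":
    typed = "\u00e9" * 5  # "e acute" five times, two bytes each in UTF-8
    print(prompt_columns_eaten(typed, 10))  # prints 5
```

With a single-byte encoding such as "latin-1" passed as `encoding`, the same
call returns 0, since bytes and columns then match.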

> Any kind of charset mixing is highly problematic (which is also why my 
> patch was attached compressed), so if you disable UTF-8 anywhere in a 
> modern distribution problems are somehow expected (it could also be a 
> bug in Mandriva's default settings, but that would really surprise me).

No, it was not disabled at all. I had to type in a command for a
co-worker who had just done a default install the day before, and I
made a typo which I wanted to fix.

> > Unicode yes, UTF-8 no. UTF-8 is a compressed encoding of unicode.
> > That's as silly as if you had to replace your terminals to read
> > native gzip, and expect them as well as all the tools to work
> > properly!
> 
> It's not a compressed encoding, it's a variable-length encoding.
> 
> Besides the size advantages one main advantage of UTF-8 is that ASCII is 
> valid UTF-8. This means that for the ASCII source code in the kernel it 
> doesn't matter whether it's treated as ASCII or UTF-8, and no conversion 
> was needed.
> 
> You can't get this property with a fixed-size Unicode encoding.
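
The two properties discussed above (ASCII bytes are already valid UTF-8, and
UTF-8 is variable-length while UTF-32 is fixed at four bytes) can be checked
with a short Python sketch. The helper names are made up for illustration:

```python
def is_ascii_identical(data: bytes) -> bool:
    """True if ASCII bytes decode to the same text when treated as UTF-8."""
    return data.decode("ascii") == data.decode("utf-8")


def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 length, UTF-32 length) in bytes for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-32-be"))


if __name__ == "__main__":
    # A comment as it might appear in an ASCII-only source file.
    print(is_ascii_identical(b"/* plain ASCII comment */"))
    for ch in ("e", "\u00e9", "\u20ac", "\U0001F600"):
        utf8_len, utf32_len = encoded_sizes(ch)
        print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-32 {utf32_len} bytes")
```

In UTF-32 even the letter "A" becomes four bytes (00 00 00 41 in big-endian
order), so an existing ASCII file is not byte-for-byte valid UTF-32.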

I don't agree. If you refuse character-set mixing, there's no problem.
Bit 7 of the first char == 1? => the full text is 32-bit.
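
A rough Python sketch of that rule, to make it concrete. The exact layout is
not specified above, so this assumes a hypothetical one-byte marker (0x80)
followed by big-endian 32-bit code units. MARKER, encode_tagged and
decode_tagged are invented names, not an existing format or API:

```python
MARKER = 0x80  # hypothetical marker byte: bit 7 set means "32-bit text follows"


def encode_tagged(text: str) -> bytes:
    """Pure-ASCII text stays as is; anything else becomes marker + UTF-32BE."""
    if text.isascii():
        return text.encode("ascii")
    return bytes([MARKER]) + text.encode("utf-32-be")


def decode_tagged(data: bytes) -> str:
    """Look at bit 7 of the first byte to pick the decoder for the whole text."""
    if data and data[0] & 0x80:
        return data[1:].decode("utf-32-be")
    return data.decode("ascii")


if __name__ == "__main__":
    for sample in ("plain ascii", "caf\u00e9"):
        blob = encode_tagged(sample)
        print(len(sample), len(blob), decode_tagged(blob) == sample)
```

Under this assumed layout, ASCII text is stored unchanged, and a single
non-ASCII character moves the whole text to four bytes per character.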

Willy

