linux-kernel - Re: [PATCH 09/17] Move unicode to ASCII conversion to shared function.

Message-ID: <20130919230241.GA18666@angband.pl>
Date:	Fri, 20 Sep 2013 01:02:41 +0200
From:	Adam Borowski <kilobyte@...band.pl>
To:	Roy Franz <roy.franz@...aro.org>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-efi@...r.kernel.org, matt.fleming@...el.com,
	Leif Lindholm <leif.lindholm@...aro.org>,
	Mark Salter <msalter@...hat.com>
Subject: Re: [PATCH 09/17] Move unicode to ASCII conversion to shared
 function.

On Wed, Sep 18, 2013 at 09:48:44PM -0700, Roy Franz wrote:
> On Wed, Sep 18, 2013 at 8:44 PM, Adam Borowski <kilobyte@...band.pl> wrote:
> > [UCS2 truncation]
> 
> I stuck to re-arranging the code that was there, as I don't know enough
> about character encodings to propose changes.

I on the other hand don't know the kernel (lurking because of my first
patch), but I'm on a crusade against mangled Unicode (so far in the
userland).  Can't let such a blatant error slip through on my watch :)

> Also, this code is running as part of the kernel decompressor, rather than
> the kernel itself, so it doesn't have access to any kernel facilities, and
> it also needs to be position independent.

OK, so it can't reuse common libraries.  No problem: a simplified, sanitized
and optimized copy of utf16s_to_utf8s() can be done in considerably less code
than the original.

> It's running in a quite limited environment - the decompressor has
> its own copy of strstr(), and other string functions.

I'd need nothing but a way to allocate the new string.  And I see this is
already done (efi_{low,high}_alloc()).

> I checked the UEFI specification, and it states that all 16 bit strings
> are UCS-2, unless otherwise noted.

... which means it will either get upgraded to UTF-16 in a subsequent
version, or some Unicode strings get mangled.  I'd ignore this bit and
implement full UTF-16 from the start: every legal UCS-2 string can be
decoded as UTF-16 so it's a strict superset.
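
To make that concrete, here is a minimal sketch of a self-contained decoder,
not a proposed patch.  The name utf16_to_utf8() and its signature are made up
for illustration (this is not the kernel's utf16s_to_utf8s()), the input is
assumed to be NUL-terminated and in host byte order, and the caller is assumed
to have allocated the output buffer.  Unpaired surrogates are replaced with
U+FFFD, which is one possible policy among several.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode a NUL-terminated, host-endian UTF-16 string
 * as UTF-8.  dst must hold at least 3 * (number of 16-bit units) + 1 bytes:
 * a BMP unit needs at most 3 bytes, a surrogate pair (2 units) needs 4.
 * Returns the number of bytes written, not counting the trailing NUL.
 */
static size_t utf16_to_utf8(const uint16_t *src, unsigned char *dst)
{
	unsigned char *out = dst;

	while (*src) {
		uint32_t c = *src++;

		if (c >= 0xD800 && c <= 0xDBFF &&
		    *src >= 0xDC00 && *src <= 0xDFFF) {
			/* Valid surrogate pair: combine into one code point. */
			c = 0x10000 + ((c - 0xD800) << 10) + (*src++ - 0xDC00);
		} else if (c >= 0xD800 && c <= 0xDFFF) {
			/* Unpaired surrogate: emit U+FFFD instead. */
			c = 0xFFFD;
		}

		if (c < 0x80) {
			*out++ = (unsigned char)c;
		} else if (c < 0x800) {
			*out++ = (unsigned char)(0xC0 | (c >> 6));
			*out++ = (unsigned char)(0x80 | (c & 0x3F));
		} else if (c < 0x10000) {
			*out++ = (unsigned char)(0xE0 | (c >> 12));
			*out++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
			*out++ = (unsigned char)(0x80 | (c & 0x3F));
		} else {
			*out++ = (unsigned char)(0xF0 | (c >> 18));
			*out++ = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
			*out++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
			*out++ = (unsigned char)(0x80 | (c & 0x3F));
		}
	}
	*out = 0;
	return (size_t)(out - dst);
}
```

Pure UCS-2 input never enters the surrogate branches, so it decodes exactly as
before; only strings that UCS-2 cannot represent take the extra path.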

> The load options that the command line is provided through a void pointer
> specified as: [snip]

Either a null pointer or a 16-bit string, that sounds clear enough.

I see not a word about endianness (does anything do EFI on big endian?),
but "same as host" seems to be a reasonable assumption.

> Would it be acceptable to fix the naming/comments, and convert values
> above 126 to '?' in the current patchset, and address a more thorough fix
> in another patch set?  The ARM and ARM64 EFI stub patchsets that are
> mostly complete depend on this one, so getting this merged soon would be
> helpful.

I don't want to hinder your work, so what about putting in your version
as-is and fixing it later?
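
For reference, the interim behaviour suggested in the quoted text (anything
above 126 becomes '?') could look roughly like the sketch below.  The function
name and signature are hypothetical and are not taken from the patch; the
caller is assumed to provide a buffer with one byte per 16-bit unit plus one
for the terminator.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical interim conversion: copy a NUL-terminated 16-bit string to
 * 8-bit, replacing every value above 126 with '?'.  dst must hold at least
 * (number of 16-bit units) + 1 bytes.  Returns the length without the NUL.
 */
static size_t ucs2_to_ascii_lossy(const uint16_t *src, char *dst)
{
	size_t n = 0;

	while (src[n]) {
		dst[n] = src[n] > 126 ? '?' : (char)src[n];
		n++;
	}
	dst[n] = '\0';
	return n;
}
```

This loses information but never emits a truncated, misleading byte, which is
the part that matters until a real UTF-8 conversion lands.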

> > There's just one problem: which encoding to use, but
> > these days, most distributions have either dropped non-UTF8 or hardly pay
> > lip service, so we could get away with hard-coding UTF-8: those few who
> > use ancient charsets can stick to ASCII.

Not being able to use regular kernel facilities makes supporting ancient
charsets a lost cause.  I'm so weeping about them... not.

> I would certainly appreciate your help improving this

Are we on the same page so far?  If so, I can make a patch atop yours.

-- 
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ