[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130919034406.GA26385@angband.pl>
Date: Thu, 19 Sep 2013 05:44:06 +0200
From: Adam Borowski <kilobyte@...band.pl>
To: Roy Franz <roy.franz@...aro.org>
Cc: linux-kernel@...r.kernel.org, linux-efi@...r.kernel.org,
matt.fleming@...el.com, leif.lindholm@...aro.org,
msalter@...hat.com
Subject: Re: [PATCH 09/17] Move unicode to ASCII conversion to shared
function.
On Mon, Sep 16, 2013 at 09:11:25PM -0700, Roy Franz wrote:
> +/* Convert the unicode UEFI command line to ASCII to pass to kernel.
> + * Size of memory allocated return in *cmd_line_len.
> + * Returns NULL on error.
> + */
> +static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg,
> + int load_options_size = image->load_options_size / 2; /* ASCII */
> + for (i = 0; i < options_size - 1; i++)
> + *s1++ = *s2++;
I'm afraid both this commit and comments inside are misnamed. What you're
changing here is the encoding rather than character set.
In fact, these days it's 8-bit encodings that are more likely to be Unicode
than 16-bit ones: UTF-8 is ubiquitous, while you usually get UCS2 at most.
In either case, though, we have here is a 7-bit charset encoded as either
8-bit or 16-bit units. What this function does is blindly truncating upper
byte. The supported payload is in both cases ASCII.
I'd thus rename the function to what it already does: truncating u16 to u8,
and adjust comments accordingly.
Replacing values above 126 with a token character like '?' would be good
too: that'd avoid producing corrupted characters and/or random ASCII chars.
Your commit only moves things around, so it might be out of scope for now,
but I wonder: what if the kernel actually supported Unicode here? Few
cmdline arguments take values where non-ASCII makes sense, but at least some
do: for example, a Russian guy is not unlikely to name subvolumes using
cyrillic. Supporting that would be easy (estimating the length then
utf16s_to_utf8s()). There's just one problem: which encoding to use, but
these days, most distributions have either dropped non-UTF8 or hardly pay
lip service, so we could get away with hard-coding UTF-8: those few who
use ancient charsets can stick to ASCII. Would this be ok? If so, shout,
I can code this if you don't care enough.
--
ᛊᚨᚾᛁᛏᚣ᛫ᛁᛊ᛫ᚠᛟᚱ᛫ᚦᛖ᛫ᚹᛖᚨᚲ
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists