linux-kernel - Re: [PATCH v2] checkpatch: Only encode UTF-8 quoted printable mail headers

Date:   Thu, 19 Jul 2018 16:50:26 +0200
From:   Arnd Bergmann <arnd@...db.de>
To:     Geert Uytterhoeven <geert+renesas@...der.be>
Cc:     Andy Whitcroft <apw@...onical.com>, Joe Perches <joe@...ches.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Martin Schwidefsky <schwidefsky@...ibm.com>
Subject: Re: [PATCH v2] checkpatch: Only encode UTF-8 quoted printable mail headers

On Wed, Jul 18, 2018 at 4:52 PM, Geert Uytterhoeven
<geert+renesas@...der.be> wrote:
> As PERL uses its own internal character encoding, always calling
> encode("utf8", ...) on the author name may cause corruption, leading to
> an author signoff mismatch.
>
> This happens in the following cases:
>   - If a patch is in ISO-8859, and contains a non-ASCII author name in
>     the From: line, it is converted to UTF-8, while the Signed-off-by
>     line will still be in ISO-8859.
>   - If a patch is in UTF-8, and contains a non-ASCII author name in the
>     body (not header) From: line, it is assumed to be encoded in PERL's
>     internal character encoding, and converted to UTF-8 incorrectly,
>     while the Signed-off-by line will be in real UTF-8.
>
> Fix this by only doing the encode step if the From: line used UTF-8
> quoted printable encoding.
>
> Reported-by: Andrew Morton <akpm@...ux-foundation.org>
> Signed-off-by: Geert Uytterhoeven <geert+renesas@...der.be>
> ---
> Fixes: bc76e3a125b44379 ("checkpatch: warn if missing author Signed-off-by")
> in -next
>
> To be folded into "checkpatch: Warn if missing author Signed-off-by" in
> Andrew's tree.
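
To illustrate the second case in the quoted description, here is a minimal
Python sketch of the double-encoding effect. It is not the checkpatch code
(which is Perl); it assumes that treating each byte as one character can be
modelled as an ISO-8859-1 decode, and the author name is hypothetical.

```python
# Hypothetical author name containing one non-ASCII character.
name = "J\u00f6rg"

# What a UTF-8 patch actually contains on the wire.
utf8_bytes = name.encode("utf-8")

# Assumption: the bytes are read as if each byte were one character,
# modelled here as an ISO-8859-1 decode.
misread = utf8_bytes.decode("iso-8859-1")

# Encoding that misread string to UTF-8 again doubles the encoding.
double_encoded = misread.encode("utf-8")

# The Signed-off-by line keeps the original bytes, so the two differ.
mismatch = double_encoded != utf8_bytes
```

Comparing double_encoded with utf8_bytes shows why the From: name and the
Signed-off-by name no longer match after an unconditional encode step.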

On a related note, I've looked through all files in the kernel, and found
that very few files in there are something other than 7-bit ASCII, UTF-8
or non-text files (according to /usr/bin/file). These are the only ones I found:

Documentation/devicetree/bindings/net/nfc/pn544.txt: ISO-8859 text
arch/arm/boot/dts/sun4i-a10-inet97fv2.dts:           C source, ISO-8859 text
arch/arm/crypto/sha256_glue.c:                       C source, ISO-8859 text
arch/arm/crypto/sha256_neon_glue.c:                  C source, ISO-8859 text
arch/m68k/hp300/hp300map.map:                        ISO-8859 text
arch/s390/kernel/ebcdic.c:                           C source, Non-ISO extended-ASCII text
drivers/crypto/vmx/ghashp8-ppc.pl:                   a /usr/bin/env perl script, ISO-8859 text executable
drivers/iio/dac/ltc2632.c:                           C source, ISO-8859 text
drivers/power/reset/ltc2952-poweroff.c:              C source, ISO-8859 text
drivers/staging/rtl8188eu/include/odm.h:             C source, ISO-8859 text
drivers/tty/vt/defkeymap.map:                        ISO-8859 text
kernel/events/callchain.c:                           C source, ISO-8859 text
lib/fonts/font_7x14.c:                               data
lib/fonts/font_8x16.c:                               data
lib/fonts/font_8x8.c:                                data
lib/fonts/font_pearl_8x8.c:                          data
net/netfilter/ipvs/Kconfig:                          ISO-8859 text
net/netfilter/ipvs/ip_vs_mh.c:                       C source, ISO-8859 text
tools/power/cpupower/po/de.po:                       GNU gettext message catalogue, ISO-8859 text
tools/power/cpupower/po/fr.po:                       GNU gettext message catalogue, ISO-8859 text
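
A similar scan can be approximated without /usr/bin/file. The following is
a rough Python sketch, not what file(1) does internally: it only checks
whether the bytes decode as ASCII or UTF-8, and it treats any file with a
NUL byte as binary (an assumption; file(1) uses more elaborate heuristics,
so the results can differ). The names classify_bytes and scan_tree are
made up for this example.

```python
import os


def classify_bytes(data: bytes) -> str:
    """Return 'binary', 'ascii', 'utf-8' or 'other' for raw file contents."""
    # Assumption: a NUL byte means the file is not text at all.
    if b"\x00" in data:
        return "binary"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Text-like, but neither 7-bit ASCII nor valid UTF-8.
        return "other"


def scan_tree(root: str):
    """Yield paths under root whose contents classify as 'other'."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Skip broken symlinks and special files.
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                if classify_bytes(handle.read()) == "other":
                    yield path
```

Calling sorted(scan_tree(".")) from the top of a source tree would list the
candidates; in a git checkout the .git directory should be excluded first.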

Almost all of those can be trivially converted using 'recode ISO-8859-1..UTF-8',
which we should probably do. The four font files contain comments for each
of the 256 characters, so that recode turns e.g. the <FF> character into
<U+00FF>, which is probably still what we want here.
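
For reference, the effect of that conversion on raw bytes can be sketched in
Python as below. This is only an illustration of the byte-level mapping, not
a replacement for recode, and the function name latin1_to_utf8 is made up
for this example. It should only be applied to files that are not already
valid UTF-8, otherwise the content gets double-encoded.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret ISO-8859-1 bytes as text and re-encode them as UTF-8."""
    # Every byte value 0x00..0xFF is a valid ISO-8859-1 character,
    # so this decode step never raises.
    text = data.decode("iso-8859-1")
    return text.encode("utf-8")


# The single byte 0xFF becomes U+00FF, which UTF-8 encodes as two bytes.
converted_ff = latin1_to_utf8(b"\xff")

# Pure 7-bit ASCII input passes through unchanged.
converted_ascii = latin1_to_utf8(b"/* comment */")
```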

The one exception seems to be arch/s390/kernel/ebcdic.c, which apparently
uses 0x81 bytes as an escape before ISO-8859-1 characters with
the high bit set. I don't know what that encoding is called, but I managed
to manually convert it into something useful.

       Arnd