Message-ID: <cf3a1d6b64050bb51328305dd8a3bd01abffa9c9.camel@perches.com>
Date:   Wed, 18 Jul 2018 14:55:13 -0700
From:   Joe Perches <joe@...ches.com>
To:     Andrew Morton <akpm@...ux-foundation.org>,
        Geert Uytterhoeven <geert+renesas@...der.be>
Cc:     Andy Whitcroft <apw@...onical.com>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] checkpatch: Only encode UTF-8 quoted printable mail
 headers

On Wed, 2018-07-18 at 14:42 -0700, Andrew Morton wrote:
> On Wed, 18 Jul 2018 16:52:54 +0200 Geert Uytterhoeven <geert+renesas@...der.be> wrote:
> 
> > As Perl uses its own internal character encoding, always calling
> > encode("utf8", ...) on the author name may cause corruption, leading to
> > an author signoff mismatch.
> > 
> > This happens in the following cases:
> >   - If a patch is in ISO-8859 and contains a non-ASCII author name in
> >     the From: line, the name is converted to UTF-8, while the
> >     Signed-off-by line is still in ISO-8859.
> >   - If a patch is in UTF-8 and contains a non-ASCII author name in the
> >     body (not header) From: line, the name is assumed to be in Perl's
> >     internal character encoding and is converted to UTF-8 incorrectly,
> >     while the Signed-off-by line is in real UTF-8.
> > 
> > Fix this by only doing the encode step if the From: line used UTF-8
> > quoted printable encoding.
> 
> Works for me, thanks.

Works for me too so far, but I have more testing I'd like to do.
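
The first failure mode in the patch description can be reproduced without Perl. This is a minimal sketch assuming bash and an iconv(1) that accepts ISO-8859-1; the author name "José" and the helper `from_is_utf8_qp` are hypothetical, taken neither from the thread nor from checkpatch.pl. Converting only the From: copy of an ISO-8859-1 name to UTF-8 changes its bytes (é is 0xE9 in ISO-8859-1 but 0xC3 0xA9 in UTF-8), so it no longer matches the Signed-off-by copy. The helper shows the kind of gate the patch describes: convert only when the From: header carries an RFC 2047 encoded word declaring UTF-8 with the Q (quoted-printable style) encoding.

```shell
#!/bin/bash
# Sketch only: reproduce the ISO-8859 signoff mismatch and show a
# "From: uses UTF-8 quoted-printable" test. All names are hypothetical.

# Succeed if the header contains an RFC 2047 encoded word declaring
# UTF-8 with the Q encoding, e.g.
#   From: =?UTF-8?q?Jos=C3=A9?= <jose@example.com>
# In a basic regex "?" is a literal character; -i ignores case.
from_is_utf8_qp() {
	printf '%s\n' "$1" | grep -qi '^From:.*=?utf-8?q?'
}

# Hypothetical author name "José" as raw ISO-8859-1 bytes (é = 0xE9),
# as it would appear in an ISO-8859 Signed-off-by line.
signoff_name=$(printf 'Jos\351')

# Unconditionally converting the From: copy to UTF-8 (é = 0xC3 0xA9)...
from_name=$(printf 'Jos\351' | iconv -f ISO-8859-1 -t UTF-8)

# ...leaves two byte strings that no longer compare equal.
if [ "$from_name" != "$signoff_name" ]; then
	echo "signoff mismatch"
fi

# Gate the conversion on the header instead.
for hdr in 'From: =?UTF-8?q?Jos=C3=A9?= <jose@example.com>' \
	'From: Jose Example <jose@example.com>'; do
	if from_is_utf8_qp "$hdr"; then
		echo "encode:      $hdr"
	else
		echo "leave as-is: $hdr"
	fi
done
```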

> Relatedly, would it be worth adding a checkpatch warning if a patch
> contains anything other than ASCII or UTF-8?
> 
> I added this to my little local patch-checking script.
> 
> 	if ! file "$p" | grep -q -P "ASCII text|Unicode text"
> 	then
> 		echo "$p: weird charset"
> 	fi

That might be hard to make effective.

For instance, the lkml mail I've kept so far this year
has a mixture of us-ascii, utf-8, iso-8859-1/-15 and
windows-1252 charsets, plus a few others such as
iso-2022-jp, gbk, gb2312 and utf-7.

$ grep -Poh "\bcharset=\S+" ~/.local/share/evolution/mail/local/.MailingLists.Linux-Kernel/cur/*|cut -f3- -d:|sort|uniq -c|sort -rn
    821 charset=us-ascii
    469 charset="UTF-8"
    394 charset="ISO-8859-1"
    252 charset=US-ASCII
    221 charset=utf-8
    118 charset=utf-8;
     97 charset="utf-8"
     66 charset=UTF-8
     60 charset="us-ascii"
     33 charset=ISO-8859-15
     24 charset=iso-8859-1
     18 charset=US-ASCII;
     11 charset=us-ascii;
      7 charset=windows-1252;
      7 charset="utf-8";
      6 charset="UTF-8";
      5 charset=windows-1252
      5 charset="iso-8859-1"
      4 charset="windows-1252"
      3 charset=UTF-8;
      3 charset="US-ASCII"
      2 charset="iso-2022-jp"
      2 charset=gbk;
      2 charset="gb2312"
      1 charset="utf-7"
      1 charset="iso-8859-15"
      1 charset=ISO-8859-1
      1 charset="gbk";
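
These counts list 28 spellings, but charset values are case-insensitive (RFC 2046), and the surrounding double quotes and trailing ";" are header syntax rather than part of the name. A small normalizing filter makes the distribution easier to read; `normalize_charsets` is a hypothetical helper, a sketch assuming bash, sed and tr.

```shell
#!/bin/bash
# Sketch: fold spelling variants of one charset label together.
# Charset values are case-insensitive (RFC 2046); the double quotes and
# a trailing ";" are header syntax, not part of the name.
normalize_charsets() {
	# Strip the "charset=" prefix, remove double quotes, drop one
	# trailing ";", then lowercase what is left.
	sed -e 's/^charset=//' -e 's/"//g' -e 's/;$//' |
		tr '[:upper:]' '[:lower:]'
}

# Example input: raw "grep -o" matches in several spellings.
printf '%s\n' 'charset="UTF-8"' 'charset=utf-8;' 'charset="utf-8";' \
	'charset=US-ASCII' 'charset="us-ascii"' 'charset=ISO-8859-1' |
	normalize_charsets | sort | uniq -c | sort -rn
```

In the first pipeline it would go before the first `sort`. Applied by hand to the counts above, the 28 lines collapse to nine charsets: us-ascii 1165, utf-8 987, iso-8859-1 424, iso-8859-15 34, windows-1252 16, gbk 3, iso-2022-jp 2, gb2312 2 and utf-7 1. That leaves 482 of 2634 charset declarations that are neither US-ASCII nor UTF-8 (these are charset= matches, so a multipart message can contribute several).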

And the Content-Transfer-Encoding headers:

$ grep "^Content-Transfer-Encoding:" ~/.local/share/evolution/mail/local/.MailingLists.Linux-Kernel/cur/*|cut -f3- -d:|sort|uniq -c|sort -rn
    873 Content-Transfer-Encoding: 7bit
    212 Content-Transfer-Encoding: 8bit
     97 Content-Transfer-Encoding: quoted-printable
     63 Content-Transfer-Encoding: base64
     56 Content-Transfer-Encoding: 8BIT
     24 Content-Transfer-Encoding: 7Bit
      3 Content-Transfer-Encoding: 7BIT
      2 Content-Transfer-Encoding: QUOTED-PRINTABLE
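
On the idea of warning about anything other than ASCII or UTF-8: file(1) only guesses from content. A stricter byte-level test, assuming an iconv(1) from glibc or GNU libiconv that exits non-zero on an invalid input sequence, is to check whether a patch decodes as UTF-8; pure ASCII passes because it is a subset of UTF-8. It still cannot say which legacy charset a failing file uses, and a quoted-printable or base64 body (162 such parts in the transfer-encoding counts above) is 7-bit on the wire, so such a mail has to be decoded before the test means anything. The `is_valid_utf8` helper is hypothetical.

```shell
#!/bin/bash
# Sketch: report patch files whose bytes are not valid UTF-8.
# Assumes an iconv(1) (glibc or GNU libiconv) that exits non-zero
# when it meets an invalid input sequence.

is_valid_utf8() {
	# UTF-8 to UTF-8 succeeds unchanged for valid input, so the
	# exit status alone says whether the bytes decode.
	iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

# Check every file named on the command line, reusing the "$p"
# variable name from the script quoted earlier.
for p in "$@"; do
	if ! is_valid_utf8 "$p"; then
		echo "$p: not valid UTF-8"
	fi
done
```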
