Message-ID: <CA+55aFxf6ALo=1ODt76rN6FMM7RN12v_RHM-XB-hD1f9oqq1TQ@mail.gmail.com>
Date:   Wed, 24 Jan 2018 12:48:25 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Peter Grayson <jpgrayson@...il.com>
Cc:     Bjorn Helgaas <helgaas@...nel.org>,
        Catalin Marinas <catalin.marinas@...il.com>,
        linux-pci@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Christian König <christian.koenig@....com>,
        Aaro Koskinen <aaro.koskinen@....fi>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Juergen Gross <jgross@...e.com>,
        Alex Deucher <alexander.deucher@....com>,
        David Airlie <airlied@...ux.ie>
Subject: Re: [GIT PULL] PCI fixes for v4.15

On Wed, Jan 24, 2018 at 12:25 PM, Peter Grayson <jpgrayson@...il.com> wrote:
>
> The latest stgit release (v0.18) ignores any mis-encoding of the email
> body. However, stgit master now decodes email bodies and is thus exposed
> to this kind of stray latin-1 character in a UTF-8 body.
>
> I believe stgit's goal should be to identify and repair this kind of
> issue as git does. I will be working on that.

Yes, good. The "latin1 vs utf-8" confusion is sadly still somewhat
common in Western Europe, from personal experience. People just got
used to Latin1 working almost by accident without any explicit
encoding, possibly _because_ it also acts as the first 256 code points of
Unicode.

I suspect the old 8-bit DOS character set (aka "code page 437") is
perhaps even more commonly seen in some situations, just not in unix
development contexts. And it lacks a number of the (admittedly rarer)
European accented characters anyway.

So git basically first does a conversion according to the stated
encoding, but after that conversion it will then do another pass to
actually verify that the end result is valid utf-8, and if not, do the
(trivial) latin1 -> utf-8 conversion.
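A minimal Python sketch of the verify-then-repair pass, assuming the body has already been converted according to its stated encoding. The name `repair_utf8` is hypothetical and this is not git's verify_utf8(). It keeps every valid UTF-8 sequence as-is and reinterprets each byte that fails to decode as a Latin-1 character.

```python
def repair_utf8(data: bytes) -> str:
    """Decode UTF-8, treating each invalid byte as Latin-1 (hypothetical helper)."""
    out = []
    pos = 0
    while pos < len(data):
        chunk = data[pos:]
        try:
            out.append(chunk.decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # Everything before err.start is valid UTF-8: keep it unchanged.
            out.append(chunk[:err.start].decode("utf-8"))
            # Reinterpret the offending byte as Latin-1: byte b is code point U+00b.
            out.append(chr(chunk[err.start]))
            pos += err.start + 1
    return "".join(out)


# A body mixing correct UTF-8 ("K\xc3\xb6nig") with a stray Latin-1 byte ("K\xf6nig").
fixed = repair_utf8(b"K\xc3\xb6nig / K\xf6nig")  # "König / König"
```

Only the bytes that break UTF-8 are touched, so a correctly encoded name elsewhere in the same message is never double-converted.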

And part of the reason for that latin1 special case is very much the
whole "it's trivial" part. So it's not _just_ about "common error in
western emails", it's also simply that Latin1 really is special in the
Unicode domain.

No other character set has that trivial conversion into utf-8.
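The "trivial" part: Latin-1 byte b is Unicode code point U+00b, so no lookup table is needed. Bytes below 0x80 are unchanged, and each byte 0x80-0xFF becomes exactly two UTF-8 bytes computed with shifts and masks. A minimal Python sketch (the name `latin1_to_utf8` is hypothetical), checked against Python's own codecs:

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert Latin-1 bytes to UTF-8 with bit arithmetic only (hypothetical helper)."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)                  # ASCII: identical in both encodings
        else:
            out.append(0xC0 | (b >> 6))    # lead byte: always 0xC2 or 0xC3
            out.append(0x80 | (b & 0x3F))  # continuation byte: low 6 bits of b
    return bytes(out)


# Sanity check against Python's codecs for every possible byte value.
for b in range(256):
    assert latin1_to_utf8(bytes([b])) == bytes([b]).decode("latin-1").encode("utf-8")

print(latin1_to_utf8(b"K\xf6nig"))  # b'K\xc3\xb6nig'
```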

See verify_utf8() in commit.c in the git code.

> Unfortunately, the head of stgit master does not yet solve this issue. I
> am working to remedy that.

Thanks. We used to be *horrible* about getting "complex" names right
in the kernel logs, but I've tried to make sure that we actually get
this right, and we have had proper names for many years now.

              Linus
