Message-ID: <3908561D78D1C84285E8C5FCA982C28F61342363@ORSMSX114.amr.corp.intel.com>
Date:   Thu, 17 Aug 2017 23:32:16 +0000
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
CC:     Borislav Petkov <bp@...e.de>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        "Elliott, Robert (Persistent Memory)" <elliott@....com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH-resend] mm/hwpoison: Clear PRESENT bit for kernel 1:1
 mappings of poison pages

> It's unclear (to lil ole me) what the end-user-visible effects of this
> are.
>
> Could we please have a description of that?  So a) people can
> understand your decision to cc:stable and b) people whose kernels are
> misbehaving can use your description to decide whether your patch might
> fix the issue their users are reporting.

Ingo already applied this to the tip tree, so it is too late to fix the commit message :-(

A very, very unlucky end user with a system that supports machine check recovery
(Xeon E7, or Xeon-SP-platinum) that has already recovered from one or more uncorrected
memory errors (lucky so far) might find a subsequent uncorrected memory error flagged
as fatal, because the machine check bank that should log the new error is already
occupied by a log caused by a speculative access to one of the earlier uncorrected
errors (the unlucky part).
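
To make the failure mode above concrete, here is a rough toy model of a single machine check bank. The VAL/OVER/UC bit positions are the architectural MCi_STATUS ones; everything else (struct toy_bank, the toy_* helpers, and the "valid + overflowed + uncorrected means fatal" rule) is a hypothetical simplification for illustration and not the kernel's actual severity grading. It also assumes the speculative access to the old poison page is logged as an uncorrected error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MCi_STATUS bits. */
#define MCI_STATUS_VAL  (1ULL << 63)  /* bank holds a valid log */
#define MCI_STATUS_OVER (1ULL << 62)  /* another error arrived while the log was valid */
#define MCI_STATUS_UC   (1ULL << 61)  /* logged error is uncorrected */

/* Hypothetical toy model of one bank; not a kernel data structure. */
struct toy_bank {
    uint64_t status;
};

/* Model hardware logging an uncorrected error into the bank. */
static void toy_log_uc(struct toy_bank *bank)
{
    assert(bank != NULL);
    if (bank->status & MCI_STATUS_VAL)
        bank->status |= MCI_STATUS_OVER;  /* bank already occupied: flag overflow */
    bank->status |= MCI_STATUS_VAL | MCI_STATUS_UC;
}

/* Model software clearing the bank after it has handled the log. */
static void toy_clear(struct toy_bank *bank)
{
    assert(bank != NULL);
    bank->status = 0;
}

/* Simplified assumption: an overflowed uncorrected log cannot be recovered. */
static bool toy_is_fatal(const struct toy_bank *bank)
{
    const uint64_t mask = MCI_STATUS_VAL | MCI_STATUS_OVER | MCI_STATUS_UC;

    assert(bank != NULL);
    return (bank->status & mask) == mask;
}
```

In this model, one toy_log_uc() call for the speculative touch of an old poison page followed by a second call for a genuinely new error leaves the bank in the fatal state, whereas the new error alone would not. With the 1:1 mapping of the poison page marked not present, the first of those two logs never happens.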

We haven't seen this happen at the Linux OS level, but it is a theoretical possibility.
[Some BIOSes that map physical memory 1:1 have seen this when doing eMCA processing
for the first error ... as soon as they load the address of the error from the MCi_ADDR
register they are vulnerable to some speculative access dereferencing the register that
holds the address and setting the overflow bit in the machine check bank that still
holds the original log].

-Tony
