Message-ID: <20200521155536.GA38602@redhat.com>
Date:   Thu, 21 May 2020 11:55:36 -0400
From:   Vivek Goyal <vgoyal@...hat.com>
To:     Paolo Bonzini <pbonzini@...hat.com>
Cc:     Andy Lutomirski <luto@...capital.net>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
        kvm list <kvm@...r.kernel.org>, stable <stable@...r.kernel.org>
Subject: Re: [PATCH v2] x86/kvm: Disable KVM_ASYNC_PF_SEND_ALWAYS

On Wed, Apr 08, 2020 at 12:07:22AM +0200, Paolo Bonzini wrote:
> On 07/04/20 23:41, Andy Lutomirski wrote:
> > 2. Access to bad memory results in #MC.  Sure, #MC is a turd, but
> > it’s an *architectural* turd. By all means, have a nice simple PV
> > mechanism to tell the #MC code exactly what went wrong, but keep the
> > overall flow the same as in the native case.
> > 
> > I think I like #2 much better. It has another nice effect: a good
> > implementation will serve as a way to exercise the #MC code without
> > needing to muck with EINJ or with whatever magic Tony uses. The
> > average kernel developer does not have access to a box with testable
> > memory failure reporting.
> 
> I prefer #VE, but I can see how #MC has some appeal. 

I have spent some time looking at #MC and trying to figure out if we
can use it. I have encountered a couple of issues.

- Uncorrected, action-required machine checks are generated when poison
  is consumed. So kernel code and exception handling typically assume
  that an MCE can be encountered synchronously only on a load, not on a
  store. Stores don't generate an MCE (at least not an action-required
  one, IIUC). If we were to use #MC, we would need to generate it on
  stores as well, and that requires changing the kernel's assumption that
  stores can't generate #MC (i.e. changing all copy_to_user()/
  copy_from_user() and friends).

- A machine check is generated for poisoned memory, but in this case the
  memory is not exactly poisoned; it is more as if it has gone missing.
  The failure might also be temporary: if the file is extended again
  (e.g. with truncate), the next load/store to the same memory location
  will work just fine. My understanding is that sending #MC will mark
  that page poisoned, effectively turning this into a permanent failure.
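
To make both points concrete, here is a toy model in Python. It is not
kernel code, and every name in it (GuestPage, Outcome, host_backed) is
hypothetical. It assumes the host would report missing backing memory as
#MC on loads *and* stores, and that the guest's #MC handling marks the
page poisoned for good.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    OK = auto()             # access completed normally
    MC_ON_LOAD = auto()     # #MC on a load: what existing recovery paths expect
    MC_ON_STORE = auto()    # #MC on a store: breaks the "stores never #MC" assumption
    PAGE_POISONED = auto()  # guest refuses the page because it is marked poisoned


@dataclass
class GuestPage:
    host_backed: bool = True  # hypothetical: does the host currently have memory here?
    poisoned: bool = False    # set by the guest's #MC handling, never cleared here

    def access(self, is_store: bool) -> Outcome:
        if self.poisoned:
            return Outcome.PAGE_POISONED
        if self.host_backed:
            return Outcome.OK
        # Assumption: missing backing is reported as #MC on loads AND stores.
        self.poisoned = True  # #MC handling poisons the page permanently
        return Outcome.MC_ON_STORE if is_store else Outcome.MC_ON_LOAD


page = GuestPage()
first_load = page.access(is_store=False)   # backing present: OK

page.host_backed = False                   # e.g. backing file truncated on the host
store_result = page.access(is_store=True)  # #MC on a store

page.host_backed = True                    # file extended again: host could serve it
retry_result = page.access(is_store=False)

print(first_load.name, store_result.name, retry_result.name)
```

The store case is the one that copy_to_user()-style code is not prepared
for, and the final retry shows how a temporary gap on the host side turns
into a permanent failure in the guest once the page has been poisoned.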

I am less concerned about point 2, but I'm not sure how to get past the
first issue.

Thanks
Vivek
