linux-kernel - Re: [PATCH] KVM: x86/mmu: Add capability to zap only sptes for the affected memslot

Message-ID: <20200713190649.GE29725@linux.intel.com>
Date:   Mon, 13 Jul 2020 12:06:50 -0700
From:   Sean Christopherson <sean.j.christopherson@...el.com>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        Xiong Zhang <xiong.y.zhang@...el.com>,
        Wayne Boyer <wayne.boyer@...el.com>,
        Zhenyu Wang <zhenyuw@...ux.intel.com>,
        Jun Nakajima <jun.nakajima@...el.com>
Subject: Re: [PATCH] KVM: x86/mmu: Add capability to zap only sptes for the
 affected memslot

On Mon, Jul 13, 2020 at 12:22:26PM -0600, Alex Williamson wrote:
> On Thu, 9 Jul 2020 21:29:22 -0700
> Sean Christopherson <sean.j.christopherson@...el.com> wrote:
> 
> > +Alex, whom I completely spaced on Cc'ing.
> > 
> > Alex, this is related to the dreaded VFIO memslot zapping issue from last
> > year.  Start of thread: https://patchwork.kernel.org/patch/11640719/.
> > 
> > The TL;DR of below: can you try the attached patch with your reproducer
> > from the original bug[*]?  I honestly don't know whether it has a legitimate
> > chance of working, but it's the one thing in all of this that I know was
> > definitely a bug.  I'd like to test it out if only to sate my curiosity.
> > Absolutely no rush.
> 
> Mixed results, maybe you can provide some guidance.  Running this
> against v5.8-rc4, I haven't reproduced the glitch.  But it's been a
> long time since I tested this previously, so I went back to v5.3-rc5 to
> make sure I still have a recipe to trigger it.  I can still get the
> failure there as the selective flush commit was reverted in rc6.  Then
> I wondered, can I take broken v5.3-rc5 and apply this fix to prove that
> it works?  No, v5.3-rc5 + this patch still glitches.  So I thought
> maybe I could make v5.8-rc4 break by s/true/false/ in this patch.
> Nope.  Then I applied the original patch from [1] to try to break it.
> Nope.  So if anything, I think the evidence suggests this was broken
> elsewhere and is now fixed, or maybe it is a timing issue that I can't
> trigger on newer kernels.  If the reproducer wasn't so touchy and time
> consuming, I'd try to bisect, but I don't have that sort of bandwidth.

Ow.  That manages to be both a best case and a worst case scenario.  I can't
think of any clever way to avoid bisecting.  There have been a number of
fixes in tangentially related code since 5.3 (memslots, MMU, TLB, etc.),
but trying to isolate which one, if any, fixed the bug has a high
probability of being a wild goose chase.

The only ideas I have going forward are to:

  a) Reproduce the bug outside of your environment and find a resource that
     can go through the painful bisection.

  b) Add a module param to toggle the new behavior and see if anything
     breaks.

I can ask internally if it's possible to get a resource on my end to go
after (a).  (b) is a question for Paolo.

Thanks much for testing!

> Thanks,
> 
> Alex
> 
> [1] https://patchwork.kernel.org/patch/10798453/
>