linux-kernel - Re: [PATCH 5.10 096/126] KVM: x86/mmu: Use atomic ops to set SPTEs in TDP MMU map

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <fd2030f3-55ba-6088-733b-ac6a551e2170@redhat.com>
Date:   Tue, 6 Apr 2021 17:48:50 +0200
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Sasha Levin <sashal@...nel.org>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-kernel@...r.kernel.org, stable@...r.kernel.org,
        Peter Feiner <pfeiner@...gle.com>,
        Ben Gardon <bgardon@...gle.com>
Subject: Re: [PATCH 5.10 096/126] KVM: x86/mmu: Use atomic ops to set SPTEs in
 TDP MMU map

On 06/04/21 15:49, Sasha Levin wrote:
> On Tue, Apr 06, 2021 at 08:09:26AM +0200, Paolo Bonzini wrote:
>> Whoa no, you have included basically a whole new feature, except for 
>> the final patch that actually enables the feature.  The whole new MMU 
> 
> Right, we would usually grab dependencies rather than modifying the
> patch. It means we diverge less with upstream, and custom backports tend
> to be buggier than just grabbing dependencies.

In general I can't disagree.  However, you *are* modifying at least 
commit 048f49809c in your backport, so it's not clear where you draw the 
line and why you didn't ask for help (more on this below).

Only the first five patches here are actual prerequisites for an easy 
backport of the two commits that actually matter (a835429cda91, "KVM: 
x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap"; 
and 048f49809c52, "KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU 
during NX zapping", 2021-03-30).  Everything after "KVM: x86/mmu: Yield 
in TDU MMU iter even if no SPTES changed" could be dropped without 
making it any harder to backport those two.

>> is still not meant to be used in production and development is still 
>> happening as of 5.13.
> 
> Unrelated to this discussion, but how are folks supposed to know which
> feature can and which feature can't be used in production? If it's a
> released kernel, in theory anyone can pick up 5.12 and use it in
> production.

It's not enabled by default and requires turning on a module parameter.

That also means that something like CKI will not notice if anything's 
wrong with these patches.  It also means that I could just shrug and 
hope that no one ever runs this code in 5.10 and 5.11 (which is actually 
the most likely case), but it is the process that is *just wrong*.

>> Were all these patches (82-97) included just to enable patch 98 ("KVM: 
>> x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping")? Same 
>> for 105-120 in 5.11.
> 
> Yup. Is there anything wrong with those patches?

The big issue, and the one that you ignoredz every time we discuss this 
topic, is that this particular subset of 17 has AFAIK never been tested 
by anyone.

There's plenty of locking changes in here, one patch that you didn't 
backport has this in its commit message:

    This isn't technically a bug fix in the current code [...] but that
    is all very, very subtle, and will break at the slightest sneeze,

meaning that the locking in 5.10 and 5.11 was also less robust to 
changes elsewhere in the code.

Let's also talk about the process and the timing.  I got the "failed to 
apply" automated message last Friday and I was going to work on the 
backport today since yesterday was a holiday here.  I was *never* CCed 
on a post of this backport for maintainers to review; you guys 
*literally* took random subsets of patches from a feature that is new 
and in active development, and hoped that they worked on a past release.

I could be happy because you just provided me with a perfect example of 
why to use my employer's franken-kernel instead of upstream stable 
kernels... ;) but this is not how a world-class operating system is 
developed.  Who cares if a VM breaks or even if my laptop panics; but 
I'd seriously fear for my data if you applied the same attitude to XFS 
or ext4fs.

For now, please drop all 17 patches from 5.10 and 5.11.  I'll send a 
tested backport as soon as possible.

Paolo