Message-ID: <YqoqZjH+yjYJTxmT@google.com>
Date: Wed, 15 Jun 2022 18:52:22 +0000
From: Sean Christopherson <seanjc@...gle.com>
To: Like Xu <like.xu.linux@...il.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org
Subject: Re: [PATCH 1/2] KVM: vmx, pmu: accept 0 for absent MSRs when
host-initiated
On Thu, Jun 02, 2022, Like Xu wrote:
> Thanks for your sincerity, always.
>
> On 2/6/2022 12:39 am, Sean Christopherson wrote:
> > On Wed, Jun 01, 2022, Like Xu wrote:
> > > On 1/6/2022 2:37 am, Sean Christopherson wrote:
> > > > Can we just punt this out of kvm/queue until it's been properly reviewed? At the
> > > > barest of glances, there are multiple flaws that should block this from being
> > >
> > > TBH, our reviewers' attention couldn't be focused on these patches until
> > > the day they were ready to be picked apart. "Try to accept" is a good thing,
> > > and things need to move forward, not simply be abandoned by the wayside.
> >
> > I strongly disagree, to put it mildly. Accepting flawed, buggy code because
> > reviewers and maintainers are overloaded does not solve anything, it only makes
> > the problem worse. More than likely, the submitter(s) has moved on to writing
> > the next pile of patches, while the same set of people that are trying to review
> > submissions are left to pick up the pieces. There are numerous examples of
> > accepting code without (IMO) proper review and tests costing us dearly in the
> > long run.
>
> I actually agree, and I understand the situation maintainers/reviewers are in.
> No one wants to maintain flawed code, especially in this community, where the
> majority of past contributors disappeared after their code was merged. The
> existing heavy maintenance burden is already plain to see.
>
> Thus we may have a maintainer/reviewer scalability issue. Due to a lack of
> trust, competence, or mastery of the rules, most of the patches sent to the
> list have no one to point out their flaws.
Then write tests and run the ones that already exist. Relying purely on reviewers
to detect flaws does not and cannot scale. I agree that we currently have a
scalability issue, but I have different views on how to improve things.
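To make that concrete, here's a rough, untested sketch of a standalone
host-initiated MSR test using raw /dev/kvm ioctls. The MSR index and the
expected "accept '0' for an absent MSR" behavior are assumptions based on
$SUBJECT, and the real thing would belong in tools/testing/selftests/kvm with
proper error handling:

  /*
   * Hypothetical test: host-initiated (vCPU ioctl) write of '0' to a PMU
   * MSR that may not exist for the vCPU model.  Assumes the $SUBJECT
   * behavior, i.e. that KVM accepts the write instead of rejecting it.
   */
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  #define MSR_CORE_PERF_GLOBAL_CTRL 0x38f

  int main(void)
  {
          int kvm = open("/dev/kvm", O_RDWR);
          int vm = ioctl(kvm, KVM_CREATE_VM, 0);
          int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
          struct {
                  struct kvm_msrs hdr;
                  struct kvm_msr_entry entry;
          } msrs = {
                  .hdr.nmsrs   = 1,
                  .entry.index = MSR_CORE_PERF_GLOBAL_CTRL,
                  .entry.data  = 0,
          };
          int nr_set;

          if (kvm < 0 || vm < 0 || vcpu < 0) {
                  perror("setup");
                  return 1;
          }

          /* KVM_SET_MSRS returns the number of MSRs successfully set. */
          nr_set = ioctl(vcpu, KVM_SET_MSRS, &msrs);
          printf("host-initiated write of '0': %d MSR(s) set\n", nr_set);
          return nr_set == 1 ? 0 : 1;
  }

Even something that small, run on hosts with and without a PMU, could have
flagged regressions in the host-initiated path before the series was posted.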
> I have privately received many complaints about the indifference of our
> community, which is distressing.
>
> To improve that, I propose to add "let's try to accept" before "queued, thanks".
>
> Obviously, "try to accept" is not a 100% commitment and it will fail with high
> probability, but such a stance (along with standard clarifications and requirements)
> from reviewers and maintainers will make the contributors more concerned,
> attract potential volunteers, and focus the efforts of our nominated reviewers.
>
> Such moves include an explicit acceptance or rejection, a "try to accept"
> response from some key people (even if it ends up being a failure), or a
> separate git branch. But please, don't leave a lasting silence, especially
> for the big series.
I completely agree on needing better transparency for the lifecycle of patches
going through the KVM tree. First and foremost, there need to be formal, documented
rules for the "official" kvm/* branches, e.g. everything in kvm/queue passes ABC
tests, everything in kvm/next also passes XYZ tests. That would also be a good
place to document expectations, how things work, etc...
At that point, I think we could add e.g. kvm/pending to hold patches that have
gotten the "queued, thanks" but haven't yet passed testing to get all the way to
kvm/queue. In theory, that would eliminate, or at least significantly reduce, the
non-trivial time gap between patches being applied locally and actually being pushed to kvm/queue.
I'll work on writing rules+documentation and getting buy-in from Paolo.
That said, I'm still opposed to pivoting to a "let's try to accept" approach.
IMO, a major part of the problem is lack of testcases and lack of _running_ what
tests we do have. To be blunt, it's beyond frustrating that a series
that's at v12 breaks _existing_ KVM-Unit-Tests on any AMD host. And that same
series wasn't accompanied by any new testcases.
Yes, a kvm/pending or whatever would help mitigate such issues, but that doesn't
fundamentally reduce the burden put on maintainers/reviewers. Basic issues such
as breaking KUT should be caught before a series is posted. I realize there are
exceptions, e.g. folks from Intel and AMD may not have access to the right
hardware, but given that you're also posting AMD specific patches, I don't think
that exception applies here. And yes, mistakes will happen, I've certainly been
guilty of my share, but I fully expect any such mistakes to be caught well before
getting to v12.
The other way to reduce maintainer/reviewer burden is by writing thorough testcases.
A thorough, well-documented test not only proves that the code works, it also shows
to reviewers that the developer actually considered edge cases and helps expedite
the process of thinking through such edge cases. E.g. taking arch LBRs as an
example, a test that actually visited every edge in KVM's state machine for enabling
and disabling MSR intercepts would improve confidence that the code is correct and
would greatly help reviewers understand the various combinations and transitions.
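Purely as an illustration of shape (the states and events below are made up and
do NOT match KVM's actual MSR intercept logic), a table-driven edge walk looks
something like:

  #include <assert.h>
  #include <stdio.h>

  enum state { INTERCEPT, PASSTHROUGH, NR_STATES };
  enum event { GUEST_ENABLE, GUEST_DISABLE, VCPU_RESET, NR_EVENTS };

  /* Expected destination for every (state, event) edge, i.e. the spec. */
  static const enum state expected[NR_STATES][NR_EVENTS] = {
          [INTERCEPT]   = { PASSTHROUGH, INTERCEPT, INTERCEPT },
          [PASSTHROUGH] = { PASSTHROUGH, INTERCEPT, INTERCEPT },
  };

  /* Stand-in for the transition logic under test. */
  static enum state step(enum state s, enum event e)
  {
          return e == GUEST_ENABLE ? PASSTHROUGH : INTERCEPT;
  }

  int main(void)
  {
          int edges = 0;

          /* Visit every edge, not just the happy path. */
          for (int s = 0; s < NR_STATES; s++) {
                  for (int e = 0; e < NR_EVENTS; e++) {
                          assert(step(s, e) == expected[s][e]);
                          edges++;
                  }
          }
          printf("visited all %d edges\n", edges);
          return 0;
  }

A real arch LBR test would enumerate KVM's actual inputs, e.g. guest DEBUGCTL
and whether the host perf event is live, and the "expected" table doubles as
documentation of intent for reviewers.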
I fully realize that writing tests is not glamorous, and that some of KVM's tooling
and infrastructure is lacking, but IMO having contributors truly buy in to writing
tests for testing's sake, instead of viewing tests as a necessary evil to get
accepted upstream, would go a long, long way toward improving the overall velocity
of KVM development.
> Similar moves would increase transparency in decision making, rewarding and
> attracting a steady stream of high-quality, trusted contributors who can do more
> and more for our community and for their employers (if any).
>
> >
> > If people want their code to be merged more quickly, then they can do so by
> > helping address the underlying problems, e.g. write tests that actually try to
> > break their feature instead of doing the bare minimum, review each other's code,
> > clean up the existing code (and tests!), etc... There's a reason vPMU features
> > tend to not get a lot of reviews; KVM doesn't even get the basics right, so there's
> > not a lot of interest in trying to enable fancy, complex features.
>
> I'd like to know more about "KVM doesn't even get the basics right" on vPMU. :D
https://lore.kernel.org/all/CABOYuvbPL0DeEgV4gsC+v786xfBAo3T6+7XQr7cVVzbaoFoEAg@mail.gmail.com
https://lore.kernel.org/all/YofCfNsl6O45hYr0@google.com