Message-ID: <20230725091611.GA3766257@hirez.programming.kicks-ass.net>
Date: Tue, 25 Jul 2023 11:16:11 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
Paolo Bonzini <pbonzini@...hat.com>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
Andrew Cooper <Andrew.Cooper3@...rix.com>,
Kai Huang <kai.huang@...el.com>, Chao Gao <chao.gao@...el.com>
Subject: Re: [PATCH v4 14/19] KVM: SVM: Check that the current CPU supports
SVM in kvm_is_svm_supported()
On Mon, Jul 24, 2023 at 02:40:03PM -0700, Sean Christopherson wrote:
> On Mon, Jul 24, 2023, Peter Zijlstra wrote:
> > On Fri, Jul 21, 2023 at 01:18:54PM -0700, Sean Christopherson wrote:
> > > Check "this" CPU instead of the boot CPU when querying SVM support so that
> > > the per-CPU checks done during hardware enabling actually function as
> > > intended, i.e. will detect issues where SVM isn't supported on all CPUs.
> >
> > Is that a realistic concern?
>
> It's not a concern in the sense that it should never happen, but I know of at
> least one example where VMX on Intel completely disappeared[1]. The "compatibility"
> checks are really more about the entire VMX/SVM feature set, the base VMX/SVM
> support check is just an easy and obvious precursor to the full compatibility
> checks.
>
> Of course, SVM doesn't currently have compatibility checks on the full SVM feature
> set, but that's more due to lack of a forcing function than a desire to _not_ have
> them. Intel CPUs have a pesky habit of bugs, ucode updates, and/or in-field errors
> resulting in VMX features randomly appearing or disappearing. E.g. there's an
> ongoing bugzilla (sorry) issue[2] where a user is only able to load KVM *after* a
> suspend+resume cycle, because TSC scaling only shows up on one socket immediately
> after boot, which is then somehow resolved by suspend+resume.
>
> [1] 009bce1df0bb ("x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted")
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=217574

Is that using late loading of ucode? Anything that changes *any* feature
flag must be early ucode load, there is no other possible way since
Linux does feature enumeration early, and features are fixed thereafter.

This is one of the many reasons late loading is a trainwreck.

Doing suspend/resume probably re-loads the firmware and re-does the
feature enumeration -- I didn't check.

Also, OMG don't you just love computers :/