Message-ID: <636a0ddb3d6c56fd14357cb2d5392d429ab604b4.camel@intel.com>
Date: Mon, 15 Aug 2022 23:06:53 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "Christopherson, Sean" <seanjc@...gle.com>
CC: "Shahar, Sagi" <sagis@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
"will@...nel.org" <will@...nel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
"pbonzini@...hat.com" <pbonzini@...hat.com>,
"Yamahata, Isaku" <isaku.yamahata@...el.com>,
"Aktas, Erdem" <erdemaktas@...gle.com>
Subject: Re: [PATCH v8 003/103] KVM: Refactor CPU compatibility check on
module initialization
On Mon, 2022-08-15 at 22:35 +0000, Sean Christopherson wrote:
> On Fri, Aug 12, 2022, Huang, Kai wrote:
> > On Thu, 2022-08-11 at 17:39 +0000, Sean Christopherson wrote:
> > > I've been poking at the "hardware enable" code this week for other reasons, and
> > > have come to the conclusion that the current implementation is a mess.
> >
> > Thanks for the lengthy reply :)
> >
> > First of all, to clarify, I guess by "current implementation" you mean the
> > current upstream KVM code, but not this particular patch? :)
>
> Yeah, upstream code.
>
> > > Of course, that path is broken for other reasons too, e.g. needs to prevent CPUs
> > > from going on/off-line when KVM is enabling hardware.
> > > https://lore.kernel.org/all/20220216031528.92558-7-chao.gao@intel.com
> >
> > If I read correctly, the problem described in above link seems only to be true
> > after we move CPUHP_AP_KVM_STARTING from STARTING section to ONLINE section, but
> > this hasn't been done yet in the current upstream KVM. Currently,
> > CPUHP_AP_KVM_STARTING is still in STARTING section so it is guaranteed it has
> > been executed before start_secondary sets itself to online cpu mask.
>
> The lurking issue is that for_each_online_cpu() can race against hotplug, i.e. every
> instance of for_each_online_cpu() in KVM is buggy (at least on the x86 side, I
> can't tell at a glance whether or not arm pKVM's usage is safe).
>
> https://lore.kernel.org/all/87bl20aa72.ffs@tglx
Yes, agreed. for_each_online_cpu() can race with CPU hotplug.
But it looks like there are many places that use for_each_online_cpu() without
holding cpus_read_lock(). :)
>
> > Btw I saw v4 of Chao's patchset was sent Feb this year. It seems that series
> > indeed improved CPU compatibility check and hotplug handling. Any reason that
> > series wasn't merged?
>
> AFAIK it was just a lack of reviews/acks for the non-KVM patches.
>
> > Also agreed that kvm_lock should be used. But I am not sure whether
> > cpus_read_lock() is needed (whether CPU hotplug should be prevented). In
> > current KVM, we don't do CPU compatibility check for hotplug CPU anyway, so when
> > KVM does CPU compatibility check using for_each_online_cpu(), if CPU hotplug
> > (hot-removal) happens, the worst case is we lose compatibility check on that
> > CPU.
> >
> > Or perhaps I am missing something?
>
> On a hot-add of an incompatible CPU, KVM would potentially skip the compatibility
> check and try to enable hardware on an incompatible/broken CPU.
To resolve this, we need to do the compatibility check before actually enabling
hardware on each CPU, as Chao's series did. I don't see how using
cpus_read_lock() alone can actually fix anything.
>
> Another possible bug is the checking of hv_get_vp_assist_page(); hot-adding a
> CPU that failed to allocate the VP assist page while vmx_init() is checking online
> CPUs could result in a NULL pointer deref due to KVM not rejecting the CPU as it
> should.
>
So we need Chao's series to fix those problems: 1) do the compatibility check
before actually enabling the hardware on each CPU; 2) allow CPU hotplug to fail;
3) hold cpus_read_lock() when needed.
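
Points 1) and 2) can be sketched roughly as below. This is a userspace model under stated assumptions, not the code from Chao's series: struct cpu_model, check_processor_compat_sketch() and online_cpu_sketch() are hypothetical names, and the "compatible" flag stands in for whatever the real per-vendor check would test.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical per-CPU state; stands in for real feature detection. */
struct cpu_model {
	bool compatible;	/* result the real compat check would compute */
	bool hw_enabled;	/* set once "hardware" is enabled on this CPU */
};

static struct cpu_model cpus[NR_CPUS];

/* Hypothetical stand-in for the per-CPU compatibility check. */
static int check_processor_compat_sketch(int cpu)
{
	return cpus[cpu].compatible ? 0 : -EIO;
}

/*
 * Hypothetical online callback: check first, enable only on success.
 * A nonzero return models rejecting the hotplug of this CPU.
 */
static int online_cpu_sketch(int cpu)
{
	int ret = check_processor_compat_sketch(cpu);

	if (ret)
		return ret;	/* incompatible CPU is never enabled */

	cpus[cpu].hw_enabled = true;
	return 0;
}
```

The point of the ordering is that an incompatible CPU fails the callback before any hardware enabling is attempted, so there is nothing to undo on the failure path.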
--
Thanks,
-Kai