lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <636a0ddb3d6c56fd14357cb2d5392d429ab604b4.camel@intel.com>
Date:   Mon, 15 Aug 2022 23:06:53 +0000
From:   "Huang, Kai" <kai.huang@...el.com>
To:     "Christopherson,, Sean" <seanjc@...gle.com>
CC:     "Shahar, Sagi" <sagis@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
        "will@...nel.org" <will@...nel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "Yamahata, Isaku" <isaku.yamahata@...el.com>,
        "Aktas, Erdem" <erdemaktas@...gle.com>
Subject: Re: [PATCH v8 003/103] KVM: Refactor CPU compatibility check on
 module initialization

On Mon, 2022-08-15 at 22:35 +0000, Sean Christopherson wrote:
> On Fri, Aug 12, 2022, Huang, Kai wrote:
> > On Thu, 2022-08-11 at 17:39 +0000, Sean Christopherson wrote:
> > > I've been poking at the "hardware enable" code this week for other reasons, and
> > > have come to the conclusion that the current implementation is a mess.
> > 
> > Thanks for the lengthy reply :)
> > 
> > First of all, to clarify, I guess by "current implementation" you mean the
> > current upstream KVM code, but not this particular patch? :)
> 
> Yeah, upstream code.
> 
> > > Of course, that path is broken for other reasons too, e.g. needs to prevent CPUs
> > > from going on/off-line when KVM is enabling hardware.
> > > https://lore.kernel.org/all/20220216031528.92558-7-chao.gao@intel.com
> > 
> > If I read correctly, the problem described in above link seems only to be true
> > after we move CPUHP_AP_KVM_STARTING from STARTING section to ONLINE section, but
> > this hasn't been done yet in the current upstream KVM.  Currently,
> > CPUHP_AP_KVM_STARTING is still in STARTING section so it is guaranteed it has
> > been executed before start_secondary sets itself to online cpu mask. 
> 
> The lurking issue is that for_each_online_cpu() can against hotplug, i.e. every
> instance of for_each_online_cpu() in KVM is buggy (at least on the x86 side, I
> can't tell at a glance whether or not arm pKVM's usage is safe).
> 
> https://lore.kernel.org/all/87bl20aa72.ffs@tglx

Yes agreed. for_each_online_cpu() can race with CPU hotplug.

But the fact is looks there are many places using for_each_online_cpus() w/o
holding cpus_read_lock().  :)

> 
> > Btw I saw v4 of Chao's patchset was sent Feb this year.  It seems that series
> > indeed improved CPU compatibility check and hotplug handling.  Any reason that
> > series wasn't merged?
> 
> AFAIK it was just a lack of reviews/acks for the non-KVM patches.
> 
> > Also agreed that kvm_lock should be used.  But I am not sure whether
> > cpus_read_lock() is needed (whether CPU hotplug should be prevented).  In
> > current KVM, we don't do CPU compatibility check for hotplug CPU anyway, so when
> > KVM does CPU compatibility check using for_each_online_cpu(), if CPU hotplug
> > (hot-removal) happens, the worst case is we lose compatibility check on that
> > CPU.
> > 
> > Or perhaps I am missing something?
> 
> On a hot-add of an incompatible CPU, KVM would potentially skip the compatibility
> check and try to enable hardware on an incompatible/broken CPU.

To resolve this, we need to do compatibility check before actually enabling
hardware on each cpu, as Chao's series did.  I don't see using cpus_read_lock()
alone can actually fix anything.

> 
> Another possible bug is the checking of hv_get_vp_assist_page(); hot-adding a
> CPU that failed to allocate the VP assist page while vmx_init() is checking online
> CPUs could result in a NULL pointer deref due to KVM not rejecting the CPU as it
> should.
> 

So we need Chao's series to fix those problems: 1) Do compatibility check before
actually enable the hardware for each cpu; 2) allow CPU hotplug to fail; 3) Hold
cpus_read_lock() when needed.


-- 
Thanks,
-Kai


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ