Date: Wed, 17 Apr 2024 09:58:50 -0400
From: boris.ostrovsky@...cle.com
To: Igor Mammedov <imammedo@...hat.com>
Cc: Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM/x86: Do not clear SIPI while in SMM



On 4/17/24 8:40 AM, Igor Mammedov wrote:
> On Tue, 16 Apr 2024 19:37:09 -0400
> boris.ostrovsky@...cle.com wrote:
> 
>> On 4/16/24 7:17 PM, Sean Christopherson wrote:
>>> On Tue, Apr 16, 2024, boris.ostrovsky@...cle.com wrote:
>>>> (Sorry, need to resend)
>>>>
>>>> On 4/16/24 6:03 PM, Paolo Bonzini wrote:
>>>>> On Tue, Apr 16, 2024 at 10:57 PM <boris.ostrovsky@...cle.com> wrote:
>>>>>> On 4/16/24 4:53 PM, Paolo Bonzini wrote:
>>>>>>> On 4/16/24 22:47, Boris Ostrovsky wrote:
>>>>>>>> Keeping the SIPI pending avoids this scenario.
>>>>>>>
>>>>>>> This is incorrect - it's yet another ugly legacy facet of x86, but we
>>>>>>> have to live with it.  SIPI is discarded because the code is supposed
>>>>>>> to retry it if needed ("INIT-SIPI-SIPI").
>>>>>>
>>>>>> I couldn't find in the SDM/APM a definitive statement about whether SIPI
>>>>>> is supposed to be dropped.
>>>>>
>>>>> I think the manual is pretty consistent that SIPIs are never latched,
>>>>> they're only ever used in wait-for-SIPI state.
>>>>>   
>>>>>>> The sender should set a flag as early as possible in the SIPI code so
>>>>>>> that it's clear that it was not received; and an extra SIPI is not a
>>>>>>> problem, it will be ignored anyway and will not cause trouble if
>>>>>>> there's a race.
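
For illustration, a minimal C sketch of the retry pattern described above: the AP
sets a flag as one of its first actions after the SIPI, and the sender repeats the
whole INIT-SIPI-SIPI sequence if the flag never shows up. All names here are
hypothetical; this is not actual kernel or firmware code.

    #include <stdatomic.h>
    #include <stdbool.h>

    extern void send_init(unsigned int apic_id);        /* assumed platform helpers */
    extern void send_sipi(unsigned int apic_id, unsigned int vector);
    extern void udelay(unsigned int usec);

    atomic_bool ap_started;          /* set by the AP trampoline as early as possible */

    static bool start_ap(unsigned int apic_id, unsigned int start_vector)
    {
        for (int attempt = 0; attempt < 2; attempt++) {
            atomic_store(&ap_started, false);
            send_init(apic_id);
            udelay(10000);                        /* INIT de-assert delay */
            send_sipi(apic_id, start_vector);
            udelay(200);
            send_sipi(apic_id, start_vector);     /* duplicate SIPI is simply ignored */

            for (int wait = 0; wait < 1000; wait++) {
                if (atomic_load(&ap_started))
                    return true;                  /* AP announced itself */
                udelay(100);
            }
            /* SIPI was dropped (e.g. the CPU was in SMM); retry the sequence. */
        }
        return false;
    }
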
>>>>>>>
>>>>>>> What is the reproducer for this?
>>>>>>
>>>>>> Hotplugging/unplugging cpus in a loop, especially if you oversubscribe
>>>>>> the guest, will get you there in 10-15 minutes.
>>>>>>
>>>>>> Typically (although I think not always) this happens when OVMF is
>>>>>> trying to rendezvous and a processor is missing and is sent an extra SMI.
>>>>>
>>>>> Can you go into more detail? I wasn't even aware that OVMF's SMM
>>>>> supported hotplug - on real hardware I think there's extra work from
>>>>> the BMC to coordinate all SMIs across both existing and hotplugged
>>>>> packages(*)
>>>>
>>>>
>>>> It's been supported by OVMF for a couple of years (in fact, IIRC you were
>>>> part of the initial conversations about this, at least for the unplug
>>>> part).
>>>>
>>>> During hotplug, QEMU gathers all CPUs in OVMF from (I think)
>>>> ich9_apm_ctrl_changed(), and they are all waited for in
>>>> SmmCpuRendezvous()->SmmWaitForApArrival(). Occasionally it so happens
>>>> that the SMI from QEMU is not delivered to a processor that was *just*
>>>> successfully hotplugged, and so it is pinged again (https://github.com/tianocore/edk2/blob/fcfdbe29874320e9f876baa7afebc3fca8f4a7df/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c#L304).
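
As a rough paraphrase of the resend logic linked above (this is not the actual EDK2
source; the helper names are made up for illustration): the firmware waits for all
present CPUs to check into SMM and, after a timeout, sends a directed SMI to any CPU
that has not arrived yet.

    #include <stdbool.h>

    extern unsigned int present_cpu_count(void);          /* assumed helpers */
    extern unsigned int cpus_checked_into_smm(void);
    extern bool cpu_arrived_in_smm(unsigned int cpu);
    extern void send_smi_to(unsigned int cpu);
    extern void short_delay(void);

    static void wait_for_ap_arrival(void)
    {
        unsigned int expected = present_cpu_count();

        /* First wait: most CPUs respond to the broadcast SMI quickly. */
        for (int i = 0; i < 1000 && cpus_checked_into_smm() < expected; i++)
            short_delay();

        /* Any CPU still missing (e.g. one that was hotplugged just as the
         * broadcast went out) gets a directed SMI, and we wait again. */
        if (cpus_checked_into_smm() < expected) {
            for (unsigned int cpu = 0; cpu < expected; cpu++)
                if (!cpu_arrived_in_smm(cpu))
                    send_smi_to(cpu);

            while (cpus_checked_into_smm() < expected)
                short_delay();
        }
    }
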
>>>>
>>>>
>>>> At the same time this processor is now being brought up by the kernel and
>>>> is being sent INIT-SIPI-SIPI. If these (or at least the SIPIs) arrive after
>>>> the SMI reaches the processor, then that processor is not going to have a
>>>> good day.
> 
> Do you use a qemu/firmware combo that negotiated the ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT/
> ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT features?

Yes.

> 
>>>
>>> It's specifically SIPI that's problematic.  INIT is blocked by SMM, but latched,
>>> and SMIs are blocked by WFS, but latched.  And AFAICT, KVM emulates all of those
>>> combinations correctly.
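
A minimal C model of the blocking-vs-latching behaviour described here (hypothetical
structure, not KVM's actual code): INIT received in SMM is held pending, an SMI
received in wait-for-SIPI is held pending, but a SIPI is dropped unless the CPU is
currently in wait-for-SIPI state.

    #include <stdbool.h>

    enum mp_state { MP_RUNNING, MP_WAIT_FOR_SIPI };

    struct vcpu {
        enum mp_state state;
        bool in_smm;
        bool pending_init;
        bool pending_smi;
    };

    static void deliver_init(struct vcpu *v)
    {
        if (v->in_smm) { v->pending_init = true; return; }   /* blocked by SMM, but latched */
        v->state = MP_WAIT_FOR_SIPI;
    }

    static void deliver_smi(struct vcpu *v)
    {
        if (v->state == MP_WAIT_FOR_SIPI) { v->pending_smi = true; return; } /* blocked by WFS, latched */
        v->in_smm = true;
    }

    static void deliver_sipi(struct vcpu *v)
    {
        if (v->state != MP_WAIT_FOR_SIPI)
            return;                      /* not latched: dropped unless in wait-for-SIPI */
        v->state = MP_RUNNING;           /* start running at the SIPI vector */
    }
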
>>>
>>> Why is the SMI from QEMU not delivered?  That seems like the smoking gun.
>>
>> I haven't actually traced this, but it seems that what happens is that
>> the newly-added processor is about to leave SMM and the count of in-SMM
>> processors is decremented. At the same time, since the processor is
>> still in SMM, QEMU's SMI is not taken.
>>
>> And so when the count is looked at again in SmmWaitForApArrival() one
>> processor is missing.
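
An illustrative sketch of the suspected window (hypothetical names, not real firmware
code): the AP decrements the in-SMM count before it has actually executed RSM, so a
broadcast SMI arriving in that window is not taken, and a later recount comes up one
CPU short.

    #include <stdatomic.h>

    extern void execute_rsm(void);     /* assumed wrapper around the RSM instruction */

    atomic_uint cpus_in_smm;           /* count of CPUs currently inside the SMI handler */

    static void ap_smm_exit_path(void)
    {
        atomic_fetch_sub(&cpus_in_smm, 1);   /* bookkeeping already says "I left SMM"... */
        /* Window: a broadcast SMI arriving here is not taken, because the CPU is
         * architecturally still in SMM until RSM completes; a later
         * SmmWaitForApArrival()-style recount then finds one CPU missing. */
        execute_rsm();                       /* ...but SMM is only actually left here */
    }
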
> 
> The current QEMU CPU hotplug workflow with SMM enabled should be the following:
> 
>    1. OSPM gets a list(N) of hotplugged CPUs
>    2. OSPM hands over control to firmware (SMM callback leading to SMI broadcast)
>    3. Firmware at this point shall initialize all new CPUs (incl. relocating SMBASE
>       for the new ones); it shall pull in all CPUs that are present at the moment
>    4. Firmware returns control to OSPM
>    5. OSPM sends Notify to the list(N) CPUs, triggering INIT-SIPI-SIPI _only_ on
>       those CPUs that it collected in step 1
> 
> The above steps repeat until all hotplugged CPUs are handled.
> 
> In a nutshell, INIT-SIPI-SIPI shall only be sent to a freshly hotplugged CPU
> that OSPM has already seen in (1) _and_ that firmware has already initialized in (3).
> 
> CPUs enumerated at (3) shall at least include the CPUs present at (1),
> and may include newer CPUs that arrived in between (1) and (3).
> 
> CPUs collected at (1) shall all enter SMM; if that doesn't happen
> then the hotplug workflow won't work as expected.
> In that case we need to figure out why the SMI is not delivered
> or why the firmware isn't waiting for the hotplugged CPU.
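
A hedged C sketch of the five-step workflow above, seen from the OSPM side (all
function names are invented for illustration). The key invariant is that
INIT-SIPI-SIPI is only triggered for CPUs collected in step 1, which firmware has
therefore already initialized in step 3.

    #include <stddef.h>

    #define MAX_CPUS 256

    extern size_t collect_hotplugged_cpus(unsigned int *ids, size_t max); /* step 1 */
    extern void trigger_smi_broadcast(void);         /* step 2; firmware does 3, returns at 4 */
    extern void notify_cpu_online(unsigned int id);  /* step 5: leads to INIT-SIPI-SIPI */

    static void ospm_handle_hotplug_event(void)
    {
        unsigned int ids[MAX_CPUS];
        size_t n;

        /* Repeat until no newly hotplugged CPUs remain. */
        while ((n = collect_hotplugged_cpus(ids, MAX_CPUS)) > 0) {
            trigger_smi_broadcast();          /* all present CPUs are pulled into SMM here */
            for (size_t i = 0; i < n; i++)
                notify_cpu_online(ids[i]);    /* only the CPUs from this iteration's list */
        }
    }
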

I noticed that I was using QEMU bits that were a few months old, and now I am 
having trouble reproducing this on the latest bits. Let me see if I can get 
this to fail with the latest bits first and then try to trace why the 
processor is in this unexpected state.

-boris
