[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <e1c96619-4402-4577-b2b8-a694bd5569c7@mailbox.org>
Date: Wed, 11 Dec 2024 18:42:58 +0100
From: Simon Pilkington <simonp.git@...lbox.org>
To: Tom Lendacky <thomas.lendacky@....com>,
Sean Christopherson <seanjc@...gle.com>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
regressions@...ts.linux.dev
Subject: Re: [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO
PCI passthrough no longer works
On 11/12/2024 15:37, Tom Lendacky wrote:
> On 12/10/24 16:43, Sean Christopherson wrote:
>> On Tue, Dec 10, 2024, Tom Lendacky wrote:
>>> On 12/10/24 14:33, Simon Pilkington wrote:
>>>> On 10/12/2024 16:47, Sean Christopherson wrote:
>>>>> Can you run with the below to see what bits the guest is trying to set (or clear)?
>>>>> We could get the same info via tracepoints, but this will likely be faster/easier.
>>>>>
>>>>> ---
>>>>> arch/x86/kvm/svm/svm.c | 12 +++++++++---
>>>>> 1 file changed, 9 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>>>> index dd15cc635655..5144d0283c9d 100644
>>>>> --- a/arch/x86/kvm/svm/svm.c
>>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>>> @@ -3195,11 +3195,14 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>>>> case MSR_AMD64_DE_CFG: {
>>>>> u64 supported_de_cfg;
>>>>>
>>>>> - if (svm_get_feature_msr(ecx, &supported_de_cfg))
>>>>> + if (WARN_ON_ONCE(svm_get_feature_msr(ecx, &supported_de_cfg)))
>>>>> return 1;
>>>>>
>>>>> - if (data & ~supported_de_cfg)
>>>>> + if (data & ~supported_de_cfg) {
>>>>> + pr_warn("DE_CFG supported = %llx, WRMSR = %llx\n",
>>>>> + supported_de_cfg, data);
>>>>> return 1;
>>>>> + }
>>>>>
>>>>> /*
>>>>> * Don't let the guest change the host-programmed value. The
>>>>> @@ -3207,8 +3210,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>>>> * are completely unknown to KVM, and the one bit known to KVM
>>>>> * is simply a reflection of hardware capabilities.
>>>>> */
>>>>> - if (!msr->host_initiated && data != svm->msr_decfg)
>>>>> + if (!msr->host_initiated && data != svm->msr_decfg) {
>>>>> + pr_warn("DE_CFG current = %llx, WRMSR = %llx\n",
>>>>> + svm->msr_decfg, data);
>>>>> return 1;
>>>>> + }
>>>>>
>>>>> svm->msr_decfg = data;
>>>>> break;
>>>>>
>>>>> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
>>>>
>>>> Relevant dmesg output with some context below. VM locked up as expected.
>>>>
>>>> [ 85.834971] vfio-pci 0000:0c:00.0: resetting
>>>> [ 85.937573] vfio-pci 0000:0c:00.0: reset done
>>>> [ 86.494210] vfio-pci 0000:0c:00.0: resetting
>>>> [ 86.494264] vfio-pci 0000:0c:00.1: resetting
>>>> [ 86.761442] vfio-pci 0000:0c:00.0: reset done
>>>> [ 86.761480] vfio-pci 0000:0c:00.1: reset done
>>>> [ 86.762392] vfio-pci 0000:0c:00.0: resetting
>>>> [ 86.865462] vfio-pci 0000:0c:00.0: reset done
>>>> [ 86.977360] virbr0: port 1(vnet1) entered learning state
>>>> [ 88.993052] virbr0: port 1(vnet1) entered forwarding state
>>>> [ 88.993057] virbr0: topology change detected, propagating
>>>> [ 103.459114] kvm_amd: DE_CFG current = 0, WRMSR = 2
>>>> [ 161.442032] virbr0: port 1(vnet1) entered disabled state // VM shut down
>>>
>>> That is the MSR_AMD64_DE_CFG_LFENCE_SERIALIZE bit. Yeah, that actually
>>> does change the behavior of LFENCE and isn't just a reflection of the
>>> hardware.
>>>
>>> Linux does set that bit on boot, too (if LFENCE always serializing isn't
>>> advertised 8000_0021_EAX[2]), so I'm kind of surprised it didn't pop up
>>> there.
>>
>> Linux may be running afoul of this, but it would only become visible if someone
>> checked dmesg. Even the "unsafe" MSR accesses in Linux gracefully handle faults
>> these days, the only symptom would be a WARN.
>>
>>> I imagine that the above CPUID bit isn't set, so an attempt is made to
>>> set the MSR bit.
>>
>> Yep. And LFENCE_RDTSC _is_ supported, otherwise the supported_de_cfg check would
>> have failed. Which means it's a-ok for the guest to set the bit, i.e. KVM won't
>> let the guest incorrectly think it's running on CPU for which LFENCE is serializing.
>>
>> Unless you (Tom) disagree, I vote to simply drop the offending code, i.e. make
>> all supported bits fully writable from the guest. KVM is firmly in the wrong here,
>> and I can't think of any reason to disallow the guest from clearing LFENCE_SERIALIZE.
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 6a350cee2f6c..5a82ead3bf0f 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3201,15 +3201,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>> if (data & ~supported_de_cfg)
>> return 1;
>>
>> - /*
>> - * Don't let the guest change the host-programmed value. The
>> - * MSR is very model specific, i.e. contains multiple bits that
>> - * are completely unknown to KVM, and the one bit known to KVM
>> - * is simply a reflection of hardware capabilities.
>> - */
>> - if (!msr->host_initiated && data != svm->msr_decfg)
>> - return 1;
>> -
>
> That works for me.
>
> Thanks,
> Tom
>
>> svm->msr_decfg = data;
>> break;
>> }
>>
Thanks for the prompt response on this Sean & Tom.
Regards,
Simon
Powered by blists - more mailing lists