lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <e1c96619-4402-4577-b2b8-a694bd5569c7@mailbox.org>
Date: Wed, 11 Dec 2024 18:42:58 +0100
From: Simon Pilkington <simonp.git@...lbox.org>
To: Tom Lendacky <thomas.lendacky@....com>,
 Sean Christopherson <seanjc@...gle.com>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
 regressions@...ts.linux.dev
Subject: Re: [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO
 PCI passthrough no longer works

On 11/12/2024 15:37, Tom Lendacky wrote:
> On 12/10/24 16:43, Sean Christopherson wrote:
>> On Tue, Dec 10, 2024, Tom Lendacky wrote:
>>> On 12/10/24 14:33, Simon Pilkington wrote:
>>>> On 10/12/2024 16:47, Sean Christopherson wrote:
>>>>> Can you run with the below to see what bits the guest is trying to set (or clear)?
>>>>> We could get the same info via tracepoints, but this will likely be faster/easier.
>>>>>
>>>>> ---
>>>>>  arch/x86/kvm/svm/svm.c | 12 +++++++++---
>>>>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>>>> index dd15cc635655..5144d0283c9d 100644
>>>>> --- a/arch/x86/kvm/svm/svm.c
>>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>>> @@ -3195,11 +3195,14 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>>>>  	case MSR_AMD64_DE_CFG: {
>>>>>  		u64 supported_de_cfg;
>>>>>  
>>>>> -		if (svm_get_feature_msr(ecx, &supported_de_cfg))
>>>>> +		if (WARN_ON_ONCE(svm_get_feature_msr(ecx, &supported_de_cfg)))
>>>>>  			return 1;
>>>>>  
>>>>> -		if (data & ~supported_de_cfg)
>>>>> +		if (data & ~supported_de_cfg) {
>>>>> +			pr_warn("DE_CFG supported = %llx, WRMSR = %llx\n",
>>>>> +				supported_de_cfg, data);
>>>>>  			return 1;
>>>>> +		}
>>>>>  
>>>>>  		/*
>>>>>  		 * Don't let the guest change the host-programmed value.  The
>>>>> @@ -3207,8 +3210,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>>>>  		 * are completely unknown to KVM, and the one bit known to KVM
>>>>>  		 * is simply a reflection of hardware capabilities.
>>>>>  		 */
>>>>> -		if (!msr->host_initiated && data != svm->msr_decfg)
>>>>> +		if (!msr->host_initiated && data != svm->msr_decfg) {
>>>>> +			pr_warn("DE_CFG current = %llx, WRMSR = %llx\n",
>>>>> +				svm->msr_decfg, data);
>>>>>  			return 1;
>>>>> +		}
>>>>>  
>>>>>  		svm->msr_decfg = data;
>>>>>  		break;
>>>>>
>>>>> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
>>>>
>>>> Relevant dmesg output with some context below. VM locked up as expected.
>>>>
>>>> [   85.834971] vfio-pci 0000:0c:00.0: resetting
>>>> [   85.937573] vfio-pci 0000:0c:00.0: reset done
>>>> [   86.494210] vfio-pci 0000:0c:00.0: resetting
>>>> [   86.494264] vfio-pci 0000:0c:00.1: resetting
>>>> [   86.761442] vfio-pci 0000:0c:00.0: reset done
>>>> [   86.761480] vfio-pci 0000:0c:00.1: reset done
>>>> [   86.762392] vfio-pci 0000:0c:00.0: resetting
>>>> [   86.865462] vfio-pci 0000:0c:00.0: reset done
>>>> [   86.977360] virbr0: port 1(vnet1) entered learning state
>>>> [   88.993052] virbr0: port 1(vnet1) entered forwarding state
>>>> [   88.993057] virbr0: topology change detected, propagating
>>>> [  103.459114] kvm_amd: DE_CFG current = 0, WRMSR = 2
>>>> [  161.442032] virbr0: port 1(vnet1) entered disabled state // VM shut down
>>>
>>> That is the MSR_AMD64_DE_CFG_LFENCE_SERIALIZE bit. Yeah, that actually
>>> does change the behavior of LFENCE and isn't just a reflection of the
>>> hardware.
>>>
>>> Linux does set that bit on boot, too (if LFENCE always serializing isn't
>>> advertised 8000_0021_EAX[2]), so I'm kind of surprised it didn't pop up
>>> there.
>>
>> Linux may be running afoul of this, but it would only become visible if someone
>> checked dmesg.  Even the "unsafe" MSR accesses in Linux gracefully handle faults
>> these days, the only symptom would be a WARN.
>>
>>> I imagine that the above CPUID bit isn't set, so an attempt is made to
>>> set the MSR bit.
>>
>> Yep.  And LFENCE_RDTSC _is_ supported, otherwise the supported_de_cfg check would
>> have failed.  Which means it's a-ok for the guest to set the bit, i.e. KVM won't
>> let the guest incorrectly think it's running on CPU for which LFENCE is serializing.
>>
>> Unless you (Tom) disagree, I vote to simply drop the offending code, i.e. make
>> all supported bits fully writable from the guest.  KVM is firmly in the wrong here,
>> and I can't think of any reason to disallow the guest from clearing LFENCE_SERIALIZE.
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 6a350cee2f6c..5a82ead3bf0f 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3201,15 +3201,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>                 if (data & ~supported_de_cfg)
>>                         return 1;
>>  
>> -               /*
>> -                * Don't let the guest change the host-programmed value.  The
>> -                * MSR is very model specific, i.e. contains multiple bits that
>> -                * are completely unknown to KVM, and the one bit known to KVM
>> -                * is simply a reflection of hardware capabilities.
>> -                */
>> -               if (!msr->host_initiated && data != svm->msr_decfg)
>> -                       return 1;
>> -
> 
> That works for me.
> 
> Thanks,
> Tom
> 
>>                 svm->msr_decfg = data;
>>                 break;
>>         }
>>

Thanks for the prompt response on this Sean & Tom.

Regards,
Simon

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ