lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c64099db-33b3-9438-536e-7882bae614e0@amd.com>
Date: Wed, 11 Dec 2024 08:37:26 -0600
From: Tom Lendacky <thomas.lendacky@....com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Simon Pilkington <simonp.git@...lbox.org>, linux-kernel@...r.kernel.org,
 kvm@...r.kernel.org, regressions@...ts.linux.dev
Subject: Re: [REGRESSION] from 74a0e79df68a8042fb84fd7207e57b70722cf825: VFIO
 PCI passthrough no longer works

On 12/10/24 16:43, Sean Christopherson wrote:
> On Tue, Dec 10, 2024, Tom Lendacky wrote:
>> On 12/10/24 14:33, Simon Pilkington wrote:
>>> On 10/12/2024 16:47, Sean Christopherson wrote:
>>>> Can you run with the below to see what bits the guest is trying to set (or clear)?
>>>> We could get the same info via tracepoints, but this will likely be faster/easier.
>>>>
>>>> ---
>>>>  arch/x86/kvm/svm/svm.c | 12 +++++++++---
>>>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>>> index dd15cc635655..5144d0283c9d 100644
>>>> --- a/arch/x86/kvm/svm/svm.c
>>>> +++ b/arch/x86/kvm/svm/svm.c
>>>> @@ -3195,11 +3195,14 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>>>  	case MSR_AMD64_DE_CFG: {
>>>>  		u64 supported_de_cfg;
>>>>  
>>>> -		if (svm_get_feature_msr(ecx, &supported_de_cfg))
>>>> +		if (WARN_ON_ONCE(svm_get_feature_msr(ecx, &supported_de_cfg)))
>>>>  			return 1;
>>>>  
>>>> -		if (data & ~supported_de_cfg)
>>>> +		if (data & ~supported_de_cfg) {
>>>> +			pr_warn("DE_CFG supported = %llx, WRMSR = %llx\n",
>>>> +				supported_de_cfg, data);
>>>>  			return 1;
>>>> +		}
>>>>  
>>>>  		/*
>>>>  		 * Don't let the guest change the host-programmed value.  The
>>>> @@ -3207,8 +3210,11 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>>>>  		 * are completely unknown to KVM, and the one bit known to KVM
>>>>  		 * is simply a reflection of hardware capabilities.
>>>>  		 */
>>>> -		if (!msr->host_initiated && data != svm->msr_decfg)
>>>> +		if (!msr->host_initiated && data != svm->msr_decfg) {
>>>> +			pr_warn("DE_CFG current = %llx, WRMSR = %llx\n",
>>>> +				svm->msr_decfg, data);
>>>>  			return 1;
>>>> +		}
>>>>  
>>>>  		svm->msr_decfg = data;
>>>>  		break;
>>>>
>>>> base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
>>>
>>> Relevant dmesg output with some context below. VM locked up as expected.
>>>
>>> [   85.834971] vfio-pci 0000:0c:00.0: resetting
>>> [   85.937573] vfio-pci 0000:0c:00.0: reset done
>>> [   86.494210] vfio-pci 0000:0c:00.0: resetting
>>> [   86.494264] vfio-pci 0000:0c:00.1: resetting
>>> [   86.761442] vfio-pci 0000:0c:00.0: reset done
>>> [   86.761480] vfio-pci 0000:0c:00.1: reset done
>>> [   86.762392] vfio-pci 0000:0c:00.0: resetting
>>> [   86.865462] vfio-pci 0000:0c:00.0: reset done
>>> [   86.977360] virbr0: port 1(vnet1) entered learning state
>>> [   88.993052] virbr0: port 1(vnet1) entered forwarding state
>>> [   88.993057] virbr0: topology change detected, propagating
>>> [  103.459114] kvm_amd: DE_CFG current = 0, WRMSR = 2
>>> [  161.442032] virbr0: port 1(vnet1) entered disabled state // VM shut down
>>
>> That is the MSR_AMD64_DE_CFG_LFENCE_SERIALIZE bit. Yeah, that actually
>> does change the behavior of LFENCE and isn't just a reflection of the
>> hardware.
>>
>> Linux does set that bit on boot, too (if LFENCE always serializing isn't
>> advertised 8000_0021_EAX[2]), so I'm kind of surprised it didn't pop up
>> there.
> 
> Linux may be running afoul of this, but it would only become visible if someone
> checked dmesg.  Even the "unsafe" MSR accesses in Linux gracefully handle faults
> these days, the only symptom would be a WARN.
> 
>> I imagine that the above CPUID bit isn't set, so an attempt is made to
>> set the MSR bit.
> 
> Yep.  And LFENCE_RDTSC _is_ supported, otherwise the supported_de_cfg check would
> have failed.  Which means it's a-ok for the guest to set the bit, i.e. KVM won't
> let the guest incorrectly think it's running on CPU for which LFENCE is serializing.
> 
> Unless you (Tom) disagree, I vote to simply drop the offending code, i.e. make
> all supported bits fully writable from the guest.  KVM is firmly in the wrong here,
> and I can't think of any reason to disallow the guest from clearing LFENCE_SERIALIZE.
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 6a350cee2f6c..5a82ead3bf0f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3201,15 +3201,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>                 if (data & ~supported_de_cfg)
>                         return 1;
>  
> -               /*
> -                * Don't let the guest change the host-programmed value.  The
> -                * MSR is very model specific, i.e. contains multiple bits that
> -                * are completely unknown to KVM, and the one bit known to KVM
> -                * is simply a reflection of hardware capabilities.
> -                */
> -               if (!msr->host_initiated && data != svm->msr_decfg)
> -                       return 1;
> -

That works for me.

Thanks,
Tom

>                 svm->msr_decfg = data;
>                 break;
>         }
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ