lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bc7fa0c4-d20e-713a-da4c-c65ec78080da@intel.com>
Date:   Fri, 27 Oct 2023 23:45:17 +0800
From:   "Yang, Weijiang" <weijiang.yang@...el.com>
To:     Sean Christopherson <seanjc@...gle.com>
CC:     Dave Hansen <dave.hansen@...el.com>, <pbonzini@...hat.com>,
        <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <peterz@...radead.org>, <chao.gao@...el.com>,
        <rick.p.edgecombe@...el.com>, <john.allen@....com>
Subject: Re: [PATCH v6 06/25] x86/fpu/xstate: Opt-in kernel dynamic bits when
 calculate guest xstate size

On 10/27/2023 1:24 AM, Sean Christopherson wrote:
> On Wed, Oct 25, 2023, Weijiang Yang wrote:
>> On 10/25/2023 1:07 AM, Sean Christopherson wrote:
>>> On Fri, Sep 15, 2023, Weijiang Yang wrote:
>>> IIUC, the "dynamic" features contains CET_KERNEL, whereas xfeatures_mask_supervisor()
>>> conatins PASID, CET_USER, and CET_KERNEL.  PASID isn't virtualized by KVM, but
>>> doesn't that mean CET_USER will get dropped/lost if userspace requests AMX/XTILE
>>> enabling?
>> Yes, __state_size is correct for guest enabled xfeatures, including CET_USER,
>> and it gets removed from __state_perm.
>>
>> IIUC, from current qemu/kernel interaction for guest permission settings,
>> __xstate_request_perm() is called only _ONCE_ to set AMX/XTILE for every vCPU
>> thread, so the removal of guest supervisor xfeatures won't hurt guest! ;-/
> Huh?  I don't follow.  What does calling __xstate_request_perm() only once have
> to do with anything?
>
> /me stares more
>
> OMG, hell no.  First off, this code is a nightmare to follow.  The existing comment
> is useless.  No shit the code is adding in supervisor states for the host.  What's
> not AT ALL clear is *why*.
>
> The commit says it's necessary because the "permission bitmap is only relevant
> for user states":
>
>    commit 781c64bfcb735960717d1cb45428047ff6a5030c
>    Author: Thomas Gleixner <tglx@...utronix.de>
>    Date:   Thu Mar 24 14:47:14 2022 +0100
>
>      x86/fpu/xstate: Handle supervisor states in XSTATE permissions
>      
>      The size calculation in __xstate_request_perm() fails to take supervisor
>      states into account because the permission bitmap is only relevant for user
>      states.
>
> But @permitted comes from:
>
>    permitted = xstate_get_group_perm(guest);
>
> which is either fpu->guest_perm.__state_perm or fpu->perm.__state_perm.  And
> __state_perm is initialized to:
>
> 	fpu->perm.__state_perm		= fpu_kernel_cfg.default_features;
>
> where fpu_kernel_cfg.default_features contains everything except the dynamic
> xfeatures, i.e. everything except XFEATURE_MASK_XTILE_DATA:
>
> 	fpu_kernel_cfg.default_features = fpu_kernel_cfg.max_features;
> 	fpu_kernel_cfg.default_features &= ~XFEATURE_MASK_USER_DYNAMIC;
>
> So why on earth does this code to force back xfeatures_mask_supervisor()?  Because
> the code just below drops the supervisor bits to compute the user xstate size and
> then clobbers __state_perm.
>
> 	/* Calculate the resulting user state size */
> 	mask &= XFEATURE_MASK_USER_SUPPORTED;
> 	usize = xstate_calculate_size(mask, false);
>
> 	...
>
> 	WRITE_ONCE(perm->__state_perm, mask);
>
> That is beyond asinine.  IIUC, the intent is to apply the permission bitmap only
> for user states, because the only dynamic states are user states.  Bbut the above
> creates an inconsistent mess.  If userspace doesn't request XTILE_DATA,
> __state_perm will contain supervisor states, but once userspace does request
> XTILE_DATA, __state_perm will be lost.

Exactly, thanks for calling it out!

> And because that's not confusing enough, clobbering __state_perm would also drop
> FPU_GUEST_PERM_LOCKED, except that __xstate_request_perm() can' be reached with
> said LOCKED flag set.
>
> fpu_xstate_prctl() already strips out supervisor features:
>
> 	case ARCH_GET_XCOMP_PERM:
> 		/*
> 		 * Lockless snapshot as it can also change right after the
> 		 * dropping the lock.
> 		 */
> 		permitted = xstate_get_host_group_perm();
> 		permitted &= XFEATURE_MASK_USER_SUPPORTED;
> 		return put_user(permitted, uptr);
>
> 	case ARCH_GET_XCOMP_GUEST_PERM:
> 		permitted = xstate_get_guest_group_perm();
> 		permitted &= XFEATURE_MASK_USER_SUPPORTED;
> 		return put_user(permitted, uptr);
>
> and while KVM doesn't apply the __state_perm to supervisor states, if it did
> there would be zero harm in doing so.
>
> 	case 0xd: {
> 		u64 permitted_xcr0 = kvm_get_filtered_xcr0();
> 		u64 permitted_xss = kvm_caps.supported_xss;
>
> Second, the relying on QEMU to only trigger __xstate_request_perm() is not acceptable.
> It "works" for the current code, but only because there's only a single dynamic
> feature, i.e. this will short circuit and prevent computing a bad ksize.
>
> 	/* Check whether fully enabled */
> 	if ((permitted & requested) == requested)
> 		return 0;
>
> I don't know how I can possibly make it any clearer: KVM absolutely must not assume
> userspace behavior.
>
> So rather than continue with the current madness, which will break if/when the
> next dynamic feature comes along, just preserve non-user xfeatures/flags in
> __guest_perm.

Yes, it's time to rectify the confusion and make permission based settings clearer.
Below patch looks good to me, thanks!

> If there are no objections, I'll test the below and write a proper changelog.
>   
> --
> From: Sean Christopherson <seanjc@...gle.com>
> Date: Thu, 26 Oct 2023 10:17:33 -0700
> Subject: [PATCH] x86/fpu/xstate: Always preserve non-user xfeatures/flags in
>   __state_perm
>
> Fixes: 781c64bfcb73 ("x86/fpu/xstate: Handle supervisor states in XSTATE permissions")
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>   arch/x86/kernel/fpu/xstate.c | 18 +++++++++++-------
>   1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index ef6906107c54..73f6bc00d178 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -1601,16 +1601,20 @@ static int __xstate_request_perm(u64 permitted, u64 requested, bool guest)
>   	if ((permitted & requested) == requested)
>   		return 0;
>   
> -	/* Calculate the resulting kernel state size */
> +	/*
> +	 * Calculate the resulting kernel state size.  Note, @permitted also
> +	 * contains supervisor xfeatures even though supervisor are always
> +	 * permitted for kernel and guest FPUs, and never permitted for user
> +	 * FPUs.
> +	 */
>   	mask = permitted | requested;
> -	/* Take supervisor states into account on the host */
> -	if (!guest)
> -		mask |= xfeatures_mask_supervisor();
>   	ksize = xstate_calculate_size(mask, compacted);
>   
> -	/* Calculate the resulting user state size */
> -	mask &= XFEATURE_MASK_USER_SUPPORTED;
> -	usize = xstate_calculate_size(mask, false);
> +	/*
> +	 * Calculate the resulting user state size.  Take care not to clobber
> +	 * the supervisor xfeatures in the new mask!
> +	 */
> +	usize = xstate_calculate_size(mask & XFEATURE_MASK_USER_SUPPORTED, false);
>   
>   	if (!guest) {
>   		ret = validate_sigaltstack(usize);
>
> base-commit: c076acf10c78c0d7e1aa50670e9cc4c91e8d59b4

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ