linux-kernel - Re: [PATCH v10 13/28] x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user state

Message-ID: <47D6E3AB-A3B6-4604-89A4-EBEF1F3AB026@intel.com>
Date:   Sun, 3 Oct 2021 22:41:45 +0000
From:   "Bae, Chang Seok" <chang.seok.bae@...el.com>
To:     Thomas Gleixner <tglx@...utronix.de>
CC:     "bp@...e.de" <bp@...e.de>, "Lutomirski, Andy" <luto@...nel.org>,
        "mingo@...nel.org" <mingo@...nel.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "Brown, Len" <len.brown@...el.com>,
        "lenb@...nel.org" <lenb@...nel.org>,
        "Hansen, Dave" <dave.hansen@...el.com>,
        "Macieira, Thiago" <thiago.macieira@...el.com>,
        "Liu, Jing2" <jing2.liu@...el.com>,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v10 13/28] x86/fpu/xstate: Use feature disable (XFD) to
 protect dynamic user state

On Oct 1, 2021, at 08:02, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Wed, Aug 25 2021 at 08:53, Chang S. Bae wrote:
>> +/**
>> + * xfd_switch - Switches the MSR IA32_XFD context if needed.
>> + * @prev:	The previous task's struct fpu pointer
>> + * @next:	The next task's struct fpu pointer
>> + */
>> +static inline void xfd_switch(struct fpu *prev, struct fpu *next)
>> +{
>> +	u64 prev_xfd_mask, next_xfd_mask;
>> +
>> +	if (!cpu_feature_enabled(X86_FEATURE_XFD) || !xfeatures_mask_user_dynamic)
>> +		return;
> 
> This is context switch, so this wants to be a static key which is turned
> on during init when the CPU supports XFD and user dynamic features are
> available.

Replied in the later email [1].

>> +
>> +	prev_xfd_mask = prev->state_mask & xfeatures_mask_user_dynamic;
>> +	next_xfd_mask = next->state_mask & xfeatures_mask_user_dynamic;
>> +
>> +	if (unlikely(prev_xfd_mask != next_xfd_mask))
>> +		wrmsrl_safe(MSR_IA32_XFD, xfeatures_mask_user_dynamic ^ next_xfd_mask);
>> +}
>> +
>> /*
>>  * Delay loading of the complete FPU state until the return to userland.
>>  * PKRU is handled separately.
>>  */
>> -static inline void switch_fpu_finish(struct fpu *new_fpu)
>> +static inline void switch_fpu_finish(struct fpu *old_fpu, struct fpu *new_fpu)
>> {
>> -	if (cpu_feature_enabled(X86_FEATURE_FPU))
>> +	if (cpu_feature_enabled(X86_FEATURE_FPU)) {
>> 		set_thread_flag(TIF_NEED_FPU_LOAD);
>> +		xfd_switch(old_fpu, new_fpu);
> 
> Why has this to be done on context switch? Zero explanation provided.
> 
> Why can't this be done in exit_to_user() where the FPU state restore is
> handled?

Replied in the later email [1].

>> 	}
>> +
>> +	if (boot_cpu_has(X86_FEATURE_XFD))
> 
> s/boot_cpu_has/cpu_feature_enabled/g

I think this is under fpu__init_cpu_xstate(). IIRC, cpu_feature_enabled() caused
a build error here before, but it looks okay now. Will update.

>> +		wrmsrl(MSR_IA32_XFD, xfeatures_mask_user_dynamic);
>> }
>> +
>> +	if (cpu_feature_enabled(X86_FEATURE_XFD))
>> +		wrmsrl_safe(MSR_IA32_XFD, (current->thread.fpu.state_mask &
>> +					   xfeatures_mask_user_dynamic) ^
>> +					  xfeatures_mask_user_dynamic);
> 
> Lacks curly braces as it's not a single line of code.

Sorry, I was confused by other examples like this in the mainline. Will fix.
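
For reference, the mask arithmetic in this hunk can be modelled in user space.
This is only a sketch and not the kernel code: xfd_value() and
DEMO_XFEATURE_DYNAMIC are hypothetical names, and the bit position is picked
arbitrarily. It assumes that state_mask holds the features the task's buffer
already covers, and that a set bit in IA32_XFD keeps the feature disabled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dynamic-feature bit, chosen only for this illustration. */
#define DEMO_XFEATURE_DYNAMIC	(UINT64_C(1) << 18)

/*
 * Model of the value written to IA32_XFD in the quoted hunk: every dynamic
 * feature that is not yet covered by state_mask stays disabled (bit set).
 */
static inline uint64_t xfd_value(uint64_t state_mask, uint64_t dynamic_mask)
{
	uint64_t enabled = state_mask & dynamic_mask;
	uint64_t xfd = enabled ^ dynamic_mask;

	/* The XOR form is equivalent to masking out the enabled bits. */
	assert(xfd == (dynamic_mask & ~state_mask));

	return xfd;
}
```

Pulling the expression into a helper like this would also turn the wrmsrl_safe()
call into a single-line statement, which is one possible way to deal with the
multi-line body.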

>> }
>> 
>> /**
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index 33f5d8d07367..6cd4fb098f8f 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -97,6 +97,16 @@ void arch_thread_struct_whitelist(unsigned long *offset, unsigned long *size)
>> 	*size = fpu_buf_cfg.min_size;
>> }
>> 
>> +void arch_release_task_struct(struct task_struct *task)
>> +{
>> +	if (!cpu_feature_enabled(X86_FEATURE_FPU))
>> +		return;
>> +
>> +	/* Free up only the dynamically-allocated memory. */
>> +	if (task->thread.fpu.state != &task->thread.fpu.__default_state)
> 
> Sigh.

Yeah, I will fix it this time. I also explained the reason for doing this in
the other mail [2].

>> +		free_xstate_buffer(&task->thread.fpu);
>> 
>> +static __always_inline bool handle_xfd_event(struct fpu *fpu, struct pt_regs *regs)
>> +{
>> +	bool handled = false;
>> +	u64 xfd_err;
>> +
>> +	if (!cpu_feature_enabled(X86_FEATURE_XFD))
>> +		return handled;
>> +
>> +	rdmsrl_safe(MSR_IA32_XFD_ERR, &xfd_err);
>> +	wrmsrl_safe(MSR_IA32_XFD_ERR, 0);
>> +
>> +	if (xfd_err) {
> 
> What's wrong with
> 
>       if (!xfd_err)
>       		return false;
> 
> and spare the full indentation levels below

I had the local variables declared inside this block in mind. But yes, an early
return saves an indentation level here.
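
The early-return shape could look roughly like the user-space model below. It
is only a sketch of the control flow: struct demo_xfd_result and
demo_handle_xfd_event() are hypothetical names, the MSR accesses are replaced by
plain parameters, and the result mirrors the posted patch, where both the
expected and the unexpected event paths end up with handled = true.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type, used only to make the flow observable. */
struct demo_xfd_result {
	bool handled;		/* the event was consumed by the handler */
	bool unexpected;	/* XFD_ERR had no dynamic user feature set */
	uint64_t xfd_event;	/* XFD_ERR bits within the dynamic mask */
};

static inline struct demo_xfd_result
demo_handle_xfd_event(bool xfd_supported, uint64_t xfd_err, uint64_t dynamic_mask)
{
	struct demo_xfd_result res = { false, false, 0 };

	if (!xfd_supported)
		return res;

	/* Early return: no XFD error, so nothing below needs to be nested. */
	if (!xfd_err)
		return res;

	/* The locals move to function scope instead of an inner block. */
	res.xfd_event = xfd_err & dynamic_mask;
	res.unexpected = (res.xfd_event == 0);
	res.handled = true;

	return res;
}
```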

>> +		u64 xfd_event = xfd_err & xfeatures_mask_user_dynamic;
>> +		u64 value;
>> +
>> +		if (WARN_ON(!xfd_event)) {
>> +			/*
>> +			 * Unexpected event is raised. But update XFD state to
>> +			 * unblock the task.
>> +			 */
>> +			rdmsrl_safe(MSR_IA32_XFD, &value);
>> +			wrmsrl_safe(MSR_IA32_XFD, value & ~xfd_err);
> 
> Ditto. But returning false here will not unblock the task as
> exc_device_not_available() will simply reach "die()".

Yes, that is right. But the "unexpected #NM exception" message could cause
confusion because this #NM is XFD-induced, and that needs to be differentiated
for users. (Len made this point to me.)

>> +		} else {
>> +			struct fpu *fpu = &current->thread.fpu;
> 
> You need this because the fpu argument above is invalid?

Ah, so sorry, I should have removed this line when I refactored this function.

>> +			int err = -1;
>> +
>> +			/*
>> +			 * Make sure not in interrupt context as handling a
>> +			 * trap from userspace.
>> +			 */
>> +			if (!WARN_ON(in_interrupt())) {
> 
> Why would in_interrupt() be necessarily true when the trap comes from
> kernel space? The proper check is user_mode(regs) as done anywhere else.

I see. 

>> +				err = realloc_xstate_buffer(fpu, xfd_event);
>> +				if (!err)
>> +					wrmsrl_safe(MSR_IA32_XFD, (fpu->state_mask &
>> +								   xfeatures_mask_user_dynamic) ^
>> +								  xfeatures_mask_user_dynamic);
>> +			}
>> +
>> +			/* Raise a signal when it failed to handle. */
>> +			if (err)
>> +				force_sig_fault(SIGILL, ILL_ILLOPC, error_get_trap_addr(regs));
>> +		}
>> +		handled = true;
>> +	}
>> +	return handled;
>> +}
>> +
>> DEFINE_IDTENTRY(exc_device_not_available)
>> {
>> 	unsigned long cr0 = read_cr0();
> 
>> +	if (handle_xfd_event(&current->thread.fpu, regs))
>> +		return;
> 
> As I said before, this is wrong because at that point interrupts are disabled.

I saw the code you suggested. Will take that, thanks.

Thanks,
Chang

[1] https://lore.kernel.org/lkml/66A19E8A-11BF-4532-878F-A8D0935FDBC7@intel.com/
[2] https://lore.kernel.org/lkml/CAF9A956-5623-4D24-BA3E-AF139C0A7CE6@intel.com/