linux-kernel - Re: [PATCH 7/8] x86/mce: Recover from poison found while copying from user space

Message-ID: <20200918161347.GG6585@zn.tnic>
Date:   Fri, 18 Sep 2020 18:13:47 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     Tony Luck <tony.luck@...el.com>
Cc:     Youquan Song <youquan.song@...el.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 7/8] x86/mce: Recover from poison found while copying
 from user space

On Tue, Sep 08, 2020 at 10:55:18AM -0700, Tony Luck wrote:
> From: Youquan Song <youquan.song@...el.com>
> 
> Existing kernel code can only recover from a machine check on code that
> tagged in the exception table with a fault handling recovery path.

"is tagged"

> New field in the task structure mce_vaddr is initialized to the
> user virtual address of the fault. This is so that kill_me_maybe()
> can provide that information to the user SIGBUS handler.
> 
> Add code to recover from a machine check while copying data from user
> space to the kernel. Action for this case is the same as if the user
> touched the poison directly; unmap the page and send a SIGBUS to the task.
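On the user side, the commit message says kill_me_maybe() should be able to hand the faulting user virtual address to the task's SIGBUS handler. Below is a minimal Linux user-space sketch of such a handler. It assumes glibc-style <signal.h> definitions (BUS_MCEERR_AR, si_addr). It also assumes that this case arrives with si_code BUS_MCEERR_AR, which is how memory-failure SIGBUS is generally reported. is_poison_consumed(), sigbus_handler() and install_sigbus_handler() are hypothetical names, and the reaction (log and exit) is only an example:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* True if this SIGBUS reports poison that the task actually consumed. */
static bool is_poison_consumed(const siginfo_t *si)
{
	return si->si_signo == SIGBUS && si->si_code == BUS_MCEERR_AR;
}

static void sigbus_handler(int sig, siginfo_t *si, void *uctx)
{
	static const char poison_msg[] = "SIGBUS: consumed memory poison\n";
	static const char other_msg[] = "SIGBUS: other bus error\n";
	ssize_t n;

	(void)sig;
	(void)uctx;
	/*
	 * si->si_addr carries the faulting user virtual address; a real
	 * handler might record it or discard the affected buffer. Only
	 * async-signal-safe calls (write, _exit) are used here.
	 */
	if (is_poison_consumed(si))
		n = write(STDERR_FILENO, poison_msg, sizeof(poison_msg) - 1);
	else
		n = write(STDERR_FILENO, other_msg, sizeof(other_msg) - 1);
	(void)n;
	_exit(EXIT_FAILURE);
}

/* Install with SA_SIGINFO so the handler receives the siginfo_t. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```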
> 
> Signed-off-by: Youquan Song <youquan.song@...el.com>
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 51 ++++++++++++++++++++++++++++++++++
>  include/linux/sched.h          |  1 +
>  2 files changed, 52 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5512318a07ae..2a3c42329c3f 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -53,6 +53,8 @@
>  #include <asm/mce.h>
>  #include <asm/msr.h>
>  #include <asm/reboot.h>
> +#include <asm/insn.h>
> +#include <asm/insn-eval.h>
>  
>  #include "internal.h"
>  
> @@ -1197,6 +1199,32 @@ static void kill_me_maybe(struct callback_head *cb)
>  	kill_me_now(cb);
>  }
>  
> +/*
> + * Decode a kernel instruction that faulted while reading from a user
> + * address and return the linear address that was being read.
> + */
> +static void __user *get_virtual_address(struct pt_regs *regs)
> +{
> +	u8 insn_buf[MAX_INSN_SIZE];
> +	struct insn insn;
> +
> +	if (copy_from_kernel_nofault(insn_buf, (void *)regs->ip, MAX_INSN_SIZE))
> +		return (void __user *)~0ul;

You're initializing ->mce_vaddr to NULL below but returning ~0 here.
You should return NULL here too. As it stands, if ->mce_vaddr is still
NULL, this check from your next patch will pass:

	if (p->mce_vaddr != (void __user *)~0ul) {

which would be the wrong thing to do, so you need to settle on a single
invalid vaddr value and stick with it.
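The point is that the initialization, the decode-failure path and the consumer must all agree on one value. Here is a minimal user-space sketch, assuming NULL is the chosen sentinel; user space normally cannot map page 0 (see vm.mmap_min_addr). MCE_VADDR_INVALID, struct task_stub, decode_vaddr() and have_vaddr() are hypothetical names, not kernel symbols:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: the one value meaning "no valid user vaddr". */
#define MCE_VADDR_INVALID NULL

/* Stand-in for the relevant part of task_struct. */
struct task_stub {
	void *mce_vaddr;
};

/* Stand-in for get_virtual_address(): failure returns the sentinel, not ~0. */
static void *decode_vaddr(bool insn_fetch_ok, uintptr_t decoded_addr)
{
	if (!insn_fetch_ok)
		return MCE_VADDR_INVALID;
	return (void *)decoded_addr;
}

/* Stand-in for the consumer's check: compares against the same sentinel. */
static bool have_vaddr(const struct task_stub *t)
{
	assert(t != NULL);
	return t->mce_vaddr != MCE_VADDR_INVALID;
}
```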

> +	kernel_insn_init(&insn, insn_buf, MAX_INSN_SIZE);
> +	insn_get_length(&insn);
> +	insn_get_modrm(&insn);
> +	insn_get_sib(&insn);

AFAICT, you only need the opcode, so why do all those?

I think you simply need to do:

	insn_get_opcode(&insn);

and then check insn.opcode.got, because otherwise you might be looking
at garbage below.
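The suggested flow is: decode only the opcode, trust it only if it was actually obtained, then match MOVS (0xA4/0xA5). The user-space toy below is not the <asm/insn.h> decoder. It assumes 64-bit mode and handles only legacy and REX prefixes plus one-byte opcodes. struct opcode_field, get_opcode() and is_movs() are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an insn field: decoded value plus a "got" flag. */
struct opcode_field {
	uint8_t value;
	bool got;
};

static bool is_legacy_prefix(uint8_t b)
{
	switch (b) {
	case 0xf0: case 0xf2: case 0xf3:		/* LOCK, REPNE, REP */
	case 0x26: case 0x2e: case 0x36: case 0x3e:	/* ES, CS, SS, DS */
	case 0x64: case 0x65:				/* FS, GS */
	case 0x66: case 0x67:				/* operand/address size */
		return true;
	default:
		return false;
	}
}

/*
 * Skip legacy prefixes and an optional REX prefix (0x40..0x4f), then
 * record the first opcode byte. got stays false if the buffer runs out.
 */
static struct opcode_field get_opcode(const uint8_t *buf, size_t len)
{
	struct opcode_field op = { 0, false };
	size_t i = 0;

	assert(buf != NULL);
	while (i < len && is_legacy_prefix(buf[i]))
		i++;
	if (i < len && (buf[i] & 0xf0) == 0x40)
		i++;
	if (i < len) {
		op.value = buf[i];
		op.got = true;
	}
	return op;
}

/* Only look at the opcode value once it is known to be valid. */
static bool is_movs(const uint8_t *buf, size_t len)
{
	struct opcode_field op = get_opcode(buf, len);

	return op.got && (op.value == 0xa4 || op.value == 0xa5);
}
```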

> +
> +	/*
> +	 * For MOVS[BWLQ] the source address is in %rsi

Pls end your sentences with a fullstop.

> +	 */
> +	if (insn.opcode.value == 0xa4 || insn.opcode.value == 0xa5)
> +		return (void __user *)regs->si;

How do you know just by looking at the opcodes, that the source operand
in rSI is __user memory?

I see is_copy_from_user() in your next patch, so I guess I'll verify that
there...
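The opcode alone says nothing about where rSI points. A range check is necessary but not sufficient. It shows that the source lies below the user/kernel split, not that the faulting instruction belongs to a copy-from-user path, which is presumably what is_copy_from_user() is meant to establish. Here is a minimal sketch, assuming x86-64 with 4-level paging, where the top of user space (TASK_SIZE_MAX) is 2^47 minus one 4 KiB page. range_is_user() and the SKETCH_* constants are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: x86-64, 4-level paging, 4 KiB pages. */
#define SKETCH_PAGE_SIZE	UINT64_C(4096)
#define SKETCH_USER_LIMIT	((UINT64_C(1) << 47) - SKETCH_PAGE_SIZE)

/* True if [addr, addr + len) lies entirely below the user/kernel split. */
static bool range_is_user(uint64_t addr, uint64_t len)
{
	assert(len > 0);
	/* Subtract instead of computing addr + len, so nothing can overflow. */
	return addr < SKETCH_USER_LIMIT && len <= SKETCH_USER_LIMIT - addr;
}
```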

> +	else
> +		return insn_get_addr_ref(&insn, regs);
> +}
> +
>  /*
>   * The actual machine check handler. This only handles real
>   * exceptions when something got corrupted coming in through int 18.
> @@ -1342,6 +1370,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
>  		/* If this triggers there is no way to recover. Die hard. */
>  		BUG_ON(!on_thread_stack() || !user_mode(regs));
>  
> +		current->mce_vaddr = NULL;
>  		current->mce_addr = m.addr;
>  		current->mce_ripv = !!(m.mcgstatus & MCG_STATUS_RIPV);
>  		current->mce_whole_page = whole_page(&m);
> @@ -1350,6 +1379,13 @@ noinstr void do_machine_check(struct pt_regs *regs)
>  			current->mce_kill_me.func = kill_me_now;
>  		task_work_add(current, &current->mce_kill_me, true);
>  	} else {
> +		/*
> +		 * Before fixing the exception IP, find the user address
> +		 * in the MCE_IN_KERNEL_COPYIN case
						   ^
						   |-- Fullstop

> +		 */
> +		if (m.kflags & MCE_IN_KERNEL_COPYIN)
> +			current->mce_vaddr = get_virtual_address(regs);
> +
>  		/*
>  		 * Handle an MCE which has happened in kernel space but from
>  		 * which the kernel can recover: ex_has_fault_handler() has
> @@ -1363,6 +1399,21 @@ noinstr void do_machine_check(struct pt_regs *regs)
>  			if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
>  				mce_panic("Failed kernel mode recovery", &m, msg);
>  		}
> +
> +		/*
> +		 * MCE on user data while copying to kernel. Action here is
> +		 * very similar to the user hitting the poison themself.
> +		 * Poison page will be unmapped and signal sent to process.
> +		 */
> +		if (m.kflags & MCE_IN_KERNEL_COPYIN) {
> +			current->mce_addr = m.addr;
> +			current->mce_ripv = !!(m.mcgstatus & MCG_STATUS_RIPV);
> +			current->mce_whole_page = whole_page(&m);
> +			current->mce_kill_me.func = kill_me_maybe;
> +			if (kill_it)
> +				current->mce_kill_me.func = kill_me_now;
> +			task_work_add(current, &current->mce_kill_me, true);

This hunk is mostly copied from the in-user case above. How about a
"goto recover;" label instead of the duplication?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette