linux-kernel - Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

Message-ID: <20210128174352.GA33283@agluck-desk2.amr.corp.intel.com>
Date:   Thu, 28 Jan 2021 09:43:52 -0800
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Aili Yao <yaoaili@...gsoft.com>
Cc:     x86@...nel.org, naoya.horiguchi@....com,
        linux-kernel@...r.kernel.org, yangfeng1@...gsoft.com
Subject: Re: [PATCH] x86/fault: Send SIGBUS to user process always for
 hwpoison page access.

On Thu, Jan 28, 2021 at 07:43:26PM +0800, Aili Yao wrote:
> when one page is already hwpoisoned by AO action, process may not be
> killed, the process mapping this page may make a syscall include this
> page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel
> mode it may be fixed by fixup_exception, current code will just return
> error code to user process.

Shouldn't the AO action that poisoned the page have also unmapped it?

> 
> This is not suffient, we should send a SIGBUS to the process and log the
> info to console, as we can't trust the process will handle the error
> correctly.

I agree with this part ... few apps check for -EFAULT and do the right
thing.  But I'm not sure how this happens. Can you provide a bit more
detail on the steps that lead to it?
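
For illustration only, here is a minimal user-space sketch of what
"checking for -EFAULT" could look like. The helper name checked_write()
is hypothetical and is not part of the patch or of any library; it only
shows that a syscall handed an inaccessible buffer fails with errno set
to EFAULT, which the caller has to notice on its own.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for this sketch, not from the patch):
 * write len bytes from buf to fd, retrying on EINTR, and report a fault on
 * the user buffer explicitly instead of silently ignoring it.
 * Returns the number of bytes written, or -1 with errno set.
 */
static ssize_t checked_write(int fd, const void *buf, size_t len)
{
	ssize_t n;
	int saved_errno;

	assert(len > 0);

	do {
		n = write(fd, buf, len);
	} while (n < 0 && errno == EINTR);

	if (n < 0 && errno == EFAULT) {
		/* fprintf() may clobber errno, so preserve it for the caller. */
		saved_errno = errno;
		fprintf(stderr, "write: buffer %p is not accessible (EFAULT)\n",
			(void *)buf);
		errno = saved_errno;
	}

	return n;
}
```

A caller that ignores the return value never learns about the fault,
which is the situation the patch description is worried about.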

-Tony

P.S. Typo: s/suffient/sufficient/

> 
> Suggested-by: Feng Yang <yangfeng1@...gsoft.com>
> Signed-off-by: Aili Yao <yaoaili@...gsoft.com>
> ---
>  arch/x86/mm/fault.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index f1f1b5a0956a..36d1e385512b 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -662,7 +662,16 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>  		 * In this case we need to make sure we're not recursively
>  		 * faulting through the emulate_vsyscall() logic.
>  		 */
> +#ifdef CONFIG_MEMORY_FAILURE
> +		if (si_code == BUS_MCEERR_AR && signal == SIGBUS)
> +			pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> +				current->comm, current->pid, address);
> +
> +		if ((current->thread.sig_on_uaccess_err && signal) ||
> +			(si_code == BUS_MCEERR_AR && signal == SIGBUS)) {
> +#else
>  		if (current->thread.sig_on_uaccess_err && signal) {
> +#endif
>  			sanitize_error_code(address, &error_code);
>  
>  			set_signal_archinfo(address, error_code);
> @@ -927,7 +936,14 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
>  {
>  	/* Kernel mode? Handle exceptions or die: */
>  	if (!(error_code & X86_PF_USER)) {
> +#ifdef CONFIG_MEMORY_FAILURE
> +		if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE))
> +			no_context(regs, error_code, address, SIGBUS, BUS_MCEERR_AR);
> +		else
> +			no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
> +#else
>  		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
> +#endif
>  		return;
>  	}
>  
> -- 
> 2.25.1
>
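
The patch above delivers SIGBUS with si_code BUS_MCEERR_AR. As a rough
sketch of how a user process could tell such a signal apart from other
SIGBUS causes, the following installs an SA_SIGINFO handler and
classifies si_code. The enum, the variable names and the function names
are hypothetical and chosen for this example; only sigaction(),
siginfo_t and the BUS_MCEERR_* constants are existing interfaces.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

/* Hypothetical classification of a SIGBUS by its si_code. */
enum sigbus_kind {
	SIGBUS_MCE_ACTION_REQUIRED,	/* BUS_MCEERR_AR: poisoned data was consumed */
	SIGBUS_MCE_ACTION_OPTIONAL,	/* BUS_MCEERR_AO: early notification */
	SIGBUS_OTHER,			/* alignment, missing backing store, kill(), ... */
};

static enum sigbus_kind classify_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return SIGBUS_MCE_ACTION_REQUIRED;
	case BUS_MCEERR_AO:
		return SIGBUS_MCE_ACTION_OPTIONAL;
	default:
		return SIGBUS_OTHER;
	}
}

/* Written by the handler; -1 means no SIGBUS has been seen yet. */
static volatile sig_atomic_t last_sigbus_kind = -1;
static void *volatile last_sigbus_addr;

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;

	/*
	 * Only record what happened. If the SIGBUS came from a user-mode
	 * access to the poisoned page, returning from the handler would
	 * re-run the faulting instruction, so a real program would have to
	 * exit or jump elsewhere instead of simply returning.
	 */
	last_sigbus_kind = classify_sigbus(info->si_code);
	last_sigbus_addr = info->si_addr;
}

/* Returns 0 on success, -1 with errno set on failure. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);

	return sigaction(SIGBUS, &sa, NULL);
}
```

A process without such a handler is killed by the default SIGBUS
action, which appears to be the outcome the patch wants for programs
that would otherwise ignore the error return.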