Message-ID: <20210205130616.6d876846@alex-virtual-machine>
Date: Fri, 5 Feb 2021 13:06:16 +0800
From: Aili Yao <yaoaili@...gsoft.com>
To: "HORIGUCHI NAOYA(堀口 直也)"
<naoya.horiguchi@....com>
CC: "tony.luck@...el.com" <tony.luck@...el.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"luto@...nel.org" <luto@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"bp@...en8.de" <bp@...en8.de>, "hpa@...or.com" <hpa@...or.com>,
"x86@...nel.org" <x86@...nel.org>,
"YANGFENG1@...gsoft.com" <YANGFENG1@...gsoft.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
<yaoaili@...gsoft.com>
Subject: Re: [PATCH v2] x86/fault: Send a SIGBUS to user process always for
hwpoison page access.
On Thu, 4 Feb 2021 07:25:55 +0000
HORIGUCHI NAOYA(堀口 直也) <naoya.horiguchi@....com> wrote:
> Hi Aili,
>
> On Mon, Feb 01, 2021 at 04:17:49PM +0800, Aili Yao wrote:
> > When a page has already been hwpoisoned by an AO action, the process
> > mapping it may not be killed. That process may later make a syscall that
> > touches the page and triggers a VM_FAULT_HWPOISON fault. If the fault
> > happens in kernel mode, it may be fixed up by fixup_exception(), and the
> > current code then just returns an error code to the user process.
> >
> > This is not sufficient: we should send a SIGBUS to the process and log
> > the event to the console, as we can't trust the process to handle the
> > error correctly.
> >
> > Suggested-by: Feng Yang <yangfeng1@...gsoft.com>
> > Signed-off-by: Aili Yao <yaoaili@...gsoft.com>
> > ---
> ...
>
> > @@ -662,12 +662,32 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> >  	 * In this case we need to make sure we're not recursively
> >  	 * faulting through the emulate_vsyscall() logic.
> >  	 */
> > +
> > +	if (IS_ENABLED(CONFIG_MEMORY_FAILURE) &&
> > +	    fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
> > +		unsigned int lsb = 0;
> > +
> > +		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> > +			current->comm, current->pid, address);
> > +
> > +		sanitize_error_code(address, &error_code);
> > +		set_signal_archinfo(address, error_code);
> > +
> > +		if (fault & VM_FAULT_HWPOISON_LARGE)
> > +			lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
> > +		if (fault & VM_FAULT_HWPOISON)
> > +			lsb = PAGE_SHIFT;
> > +
> > +		force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb);
>
> This part contains some duplicated code with do_sigbus(), so some refactoring (like
> adding a common function) would be more helpful.
Yes, agreed. I will rework this and rebase onto the big fault series from tip.
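
For reference, the part duplicated with do_sigbus() is mainly the lsb
computation. Below is a rough userspace sketch of what a shared helper could
look like. It is only an illustration, not the actual refactoring:
hwpoison_fault_lsb() and demo_hstate_shift[] are made-up names, the
VM_FAULT_* values are copied from include/linux/mm_types.h as I remember
them, PAGE_SHIFT is assumed to be 12 (x86 with 4K pages), and the hstate
table is a stand-in that assumes index 0 is 2M and index 1 is 1G.

```c
/* Values as recalled from include/linux/mm_types.h; treat as assumptions. */
#define VM_FAULT_HWPOISON        0x0010u
#define VM_FAULT_HWPOISON_LARGE  0x0020u
#define VM_FAULT_SET_HINDEX(x)   ((unsigned int)(x) << 16)
#define VM_FAULT_GET_HINDEX(x)   (((unsigned int)(x) >> 16) & 0xfu)
#define PAGE_SHIFT               12	/* assumed: x86, 4K base pages */

/* Hypothetical stand-in for the kernel's hstates[]: 2M and 1G huge pages. */
static const unsigned int demo_hstate_shift[] = { 21, 30 };

/* Userspace model of hstate_index_to_shift(); returns 0 if out of range. */
static unsigned int hstate_index_to_shift(unsigned int index)
{
	if (index >= sizeof(demo_hstate_shift) / sizeof(demo_hstate_shift[0]))
		return 0;
	return demo_hstate_shift[index];
}

/*
 * Hypothetical common helper: compute the lsb value that would be passed
 * to force_sig_mceerr(). Same order of checks as in the quoted hunk, so
 * VM_FAULT_HWPOISON overrides the huge page shift when both bits are set.
 */
static unsigned int hwpoison_fault_lsb(unsigned int fault)
{
	unsigned int lsb = 0;

	if (fault & VM_FAULT_HWPOISON_LARGE)
		lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
	if (fault & VM_FAULT_HWPOISON)
		lsb = PAGE_SHIFT;

	return lsb;
}
```

With these assumptions, a plain VM_FAULT_HWPOISON fault gives lsb 12, and
VM_FAULT_HWPOISON_LARGE with hstate index 1 gives lsb 30. Both do_sigbus()
and the kernel-mode path could then call the same helper before
force_sig_mceerr().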
Thanks
Aili Yao