Message-ID: <ZD7TBZex278dSYmc@agluck-desk3.sc.intel.com>
Date: Tue, 18 Apr 2023 10:27:33 -0700
From: Tony Luck <tony.luck@...el.com>
To: Yazen Ghannam <yazen.ghannam@....com>
Cc: Borislav Petkov <bp@...en8.de>, x86@...nel.org,
linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
patches@...ts.linux.dev
Subject: Re: [PATCH] x86/mce: Check that memory address is usable for recovery

On Tue, Apr 18, 2023 at 12:41:17PM -0400, Yazen Ghannam wrote:
> On 3/21/23 20:51, Tony Luck wrote:
> > uc_decode_notifier() includes a check that "struct mce"
> > contains a valid address for recovery. But the machine
> > check recovery code does not include a similar check.
> >
> > Use mce_usable_address() to check that there is a valid
> > address.
> >
> > Signed-off-by: Tony Luck <tony.luck@...el.com>
> > ---
> >  arch/x86/kernel/cpu/mce/core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> > index 2eec60f50057..fa28b3f7d945 100644
> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -1533,7 +1533,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
> >  		/* If this triggers there is no way to recover. Die hard. */
> >  		BUG_ON(!on_thread_stack() || !user_mode(regs));
> >
> > -		if (kill_current_task)
> > +		if (kill_current_task || !mce_usable_address(&m))
> >  			queue_task_work(&m, msg, kill_me_now);
> >  		else
> >  			queue_task_work(&m, msg, kill_me_maybe);
>
> I think it should be like this:
>
> 	if (mce_usable_address(&m))
> 		queue_task_work(&m, msg, kill_me_maybe);
> 	else
> 		queue_task_work(&m, msg, kill_me_now);
>
> A usable address should always go through memory_failure() so that the page is
> marked as poison. If !RIPV, then memory_failure() will get the MF_MUST_KILL
> flag and try to kill all processes after the page is poisoned.
>
> I had a similar patch a while back:
> https://lore.kernel.org/linux-edac/20210504174712.27675-3-Yazen.Ghannam@amd.com/
>
> We could also get rid of kill_me_now() like you had suggested.

Can we also get rid of "kill_current_task"? It is only set when there is
no valid return address:

	if (!(m.mcgstatus & MCG_STATUS_RIPV))
		kill_current_task = 1;

kill_me_maybe() does an equivalent check:

	if (!p->mce_ripv)
		flags |= MF_MUST_KILL;

Which only leaves this check to resolve in some way:

	if (worst != MCE_AR_SEVERITY && !kill_current_task)
		goto out;
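
To make the comparison concrete, here is a rough userspace C model (not
kernel code, not built against the tree) of the two dispatch variants
discussed above. It assumes kill_current_task is exactly !RIPV and that
kill_me_maybe() adds MF_MUST_KILL when mce_ripv is clear, as in the
snippets quoted. The enum labels and the model_*/dispatch_* names are
hypothetical and exist only for this sketch; anything that happens after
the initial choice of callback is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for this model; they are not kernel symbols. */
enum outcome {
	OUT_KILL_NOW,            /* kill_me_now(): task is killed, no memory_failure() */
	OUT_MEMORY_FAILURE,      /* kill_me_maybe(): memory_failure() without MF_MUST_KILL */
	OUT_MEMORY_FAILURE_KILL, /* kill_me_maybe(): memory_failure() with MF_MUST_KILL */
};

/* What kill_me_maybe() would request, going by the mce_ripv check quoted above. */
enum outcome model_kill_me_maybe(bool ripv)
{
	return ripv ? OUT_MEMORY_FAILURE : OUT_MEMORY_FAILURE_KILL;
}

/* Dispatch as in the posted patch (assumption: kill_current_task == !RIPV). */
enum outcome dispatch_patch(bool ripv, bool usable_addr)
{
	bool kill_current_task = !ripv;

	if (kill_current_task || !usable_addr)
		return OUT_KILL_NOW;
	return model_kill_me_maybe(ripv);
}

/* Dispatch as suggested in the review, with no kill_current_task. */
enum outcome dispatch_suggested(bool ripv, bool usable_addr)
{
	if (usable_addr)
		return model_kill_me_maybe(ripv);
	return OUT_KILL_NOW;
}

/* Count the (ripv, usable_addr) combinations where the two variants disagree. */
int count_differences(void)
{
	int diffs = 0;

	for (int r = 0; r < 2; r++)
		for (int u = 0; u < 2; u++)
			if (dispatch_patch(r, u) != dispatch_suggested(r, u))
				diffs++;
	return diffs;
}

/* Sanity check: the variants differ only for !RIPV with a usable address. */
void check_difference(void)
{
	assert(count_differences() == 1);
	assert(dispatch_patch(false, true) != dispatch_suggested(false, true));
}
```

Under those assumptions the variants differ in one of the four cases:
no RIPV with a usable address. There the posted patch goes straight to
kill_me_now(), while the suggested form goes through memory_failure()
with MF_MUST_KILL so the page is marked as poison first.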

-Tony