Message-ID: <20250218141535.GC34567@noisy.programming.kicks-ass.net>
Date: Tue, 18 Feb 2025 15:15:35 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Shuai Xue <xueshuai@...ux.alibaba.com>
Cc: tony.luck@...el.com, bp@...en8.de, nao.horiguchi@...il.com,
tglx@...utronix.de, mingo@...hat.com, dave.hansen@...ux.intel.com,
x86@...nel.org, hpa@...or.com, linmiaohe@...wei.com,
akpm@...ux-foundation.org, jpoimboe@...nel.org,
linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, baolin.wang@...ux.alibaba.com,
tianruidong@...ux.alibaba.com
Subject: Re: [PATCH v2 3/5] x86/mce: add EX_TYPE_EFAULT_REG as in-kernel
recovery context to fix copy-from-user operations regression

On Tue, Feb 18, 2025 at 09:28:33PM +0800, Shuai Xue wrote:
> I did build and test this patch set on it. But I did not find any warnings.
> Could you provide more details?

NOINSTR_VALIDATION=y helps

> > >  	/* Allow instrumentation around external facilities usage. */
> > >  	instrumentation_begin();
> > > -	fixup_type = ex_get_fixup_type(m->ip);
> > > +	fixup_type = FIELD_GET(EX_DATA_TYPE_MASK, e->data);
> > > +	imm = FIELD_GET(EX_DATA_IMM_MASK, e->data);
> > >  	copy_user = is_copy_from_user(regs);
> > >  	instrumentation_end();
> > > @@ -304,9 +311,13 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
> > >  	case EX_TYPE_UACCESS:
> > >  		if (!copy_user)
> > >  			return IN_KERNEL;
> > > -		m->kflags |= MCE_IN_KERNEL_COPYIN;
> > > -		fallthrough;
> > > -
> > > +		m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
> > > +		return IN_KERNEL_RECOV;
> > > +	case EX_TYPE_IMM_REG:
> > > +		if (!copy_user || imm != -EFAULT)
> > > +			return IN_KERNEL;
> > > +		m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
> > > +		return IN_KERNEL_RECOV;
> >
> > Maybe I'm just not understanding things, but what's wrong with something
> > like the below; why do we care about the ex-type if we know it's a MOV
> > reading from userspace?
> >
> > The less we muck about with the extable here, the better.
>
> We need to make sure that we have registered a fixup handler for the
> copy_user case. If no fixup handler is found, the PC accessing poison will
> trigger #MCE again and again, resulting in a hard lockup.

Well, then write it like so. Afaict, you don't care what the actual
exception type is, just that there is one, for the copy_user case.

diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index dac4d64dfb2a..cfdae25eacd7 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -301,18 +301,19 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
 	instrumentation_end();

 	switch (fixup_type) {
-	case EX_TYPE_UACCESS:
-		if (!copy_user)
-			return IN_KERNEL;
-		m->kflags |= MCE_IN_KERNEL_COPYIN;
-		fallthrough;
-
 	case EX_TYPE_FAULT_MCE_SAFE:
 	case EX_TYPE_DEFAULT_MCE_SAFE:
 		m->kflags |= MCE_IN_KERNEL_RECOV;
 		return IN_KERNEL_RECOV;

 	default:
+		if (copy_user) {
+			m->kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
+			return IN_KERNEL_RECOV;
+		}
+		fallthrough;
+
+	case EX_TYPE_NONE:
 		return IN_KERNEL;
 	}
 }
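
For readers who want to check the decision table outside the kernel, the
switch proposed above can be modelled as a small userspace function. This is
only a sketch: the enum values, the flag bit positions and the classify()
helper are hypothetical stand-ins (the real definitions live in the kernel
headers), and the EX_TYPE_NONE case is written out before default instead of
being reached by fallthrough, which should be behaviourally equivalent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's fixup types; the numeric values
 * here are arbitrary and do not match the kernel's. */
enum fixup_type {
	EX_TYPE_NONE,
	EX_TYPE_UACCESS,
	EX_TYPE_FAULT_MCE_SAFE,
	EX_TYPE_DEFAULT_MCE_SAFE,
	EX_TYPE_IMM_REG,
};

enum context { IN_KERNEL, IN_KERNEL_RECOV };

/* Hypothetical bit positions, chosen only for this model. */
#define MCE_IN_KERNEL_RECOV	(1u << 0)
#define MCE_IN_KERNEL_COPYIN	(1u << 1)

/* Model of the proposed switch: the fixup type only matters for the
 * MCE_SAFE cases and for "no handler at all"; any other registered
 * handler is recoverable when the access is a copy from user. */
enum context classify(enum fixup_type type, bool copy_user,
		      unsigned int *kflags)
{
	assert(kflags != NULL);

	switch (type) {
	case EX_TYPE_FAULT_MCE_SAFE:
	case EX_TYPE_DEFAULT_MCE_SAFE:
		*kflags |= MCE_IN_KERNEL_RECOV;
		return IN_KERNEL_RECOV;

	case EX_TYPE_NONE:
		/* No fixup handler registered: not recoverable. */
		return IN_KERNEL;

	default:
		if (copy_user) {
			*kflags |= MCE_IN_KERNEL_COPYIN | MCE_IN_KERNEL_RECOV;
			return IN_KERNEL_RECOV;
		}
		return IN_KERNEL;
	}
}
```

In this model, a copy from user with any registered handler (EX_TYPE_UACCESS
or EX_TYPE_IMM_REG alike) comes back as IN_KERNEL_RECOV with both flags set,
while EX_TYPE_NONE stays IN_KERNEL even when copy_user is true.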