Message-ID: <CALCETrV+mchcQ5SgBO8+xLvw9rrHm4yvq+wZtQsD1EMMGzAMEw@mail.gmail.com>
Date: Mon, 5 Jan 2015 17:01:46 -0800
From: Andy Lutomirski <luto@...capital.net>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Borislav Petkov <bp@...en8.de>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
X86 ML <x86@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Oleg Nesterov <oleg@...hat.com>,
Andi Kleen <andi@...stfloor.org>,
Josh Triplett <josh@...htriplett.org>,
Frédéric Weisbecker <fweisbec@...il.com>
Subject: Re: [PATCH] x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks
On Mon, Jan 5, 2015 at 4:44 PM, Luck, Tony <tony.luck@...el.com> wrote:
> We now switch to the kernel stack when a machine check interrupts
> during user mode. This means that we can perform recovery actions
> in the tail of do_machine_check().
>
> Signed-off-by: Tony Luck <tony.luck@...el.com>
>
> ---
> On top of Andy's x86/paranoid branch
> Andy: Should I really move that:
> pr_err("Uncorrected hardware memory error ...
> inside the ist_begin_non_atomic() section?
>
I think I like it as is.
[...]
> @@ -1220,6 +1177,26 @@ void do_machine_check(struct pt_regs *regs, long error_code)
> mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> out:
> sync_core();
> +
> + if (recover_paddr == ~0ull)
> + goto done;
> +
> + pr_err("Uncorrected hardware memory error in user-access at %llx",
> + recover_paddr);
printk is safe from IRQ context, so this should be okay unless we've
totally screwed up. And if we have totally screwed up, seeing this
message before the BUGs in ist_begin_non_atomic() fire would be nice.
> + /*
> + * We must call memory_failure() here even if the current process is
> + * doomed. We still need to mark the page as poisoned and alert any
> + * other users of the page.
> + */
> + ist_begin_non_atomic(regs);
> + local_irq_enable();
> + if (memory_failure(recover_paddr >> PAGE_SHIFT, MCE_VECTOR, flags) < 0) {
> + pr_err("Memory error not recovered");
> + force_sig(SIGBUS, current);
> + }
> + local_irq_disable();
> + ist_end_non_atomic();
> +done:
> ist_exit(regs, prev_state);
> }
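For anyone following the ordering in the hunk above, here is a rough
user-space model of it. This is only a sketch, not kernel code: every
sketch_* name is hypothetical, the 4 KiB page size is an assumption, and
the assert() calls merely stand in for the BUG checks (assumed here to
include a "came from user mode" check). The real memory_failure() also
takes a trapno and flags, which the stub omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_PAGE_SHIFT  12        /* assumption: 4 KiB pages */
#define SKETCH_NO_RECOVERY (~0ull)   /* same sentinel value the patch tests */

/* Hypothetical stand-in for the context state the kernel tracks. */
struct sketch_ctx {
	bool from_user;    /* did the machine check interrupt user mode? */
	bool atomic;       /* still inside the atomic part of the handler? */
	bool irqs_on;      /* models local_irq_enable()/local_irq_disable() */
	bool sigbus_sent;  /* models force_sig(SIGBUS, current) */
};

/* Models ist_begin_non_atomic(): only legal if we came from user mode. */
static void sketch_begin_non_atomic(struct sketch_ctx *c)
{
	assert(c->from_user);  /* stands in for the kernel's BUG check */
	c->atomic = false;
}

/* Models ist_end_non_atomic(). */
static void sketch_end_non_atomic(struct sketch_ctx *c)
{
	c->atomic = true;
}

/* Stub for memory_failure(): a negative return means "not recovered". */
static int sketch_memory_failure(const struct sketch_ctx *c, uint64_t pfn,
				 bool recoverable)
{
	/* Recovery may sleep, so it needs non-atomic context and IRQs on. */
	assert(!c->atomic && c->irqs_on);
	(void)pfn;
	return recoverable ? 0 : -1;
}

/*
 * Models the tail of do_machine_check(). Returns the page frame number
 * handed to the recovery stub, or SKETCH_NO_RECOVERY if recovery was
 * skipped.
 */
static uint64_t sketch_mce_tail(struct sketch_ctx *c, uint64_t recover_paddr,
				bool recoverable)
{
	uint64_t pfn;

	if (recover_paddr == SKETCH_NO_RECOVERY)
		return SKETCH_NO_RECOVERY;  /* the "goto done" path */

	/* Log first, while still atomic, so the message precedes any BUG. */
	fprintf(stderr,
		"Uncorrected hardware memory error in user-access at %llx\n",
		(unsigned long long)recover_paddr);

	sketch_begin_non_atomic(c);
	c->irqs_on = true;

	pfn = recover_paddr >> SKETCH_PAGE_SHIFT;
	if (sketch_memory_failure(c, pfn, recoverable) < 0) {
		fprintf(stderr, "Memory error not recovered\n");
		c->sigbus_sent = true;
	}

	c->irqs_on = false;
	sketch_end_non_atomic(c);
	return pfn;
}
```

Calling sketch_mce_tail() with a context whose from_user is false trips
the assert after the message has been printed, which is the ordering
argued for above.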
For the context-related bits:
Reviewed-by: Andy Lutomirski <luto@...capital.net>
Should I stick this in my -next branch so it can stew?
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC