Date: Tue, 12 Jan 2021 14:04:55 -0800
From: Andy Lutomirski <luto@...capital.net>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Andy Lutomirski <luto@...nel.org>, Borislav Petkov <bp@...en8.de>,
    X86 ML <x86@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
    Peter Zijlstra <peterz@...radead.org>, Darren Hart <dvhart@...radead.org>,
    LKML <linux-kernel@...r.kernel.org>, linux-edac <linux-edac@...r.kernel.org>,
    Linux-MM <linux-mm@...ck.org>
Subject: Re: [PATCH v2 1/3] x86/mce: Avoid infinite loop for copy from user recovery

> On Jan 12, 2021, at 12:52 PM, Luck, Tony <tony.luck@...el.com> wrote:
>
> On Tue, Jan 12, 2021 at 10:57:07AM -0800, Andy Lutomirski wrote:
>>> On Tue, Jan 12, 2021 at 10:24 AM Luck, Tony <tony.luck@...el.com> wrote:
>>>
>>> On Tue, Jan 12, 2021 at 09:21:21AM -0800, Andy Lutomirski wrote:
>>>> Well, we need to do *something* when the first __get_user() trips the
>>>> #MC. It would be nice if we could actually fix up the page tables
>>>> inside the #MC handler, but, if we're in a pagefault_disable() context
>>>> we might have locks held. Heck, we could have the pagetable lock
>>>> held, be inside NMI, etc. Skipping the task_work_add() might actually
>>>> make sense if we get a second one.
>>>>
>>>> We won't actually infinite loop in pagefault_disable() context -- if
>>>> we would, then we would also infinite loop just from a regular page
>>>> fault, too.
>>>
>>> Fixing the page tables inside the #MC handler to unmap the poison
>>> page would indeed be a good solution. But, as you point out, not possible
>>> because of locks.
>>>
>>> Could we take a more drastic approach? We know that this case the kernel
>>> is accessing a user address for the current process. Could the machine
>>> check handler just re-write %cr3 to point to a kernel-only page table[1].
>>> I.e. unmap the entire current user process.
>>
>> That seems scary, especially if we're in the middle of a context
>> switch when this happens. We *could* make it work, but I'm not at all
>> convinced it's wise.
>
> Scary? It's terrifying!
>
> But we know that the fault happend in a get_user() or copy_from_user() call
> (i.e. an RIP with an extable recovery address). Does context switch
> access user memory?

No, but NMI can.

The case that would be very very hard to deal with is if we get an NMI
just before IRET/SYSRET and get #MC inside that NMI.

What we should probably do is have a percpu list of pending memory
failure cleanups and just accept that we’re going to sometimes get a
second MCE (or third or fourth) before we can get to it. Can we do the
cleanup from an interrupt? IPI-to-self might be a credible approach, if
so.
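
[Editorial illustration, not part of the original message.] The percpu
pending list suggested above could be modeled roughly as follows. This
is a single-threaded userspace sketch, not kernel code and not something
posted in the thread. Every name in it (NR_CPUS, MAX_PENDING,
mce_note_poison(), mce_drain_pending()) is hypothetical. It only shows
the bookkeeping: a repeated machine check on an already queued page
bumps a counter instead of queueing again, and the list is drained later
from some safer context. It ignores the hard part of the real problem,
which is making the list safe against #MC arriving inside NMI.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS     4   /* hypothetical CPU count for this model */
#define MAX_PENDING 8   /* hypothetical number of slots per CPU */

struct pending_cleanup {
    unsigned long pfn;  /* poisoned page frame number */
    unsigned int hits;  /* machine checks seen on this page so far */
};

struct cpu_pending {
    struct pending_cleanup slot[MAX_PENDING];
    size_t count;
};

/* One list per CPU; a plain array stands in for real percpu data. */
static struct cpu_pending pending[NR_CPUS];

/*
 * Modeled #MC path. Returns true only when the page was newly queued,
 * which is when a caller would request the deferred cleanup (for
 * example the IPI-to-self idea). Returns false if the page is already
 * pending (a second, third or fourth hit) or if the list is full.
 */
static bool mce_note_poison(int cpu, unsigned long pfn)
{
    struct cpu_pending *p = &pending[cpu];

    for (size_t i = 0; i < p->count; i++) {
        if (p->slot[i].pfn == pfn) {
            p->slot[i].hits++;
            return false;
        }
    }
    if (p->count == MAX_PENDING)
        return false;

    p->slot[p->count].pfn = pfn;
    p->slot[p->count].hits = 1;
    p->count++;
    return true;
}

/*
 * Modeled deferred context. Calls handle() for each queued page (if
 * handle is non-NULL), empties the list, and returns the number of
 * entries that were processed.
 */
static size_t mce_drain_pending(int cpu, void (*handle)(unsigned long pfn))
{
    struct cpu_pending *p = &pending[cpu];
    size_t n = p->count;

    for (size_t i = 0; i < n; i++) {
        if (handle)
            handle(p->slot[i].pfn);
    }
    p->count = 0;
    return n;
}
```

In this model, calling mce_note_poison() twice for the same CPU and page
queues one entry with hits == 2, and a later mce_drain_pending() on that
CPU processes it once and leaves the list empty.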