linux-kernel - RE: [PATCH v2] mm, hwpoison: Try to recover from copy-on write faults

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <SJ1PR11MB6083CEDBA2719825A1AD325EFC2D9@SJ1PR11MB6083.namprd11.prod.outlook.com>
Date:   Fri, 21 Oct 2022 16:30:50 +0000
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Shuai Xue <xueshuai@...ux.alibaba.com>,
        David Laight <David.Laight@...LAB.COM>
CC:     Naoya Horiguchi <naoya.horiguchi@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Miaohe Lin <linmiaohe@...wei.com>,
        "Matthew Wilcox" <willy@...radead.org>,
        "Williams, Dan J" <dan.j.williams@...el.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Nicholas Piggin <npiggin@...il.com>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>
Subject: RE: [PATCH v2] mm, hwpoison: Try to recover from copy-on write faults

>> But maybe it is some RMW instruction ... then, if all the above options didn't happen ... we
>> could get another machine check from the same address. But then we just follow the usual
>> recovery path.

> Let assume the instruction that cause the COW is in the 63/64 case, aka,
> it is writing a different cache line from the poisoned one. But the new_page
> allocated in COW is dropped right? So might page fault again?

It can, but this should be no surprise to a user that has a signal handler for
a h/w event (SIGBUS, SIGSEGV, SIGILL) that does nothing to address the
problem, but simply returns to re-execute the same instruction that caused
the original trap.

There may be badly written signal handlers that do this. But they just cause
pain for themselves. Linux can keep taking the traps and fixing things up and
sending a new signal over and over.

In this case that loop may involve taking the machine check again, so some
extra pain for the kernel, but recoverable machine checks on Intel/x86 switched
from broadcast to delivery to just the logical CPU that tried to consume the poison
a few generations back. So only a bit more painful than a repeated page fault.

-Tony