lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 21 Oct 2022 16:30:50 +0000
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Shuai Xue <xueshuai@...ux.alibaba.com>,
        David Laight <David.Laight@...LAB.COM>
CC:     Naoya Horiguchi <naoya.horiguchi@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Miaohe Lin <linmiaohe@...wei.com>,
        "Matthew Wilcox" <willy@...radead.org>,
        "Williams, Dan J" <dan.j.williams@...el.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Nicholas Piggin <npiggin@...il.com>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>
Subject: RE: [PATCH v2] mm, hwpoison: Try to recover from copy-on write faults

>> But maybe it is some RMW instruction ... then, if all the above options didn't happen ... we
>> could get another machine check from the same address. But then we just follow the usual
>> recovery path.


> Let assume the instruction that cause the COW is in the 63/64 case, aka,
> it is writing a different cache line from the poisoned one. But the new_page
> allocated in COW is dropped right? So might page fault again?

It can, but this should be no surprise to a user that has a signal handler for
a h/w event (SIGBUS, SIGSEGV, SIGILL) that does nothing to address the
problem, but simply returns to re-execute the same instruction that caused
the original trap.

There may be badly written signal handlers that do this. But they just cause
pain for themselves. Linux can keep taking the traps and fixing things up and
sending a new signal over and over.

In this case that loop may involve taking the machine check again, so some
extra pain for the kernel, but recoverable machine checks on Intel/x86 switched
from broadcast to delivery to just the logical CPU that tried to consume the poison
a few generations back. So only a bit more painful than a repeated page fault.

-Tony


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ