Message-ID: <64e319e644b548a38c9549d668cfcc9c@intel.com>
Date: Mon, 1 Mar 2021 18:12:26 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: Aili Yao <yaoaili@...gsoft.com>, Andy Lutomirski <luto@...nel.org>
CC: HORIGUCHI NAOYA( 堀口 直也)
<naoya.horiguchi@....com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"Peter Zijlstra" <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
"Ingo Molnar" <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
"yangfeng1@...gsoft.com" <yangfeng1@...gsoft.com>,
Linux-MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v3] x86/fault: Send a SIGBUS to user process always for
hwpoison page access.
> Programs that get a signal might expect that the RIP that the signal
> frame points to is the instruction that caused the signal and that the
> instruction faulted without side effects. For SIGSEGV, I would be
> especially nervous about this. Maybe SIGBUS is safer. For SIGSEGV,
> it's entirely valid to look at CR2 / si_fault_addr, fix it up, and
> return. This would be completely *invalid* with your patch. I'm not
> sure what to do about this.
The original plan was that s/w like databases would be able to write
their own application-specific recovery code. E.g. they hit poison while
reading some "table". The process gets a SIGBUS with siginfo telling
the handler the virtual address range that has been lost. The code
uses mmap(MAP_FIXED) to map a new page into the lost address and
fills it with suitable data (either reconstructing lost data by replaying
transactions, or filling the table with some "data unknown" indicator).
Then the SIGBUS handler returns to re-execute the instruction that
failed.
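
A minimal sketch of that recovery plan, assuming a Linux process that receives SIGBUS with si_code BUS_MCEERR_AR and reads the lost range from si_addr and si_addr_lsb. The names replace_lost_range, install_sigbus_recovery and DATA_UNKNOWN are hypothetical, and the fill byte stands in for whatever reconstruction the application would really do. Note that mmap() is not on the POSIX async-signal-safe list, so calling it from a handler is itself an assumption that happens to hold on Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical marker byte meaning "data unknown" in this sketch. */
#define DATA_UNKNOWN 0xEE

/*
 * Replace the naturally aligned 2^lsb-byte range containing addr (at
 * least one page) with fresh anonymous memory filled with `fill`.
 * Returns 0 on success, -1 on failure.
 */
static int replace_lost_range(void *addr, int lsb, unsigned char fill)
{
    if (lsb < 0 || lsb >= (int)(sizeof(size_t) * 8 - 1))
        return -1;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)1 << lsb;
    if (len < page)
        len = page;

    /* Round the reported address down to the start of the lost range. */
    uintptr_t start = (uintptr_t)addr & ~((uintptr_t)len - 1);

    /* MAP_FIXED drops the old (poisoned) mapping and installs a new one. */
    void *p = mmap((void *)start, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* A real application would reconstruct or replay data here. */
    memset(p, fill, len);
    return 0;
}

static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    if (info->si_code == BUS_MCEERR_AR &&
        replace_lost_range(info->si_addr, info->si_addr_lsb,
                           DATA_UNKNOWN) == 0)
        return; /* returning re-executes the instruction that failed */

    _exit(1); /* anything else is not recoverable in this sketch */
}

/* Installs the handler; returns 0 on success, -1 on failure. */
static int install_sigbus_recovery(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

This only works if the faulting instruction had no side effects before the signal was raised, which is the concern in the quoted text above.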
As far as I know nobody has been that creative in production s/w.
But I think there are folks with a siglongjmp() to a "this whole transaction
just failed" safe point.
-Tony