linux-kernel - RE: [RFD PATCH] x86/mce: Make sure to send SIGBUS even after losing the race to poison a page

Message-ID: <38de4f009d3248f7bc7c99f29d34ac8a@intel.com>
Date:   Thu, 3 Sep 2020 17:09:43 +0000
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Borislav Petkov <bp@...en8.de>
CC:     Naoya Horiguchi <naoya.horiguchi@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Song, Youquan" <youquan.song@...el.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [RFD PATCH] x86/mce: Make sure to send SIGBUS even after losing
 the race to poison a page

> Let's see if that logic makes sense: if #MC offlines the page and sends
> SIGBUS but CMCI only offlines the page, isn't it only logical for the
> CMCI to *also* send the SIGBUS too, after having offlined the page?
>
> I.e., both should do the proper and full recovery action. Just sayin...

It made sense, and seemed to explain an issue I was seeing, when I wrote it.
But some stress testing of that patch showed that it introduces some problems
and instability.

Without the patch I can inject 10,000 errors and have every one of them complete
correctly (the process gets a SIGBUS with the address of the error). With my patch,
around 0.4% of injections fail to provide the address to the SIGBUS handler. Worse,
the test gets a fatal error every 600-700 injections.

So, I'm abandoning that patch.

-Tony