linux-kernel - Re: [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20210414130509.GF10709@zn.tnic>
Date:   Wed, 14 Apr 2021 15:05:09 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Andy Lutomirski <luto@...nel.org>,
        Aili Yao <yaoaili@...gsoft.com>,
        HORIGUCHI NAOYA( 堀口　直也) 
        <naoya.horiguchi@....com>
Subject: Re: [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user
 hits poison

On Tue, Apr 13, 2021 at 04:13:03PM +0000, Luck, Tony wrote:
> Even if no applications ever do anything with it, it is still useful to avoid
> crashing the whole system and just terminate one application/guest.

True.

> There's one more item on my long term TODO list. Add fixups so that
> copy_to_user() from poison in the page cache doesn't crash, but just
> checks to see if the page was clean .. .in which case re-read from the
> filesystem into a different physical page and retire the old page ... the
> read can now succeed. If the page is dirty, then fail the read (and retire
> the page ... need to make sure filesystem knows the data for the page
> was lost so subsequent reads return -EIO or something).

Makes sense.

> Page cache occupies enough memory that it is a big enough
> source of system crashes that could be avoided. I'm not sure
> if there are any other obvious cases after this ... it all gets into
> diminishing returns ... not really worth it to handle a case that
> only occupies 0.00002% of memory.

Ack.

> See above. With core counts continuing to increase, the cloud service
> providers really want to see fewer events that crash the whole physical
> machine (taking down dozens, or hundreds, of guest VMs).

Yap.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette