Message-ID: <8f580a2544d846c69c9941e151fa7cc3@intel.com>
Date:   Tue, 28 Jun 2022 16:51:49 +0000
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Borislav Petkov <bp@...en8.de>
CC:     "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "patches@...ts.linux.dev" <patches@...ts.linux.dev>,
        Yazen Ghannam <yazen.ghannam@....com>
Subject: RE: [PATCH] RAS/CEC: Reduce default threshold to offline a page to
 "2"

>> Existing default is 1023 ... which is not a good choice for anyone (except
>> perhaps ostriches that want to bury their heads in the sand and ignore marginal
>> DIMMs for as long as possible).
>
>Why isn't that a good choice?

It fails to use the capabilities of h/w and Linux to avoid a fatal error in the future.
Corrected errors are (sometimes) a predictor of marginal/aging memory. Copying
data out of a failing page while there are just corrected errors can avoid losing
that whole page later.
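
To make the threshold idea concrete, here is a toy model of a per-page corrected-error counter. It is only a sketch under stated assumptions: the class name, the `record()` method, and the "offline when the count reaches the threshold" rule are hypothetical illustrations, not the kernel's CEC code, and it ignores details such as count decay.

```python
from collections import Counter


class CorrectedErrorCounter:
    """Toy per-page corrected-error counter (hypothetical, not kernel code)."""

    def __init__(self, threshold):
        self.threshold = threshold   # errors per page before we act
        self.counts = Counter()      # page frame number -> corrected errors seen
        self.offlined = set()        # pages already taken out of service

    def record(self, pfn):
        """Record one corrected error; return True if the page is offlined now."""
        if pfn in self.offlined:
            return False
        self.counts[pfn] += 1
        # Assumption: act as soon as the count reaches the threshold.
        if self.counts[pfn] >= self.threshold:
            self.offlined.add(pfn)
            del self.counts[pfn]
            return True
        return False


cec = CorrectedErrorCounter(threshold=2)
print(cec.record(0x1234))  # first error: tolerated
print(cec.record(0x1234))  # second error in the same page: offline it
```

With a threshold of 2 the page is retired on its second corrected error; with 1023 the same page would have to log over a thousand errors before anything happens.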

A single error is plausibly a particle strike causing a bit flip. But a second error
in the same page is a long shot (my desktop has 64G of memory, so 16 million
pages ... that's an awful lot of other targets for a second particle strike).
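
The arithmetic behind that can be sketched as follows. The assumptions here are mine: 4 KiB pages, "64G" read as 64 GiB, and a strike landing uniformly at random across memory; the function names are just for illustration.

```python
PAGE_SIZE = 4096          # assumed 4 KiB pages
MEM_BYTES = 64 * 2**30    # assumed 64 GiB of memory


def page_count(mem_bytes=MEM_BYTES, page_size=PAGE_SIZE):
    """Number of pages in the given amount of memory."""
    return mem_bytes // page_size


def same_page_probability(mem_bytes=MEM_BYTES, page_size=PAGE_SIZE):
    """Chance that a second, uniformly random strike hits one given page."""
    return 1.0 / page_count(mem_bytes, page_size)


print(page_count())             # 16777216, i.e. about 16 million pages
print(same_page_probability())  # roughly 6e-08
```

Under those assumptions a second independent strike on the same page is about a 1-in-16-million event, so a repeat error in one page points much more strongly at the memory itself.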

>I'm sure there are error rates where this fits just fine.

Explain further. Apart from the "ostrich" case I'm not sure what they are.

>> So changing the threshold to "2" would be an improvement in at least
>> being right for one vendor, instead of wrong for all.
>
>So I'm pretty sure that is not needed on AMD at all.

It's far more a property of DIMMs than of the CPU. Unless AMD are using
some DECTED or better level of ECC for memory.

-Tony
