Message-ID: <afaef377-25e0-49f6-a99f-3e5bd4b44f87@intel.com>
Date:   Wed, 11 Oct 2023 10:32:44 -0700
From:   Dave Hansen <dave.hansen@...el.com>
To:     Filippo Sironi <sironi@...zon.de>, linux-kernel@...r.kernel.org
Cc:     tony.luck@...el.com, bp@...en8.de, tglx@...utronix.de,
        mingo@...hat.com, dave.hansen@...ux.intel.com, x86@...nel.org,
        hpa@...or.com, linux-edac@...r.kernel.org
Subject: Re: [PATCH] x86/mce: Increase the size of the MCE pool from 2 to 8
 pages

On 10/11/23 09:33, Filippo Sironi wrote:
> On some of our large servers and some of our most sorry servers ( 🙂 ),
> we're seeing the kernel reporting the warning in mce_gen_pool_add: "MCE
> records pool full!". Let's increase the amount of memory that we use to
> store the MCE records from 2 to 8 pages to prevent this from happening
> and be able to collect useful information.

MCE_POOLSZ is used to size gen_pool_buf[], which is declared just
outside your diff context:

> #define MCE_POOLSZ      (2 * PAGE_SIZE)
> 
> static struct gen_pool *mce_evt_pool;
> static LLIST_HEAD(mce_event_llist);
> static char gen_pool_buf[MCE_POOLSZ];

That's in .bss, which means it eats up memory for *everyone*.  It seems
a little silly to eat up an extra 6 pages of memory for *everyone* in
order to get rid of a message on what I assume is a relatively small set
of "sorry servers".

Is there any way that the size of the pool can be more automatically
determined?  Is the likelihood of a bunch of errors proportional to the
number of CPUs, the amount of RAM, or some other aspect of the hardware?
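
If it does scale with something like CPU count, one option would be to
size the pool at init time instead of from a compile-time array.  A
rough sketch of that direction, reusing the existing genpool setup
(mce_gen_pool_create() in mce/genpool.c, IIRC) -- the scaling factor and
baseline below are made-up numbers, and it assumes allocating this early
is OK:

static int mce_gen_pool_create(void)
{
	/* Illustrative only: 2 pages baseline plus half a page per CPU */
	unsigned long size = 2 * PAGE_SIZE +
			     num_possible_cpus() * PAGE_SIZE / 2;
	void *buf;
	int ret;

	buf = kzalloc(size, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;

	mce_evt_pool = gen_pool_create(ilog2(sizeof(struct mce_evt_llist)), -1);
	if (!mce_evt_pool) {
		kfree(buf);
		return -ENOMEM;
	}

	ret = gen_pool_add(mce_evt_pool, (unsigned long)buf, size, -1);
	if (ret) {
		gen_pool_destroy(mce_evt_pool);
		kfree(buf);
		mce_evt_pool = NULL;
	}

	return ret;
}

That would at least keep small machines near the current footprint while
letting the big ones grow with the hardware they actually have.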

Could the pool be emptied more aggressively so that it does not fill up?

Last, what is the _actual_ harm caused by missing this "useful
information"?  Is collecting that information collectively really worth
24KB*NR_X86_SYSTEMS_ON_EARTH?  Is it really that valuable to know that
the system got 4,000 ECC errors on a DIMM versus 1,000?

If there's no other choice and this extra information is *CRITICAL*,
then by all means let's enlarge the buffer.  But, let's please do it for
a known, tangible benefit.
