Message-ID: <f68b684e-0b06-4175-93ca-3df869b5e164@amd.com>
Date: Thu, 15 Feb 2024 14:14:53 -0600
From: "Naik, Avadhut" <avadnaik@....com>
To: Borislav Petkov <bp@...en8.de>
Cc: Sohil Mehta <sohil.mehta@...el.com>, x86@...nel.org,
 linux-edac@...r.kernel.org, tony.luck@...el.com,
 linux-kernel@...r.kernel.org, yazen.ghannam@....com,
 Avadhut Naik <avadhut.naik@....com>
Subject: [PATCH 2/2] x86/MCE: Add command line option to extend MCE Records
 pool

Hi,

On 2/12/2024 02:58, Borislav Petkov wrote:
> On Sun, Feb 11, 2024 at 08:54:29PM -0600, Naik, Avadhut wrote:
>> Okay. Will make changes to allocate memory and set size of the pool
>> when it is created. Also, will remove the command line parameter and
>> resubmit.
> 
> Before you do, go read that original thread again but this time take
> your time to grok it.
> 
> And then try answering those questions:
> 
> * Why are *you* fixing this? I know what the AWS reason is, what is
> yours?
> 
I think the genpool can fill up with MCE records on AMD systems too,
since the pool doesn't scale with the number of CPUs or the amount of
memory in the system. The probability of hitting the issue only
increases as CPU count and memory increase. I feel that the genpool
size should be proportional to, at least, the CPU count of the system.
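
As a rough illustration of what "proportional to the CPU count" could
mean (a standalone sketch, not code from any posted patch), the pool
could reserve a fixed number of records per CPU with a floor for small
systems. The names mce_pool_records() and mce_pool_bytes() and the
values RECORDS_PER_CPU and MIN_RECORDS are hypothetical and chosen only
for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tunables for illustration; not taken from the kernel. */
#define RECORDS_PER_CPU 2u   /* assumed records reserved per CPU */
#define MIN_RECORDS     80u  /* assumed floor for small systems */

/* Number of records the pool should hold for num_cpus CPUs. */
static size_t mce_pool_records(unsigned int num_cpus)
{
	size_t want = (size_t)num_cpus * RECORDS_PER_CPU;

	/* Never go below the floor, however few CPUs there are. */
	return want < MIN_RECORDS ? MIN_RECORDS : want;
}

/* Pool size in bytes, given the size of one stored record. */
static size_t mce_pool_bytes(unsigned int num_cpus, size_t record_size)
{
	assert(record_size > 0);
	return mce_pool_records(num_cpus) * record_size;
}
```

With these assumed values, an 8-CPU system stays at the 80-record
floor, while a 512-CPU system gets 1024 records.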

> * Can you think of a slick deduplication scheme instead of blindly
> raising the buffer size?
> 
> * What's wrong with not logging some early errors, can we live with that
> too? If it were firmware-first, it cannot simply extend its buffer size
> because it has limited space. So what does firmware do in such cases?
>
I think we can live with not logging some early errors, as long as they
are correctable.
I am not sure what you mean by Firmware First. Do you mean handling
of MCEs through HEST and GHES? Or something else?

> Think long and hard about the big picture, analyze the problem properly
> and from all angles before you go and do patches.
> 
> Thx.
> 

-- 
Thanks,
Avadhut Naik
