Message-ID: <cbf11eb9-ce90-426a-a27f-623e6c350426@amd.com>
Date: Wed, 6 Mar 2024 17:21:51 -0600
From: "Naik, Avadhut" <avadnaik@....com>
To: "Luck, Tony" <tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>
Cc: "Mehta, Sohil" <sohil.mehta@...el.com>, "x86@...nel.org"
<x86@...nel.org>, "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"yazen.ghannam@....com" <yazen.ghannam@....com>,
Avadhut Naik <avadhut.naik@....com>
Subject: [PATCH] x86/mce: Dynamically size space for machine check records
On 3/6/2024 16:07, Luck, Tony wrote:
>>> + mce_numrecords = max(80, num_possible_cpus() * 4);
>>
>> Per Boris's below suggestion, shouldn't this be:
>> mce_numrecords = max(80, num_possible_cpus() * 16);
>>
>>>> min(4*PAGE_SIZE, num_possible_cpus() * PAGE_SIZE);
>>>
>>> max() ofc.
>>>
>>>> There's a sane minimum and one page pro logical CPU should be fine on
>>>> pretty much every configuration...
>>
>> 4 MCE records per CPU equates to 1024 bytes, considering the genpool intrinsic
>> behavior you explained in the other subthread.
>
> Picking a good number of records-per-core may be more art than science. Boris
> is right that a page per CPU shouldn't cause any significant issue to systems with
> many CPUs, because they should have copious amounts of memory to make a
> balanced configuration. But 16 records per CPU feels way too high to me. The
> theoretical limit in a single scan of machine check banks on Intel is 32 (since
> Intel never has more than 32 banks). But those banks cover diverse h/w devices
> and it seems improbable that all, or even most, of them would log errors at the
> same time, with all CPUs on all sockets doing the same.
>
> After I posted the version with num_possible_cpus() * 4 I began to wonder whether
> "2" would be enough.
>
I was thinking along the same lines: 16 MCE records per thread might be too high.
But since Boris made the suggestion, I thought there might be a use case that I am
unaware of, perhaps some issue that had been debugged in the past. Hence my
earlier question about whether it should be 16 instead of 4.
I think 2 records per logical CPU should also be sufficient. IIRC, the patch that I
submitted reserved space for 2 records per logical CPU in the genpool.
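
For a rough sense of scale, the sizing under discussion can be sketched as
below. This is a userspace illustration, not the patch itself: both helper
names are hypothetical, and the 256 bytes per record is an assumption inferred
from the "4 MCE records per CPU equates to 1024 bytes" figure quoted above.

```c
#include <stddef.h>

/* Floor of 80 records, taken from the quoted patch line. */
#define MCE_MIN_RECORDS 80u

/*
 * Assumption: each record consumes 256 bytes of genpool space,
 * inferred from "4 MCE records per CPU equates to 1024 bytes".
 */
#define MCE_BYTES_PER_RECORD 256u

/* Hypothetical helper mirroring max(80, num_possible_cpus() * per_cpu). */
static unsigned int mce_numrecords_for(unsigned int ncpus, unsigned int per_cpu)
{
	unsigned int scaled = ncpus * per_cpu;

	return scaled > MCE_MIN_RECORDS ? scaled : MCE_MIN_RECORDS;
}

/* Hypothetical helper: pool bytes implied by that record count. */
static size_t mce_pool_bytes(unsigned int ncpus, unsigned int per_cpu)
{
	return (size_t)mce_numrecords_for(ncpus, per_cpu) * MCE_BYTES_PER_RECORD;
}
```

Under those assumptions, a 512-CPU system would reserve 256 KiB at 2 records
per CPU, 512 KiB at 4, and 2 MiB at 16, while an 8-CPU system stays at the
80-record floor (20 KiB) for factors of 2 and 4.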
>> Apart from this, tested the patch on a couple of AMD systems. Didn't observe any
>> issues.
>
> Thanks very much for testing.
>
> -Tony
--
Thanks,
Avadhut Naik