[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7b68f364-a324-4e2c-87be-19cdef4e3ad2@amd.com>
Date: Mon, 29 Apr 2024 10:36:57 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: Borislav Petkov <bp@...en8.de>
Cc: yazen.ghannam@....com, linux-edac@...r.kernel.org,
linux-kernel@...r.kernel.org, tony.luck@...el.com, x86@...nel.org,
Avadhut.Naik@....com, John.Allen@....com
Subject: Re: [PATCH v2 09/16] x86/mce: Unify AMD THR handler with MCA Polling
On 4/29/2024 9:40 AM, Borislav Petkov wrote:
> On Thu, Apr 04, 2024 at 10:13:52AM -0500, Yazen Ghannam wrote:
>> @@ -787,6 +793,8 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>> mce_log(&m);
>>
>> clear_it:
>> + vendor_handle_error(&m);
>
> Wait, whaaat?
>
> The normal polling happens periodically (each 5 mins) and you want to
> reset the thresholding blocks each 5 mins?
>
> And the code has there now:
>
> static void reset_block(struct threshold_block *block)
> {
>
> ...
>
> /* Reset threshold block after logging error. */
> memset(&tr, 0, sizeof(tr));
> tr.b = block;
> threshold_restart_bank(&tr);
> }
>
> but no error has been logged.
>
> Frankly, I don't see the point for this part: polling all banks on
> a thresholding interrupt makes sense. But this resetting from within the
> polling doesn't make any sense.
>
> Especially if that polling interval is user-controllable.
>
> Thx.
>
The reset only happens on a threshold overflow event. There's a check above.
if (!(high & MASK_OVERFLOW_HI))
return;
Basically, all the cases in vendor_handle_error() would be conditional.
Related to this, I've been thinking that banks with thresholding enabled
should be removed from the list of polling banks. This is done on Intel but
not on AMD.
I wanted to give it more thought, because I think folks have come to expect
polling and thresholding to be independent on AMD.
Thanks,
Yazen
Powered by blists - more mailing lists