[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c73bb18b-02ef-6c35-f4cf-1738c17a96e5@arm.com>
Date: Wed, 8 Apr 2020 11:03:13 +0100
From: James Morse <james.morse@....com>
To: Borislav Petkov <bp@...en8.de>, Shiju Jose <shiju.jose@...wei.com>
Cc: linux-acpi@...r.kernel.org, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org, rjw@...ysocki.net,
helgaas@...nel.org, lenb@...nel.org, tony.luck@...el.com,
gregkh@...uxfoundation.org, zhangliguang@...ux.alibaba.com,
tglx@...utronix.de, linuxarm@...wei.com,
jonathan.cameron@...wei.com, tanxiaofei@...wei.com,
yangyicong@...ilicon.com
Subject: Re: [PATCH v6 1/2] ACPI / APEI: Add support to notify the vendor
specific HW errors
Hi Boris, Shiju,
Sorry for not spotting this reply earlier: Its in-reply to v1, so gets buried.
On 27/03/2020 18:22, Borislav Petkov wrote:
> On Wed, Mar 25, 2020 at 04:42:22PM +0000, Shiju Jose wrote:
>> Presently APEI does not support reporting the vendor specific
>> HW errors, received in the vendor defined table entries, to the
>> vendor drivers for any recovery.
>>
>> This patch adds the support to register and unregister the
>
> Avoid having "This patch" or "This commit" in the commit message. It is
> tautologically useless.
>
> Also, do
>
> $ git grep 'This patch' Documentation/process
>
> for more details.
>
>> error handling function for the vendor specific HW errors and
>> notify the registered kernel driver.
>> @@ -526,10 +552,17 @@ static void ghes_do_proc(struct ghes *ghes,
>> log_arm_hw_error(err);
>> } else {
>> void *err = acpi_hest_get_payload(gdata);
>> + u8 error_handled = false;
>> + int ret;
>> +
>> + ret = atomic_notifier_call_chain(&ghes_event_notify_list, 0, gdata);
>
> Well, this is a notifier with standard name for a non-standard event.
> Not optimal.
>
> Why does only this event need a notifier? Because your driver is
> interested in only those events?
Its the 'else' catch-all for stuff drivers/acpi/apei doesn't know to handle.
In this case its because its a vendor specific GUID that only the vendor driver knows how
to parse.
>> + if (ret & NOTIFY_OK)
>> + error_handled = true;
>>
>> log_non_standard_event(sec_type, fru_id, fru_text,
>> sec_sev, err,
>> - gdata->error_data_length);
>> + gdata->error_data_length,
>> + error_handled);
>
> What's that error_handled thing for? That's just silly.
>
> Your notifier returns NOTIFY_STOP when it has queued the error. If you
> don't want to log it, just test == NOTIFY_STOP and do not log it then.
My thinking for this being needed was so user-space consumers of those tracepoints keep
working. Otherwise you upgrade, get this feature, and your user-space counters stop working.
You'd need to know this error source was now managed by an in-kernel driver, which may
report the errors somewhere else...
> Then your notifier callback is queuing the error into a kfifo for
> whatever reason and then scheduling a workqueue to handle it in user
> context...
>
> So I'm thinking that it would be better if you:
>
> * make that kfifo generic and part of ghes.c and queue all types of
> error records into it in ghes_do_proc() - not just the non-standard
> ones.
Move the drop to process context into ghes.c? This should result in less code.
I asked for this hooking to only be for the 'catch all' don't-know case so that we don't
get drivers trying to hook and handle memory errors. (if we ever wanted that, it should be
from part of memory_failure() so it catches all the ways of reporting memory-failure)
32bit arm has prior in this area.
> * then, when you're done queuing, you kick a workqueue.
>
> * that workqueue runs a normal, blocking notifier to which drivers
> register.
>
> Your driver can register to that notifier too and do the normal handling
> then and not have this ad-hoc, semi-generic, semi-vendor-specific thing.
As long as we don't walk a list of things that might handle a memory-error, and have some
random driver try and NOTIFY_STOP it....
aer_recover_queue() would be replaced by this. memory_failure_queue() has one additional
caller in drivers/ras/cec.c.
Thanks,
James
Powered by blists - more mailing lists