Message-ID: <20200331090929.GB29131@zn.tnic>
Date: Tue, 31 Mar 2020 11:09:29 +0200
From: Borislav Petkov <bp@...en8.de>
To: Shiju Jose <shiju.jose@...wei.com>
Cc: "linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"rjw@...ysocki.net" <rjw@...ysocki.net>,
"helgaas@...nel.org" <helgaas@...nel.org>,
"lenb@...nel.org" <lenb@...nel.org>,
"james.morse@....com" <james.morse@....com>,
"tony.luck@...el.com" <tony.luck@...el.com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"zhangliguang@...ux.alibaba.com" <zhangliguang@...ux.alibaba.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
Linuxarm <linuxarm@...wei.com>,
Jonathan Cameron <jonathan.cameron@...wei.com>,
tanxiaofei <tanxiaofei@...wei.com>,
yangyicong <yangyicong@...wei.com>
Subject: Re: [PATCH v6 1/2] ACPI / APEI: Add support to notify the vendor specific HW errors
On Mon, Mar 30, 2020 at 03:44:29PM +0000, Shiju Jose wrote:
> 1. rasdaemon need not to print the vendor error data reported by the firmware if the
> kernel driver already print those information. In this case rasdaemon will only need to store
> the decoded vendor error data to the SQL database.
Well, there's a problem with this:
rasdaemon printing != kernel driver printing
Because printing in dmesg would need people to go grep dmesg.
Printing through rasdaemon or any userspace agent, OTOH, is a lot more
flexible wrt analyzing and collecting those error records. Especially
if you are a data center admin and you want to collect all your error
records: grepping dmesg simply doesn't scale versus all the rasdaemon
agents reporting to a centralized location.
> 2. If the vendor kernel driver want to report extra error information through
> the vendor specific data (though presently we do not have any such use case) for the rasdaemon to log.
> I think the error handled status useful to indicate that the kernel driver has filled the extra information and
> rasdaemon to decode and log them after extra data specific validity check.
The kernel driver can report that extra information without the kernel
saying that the error was handled.
So I still see no sense for the kernel to tell userspace explicitly that
it handled the error. There might be a valid reason, though, which I
cannot think of right now.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette