Message-ID: <82fe23f6-efd8-d256-7f34-a0bbc91237d3@codeaurora.org>
Date: Thu, 3 Aug 2017 16:06:22 -0600
From: "Baicar, Tyler" <tbaicar@...eaurora.org>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Borislav Petkov <bp@...e.de>, rjw@...ysocki.net, lenb@...nel.org,
will.deacon@....com, james.morse@....com, shiju.jose@...wei.com,
geliangtang@...il.com, andriy.shevchenko@...ux.intel.com,
linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] acpi: apei: clear error status before acknowledging the error
On 7/31/2017 11:44 AM, Baicar, Tyler wrote:
> On 7/31/2017 11:00 AM, Luck, Tony wrote:
>> On Mon, Jul 31, 2017 at 10:15:27AM -0600, Baicar, Tyler wrote:
>>> I think the better thing to do in this case is to still send the ack.
>>> If ghes_read_estatus() fails, then either we are unable to read the
>>> estatus or the estatus is empty/invalid.
>> Right now we silently handle that failure of ghes_read_estatus(). That
>> might be hiding some Linux bugs if we are calling ghes_proc() in cases
>> where we shouldn't.
>>
>> Perhaps we should have something like this, so if systems do start
>> acting weirdly there will be a note that we took this path:
>>
>> 	rc = ghes_read_estatus(ghes, 0);
>> 	if (rc) {
>> 		pr_notice("surprise failure reading ghes estatus\n");
>> 		goto out;
>> 	}
> Thank you, Tony, for the feedback. I can add a print like this in the
> next version. I'll verify that rc is not -ENOENT, though, so we don't
> print it in the empty case, since the polled source will be hitting
> this path frequently.
>
Hi Tony,
I think I'm going to avoid adding this print. The failures are already
reported by prints in ghes_read_estatus(), so it looks a little redundant:
[ 133.601165] [Firmware Warn]: GHES: Failed to read error status block!
[ 133.601167] surprise failure reading GHES estatus
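
For reference, the -ENOENT check mentioned in the quoted text could be
sketched roughly as below. This is a plain userspace stand-in, not the
kernel code: report_read_failure() and notices_printed are hypothetical
names used only for illustration, fprintf() stands in for pr_notice(),
and rc is assumed to be the value ghes_read_estatus() returned (0 on
success, -ENOENT for an empty status block, some other negative errno
otherwise).

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical counter, only here so the behaviour can be checked. */
int notices_printed;

/*
 * Hypothetical stand-in for the check that would follow
 * ghes_read_estatus() in ghes_proc(). The notice is skipped for
 * -ENOENT so that a polled source finding an empty status block
 * does not flood the log. rc is returned unchanged.
 */
int report_read_failure(int rc)
{
	if (rc && rc != -ENOENT) {
		/* fprintf() replaces pr_notice() in this userspace sketch. */
		fprintf(stderr, "surprise failure reading ghes estatus\n");
		notices_printed++;
	}
	return rc;
}
```

With this shape, calling report_read_failure(-ENOENT) stays silent,
while any other non-zero rc (for example -EIO) prints the notice once.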
Thanks,
Tyler
>>
>>> If we do not send the ack, then we will be in a scenario where FW
>>> will not send any more errors.
>> We might ACK something that the firmware didn't send, which may
>> lead to other problems.
>>
>>> I think it would be better to still have the FW send the errors and
>>> kernel complain about issues with
>> But I agree with this. We should send the ACK. Luckily this doesn't
>> have a long legacy problem because the whole ACK mechanism is a new
>> thing. So we only have to worry about GHESv2-supporting BIOS.
>>
>> -Tony
>
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.