Message-ID: <0bb80989-4fe5-c320-8ffc-0f39502110c9@arm.com>
Date: Fri, 21 Dec 2018 18:52:20 +0000
From: James Morse <james.morse@....com>
To: Borislav Petkov <bp@...en8.de>, David Arcari <darcari@...hat.com>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>,
Linux ACPI <linux-acpi@...r.kernel.org>,
Lenny Szubowicz <lszubowi@...hat.com>,
Len Brown <lenb@...nel.org>, Tony Luck <tony.luck@...el.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Alexandru Gagniuc <mr.nuke.me@...il.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ACPI/APEI: Clear GHES block_status before panic()
On 21/12/2018 11:17, Rafael J. Wysocki wrote:
> On Thursday, December 20, 2018 8:24:47 PM CET Borislav Petkov wrote:
>> + James.
Thanks,
>> On Wed, Dec 19, 2018 at 11:50:52AM -0500, David Arcari wrote:
>>> From: Lenny Szubowicz <lszubowi@...hat.com>
>>>
>>> In __ghes_panic() clear the block status in the APEI generic
>>> error status block for that generic hardware error source before
>>> calling panic() to prevent a second panic() in the crash kernel
>>> for exactly the same fatal error.
>>>
>>> Otherwise ghes_probe(), running in the crash kernel, would see
>>> an unhandled error in the APEI generic error status block and
>>> panic again, thereby precluding any crash dump.
I bet that was fun to watch!
>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>> index 02c6fd9..f008ba7 100644
>>> --- a/drivers/acpi/apei/ghes.c
>>> +++ b/drivers/acpi/apei/ghes.c
>>> @@ -691,6 +691,8 @@ static void __ghes_panic(struct ghes *ghes)
>>>  {
>>>  	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
>>>
>>> +	ghes_clear_estatus(ghes);
>>> +
>>>  	/* reboot to log the error! */
>>>  	if (!panic_timeout)
>>>  		panic_timeout = ghes_panic_timeout;
>>
>> Acked-by: Borislav Petkov <bp@...e.de>
>
> Patch applied, thanks!
Great!
Do we need to call ghes_ack_error() too?
With the location cleared, the new kernel will never find the records, and
firmware can never re-use that location because it was never ack'd. The upshot
is that RAS records can't be generated for the kdump kernel. The ACPI spec talks
about use of the memory, so I don't think it's fair for firmware to use this to
disarm a watchdog.
I think we can live with this as the kdump kernel isn't going to handle RAS
errors for the bulk of memory anyway.
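
The trade-off above can be illustrated with a toy userspace model. This is only
a sketch, not kernel code: struct model_ghes and every function below are
hypothetical names made up for illustration, and the single 'acked' flag is an
assumed simplification of the ack handshake between the OS and firmware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these names are hypothetical, not the kernel's. */
struct model_ghes {
	uint32_t block_status;	/* non-zero: an unhandled record is present */
	bool acked;		/* firmware may overwrite the location */
};

/* Firmware side: refuse to write a record until the last one was acked. */
static bool fw_report_error(struct model_ghes *g, uint32_t status)
{
	if (!g->acked)
		return false;
	g->block_status = status;
	g->acked = false;
	return true;
}

/* OS side: stand-ins for clearing block_status and for the ack. */
static void os_clear_estatus(struct model_ghes *g) { g->block_status = 0; }
static void os_ack_error(struct model_ghes *g) { g->acked = true; }

/* A freshly booted kernel sees an unhandled error if block_status is set. */
static bool probe_sees_error(const struct model_ghes *g)
{
	return g->block_status != 0;
}

/*
 * Run the panic path, then report whether firmware could deliver a new
 * record to the kdump kernel.
 */
static bool kdump_can_get_records(bool ack_before_panic)
{
	struct model_ghes g = { .block_status = 0, .acked = true };
	bool reported = fw_report_error(&g, 1);	/* the fatal error */

	assert(reported);
	(void)reported;

	os_clear_estatus(&g);		/* what the patch adds */
	if (ack_before_panic)
		os_ack_error(&g);	/* the open question */

	assert(!probe_sees_error(&g));	/* no second panic either way */
	return fw_report_error(&g, 2);
}
```

In this model kdump_can_get_records(false) returns false and
kdump_can_get_records(true) returns true: clearing alone is enough to avoid the
second panic, but without the ack firmware has nowhere to put a new record.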
Thanks,
James