Message-ID: <0bb80989-4fe5-c320-8ffc-0f39502110c9@arm.com>
Date:   Fri, 21 Dec 2018 18:52:20 +0000
From:   James Morse <james.morse@....com>
To:     Borislav Petkov <bp@...en8.de>, David Arcari <darcari@...hat.com>
Cc:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Linux ACPI <linux-acpi@...r.kernel.org>,
        Lenny Szubowicz <lszubowi@...hat.com>,
        Len Brown <lenb@...nel.org>, Tony Luck <tony.luck@...el.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Alexandru Gagniuc <mr.nuke.me@...il.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ACPI/APEI: Clear GHES block_status before panic()

On 21/12/2018 11:17, Rafael J. Wysocki wrote:
> On Thursday, December 20, 2018 8:24:47 PM CET Borislav Petkov wrote:
>> + James.

Thanks,

>> On Wed, Dec 19, 2018 at 11:50:52AM -0500, David Arcari wrote:
>>> From: Lenny Szubowicz <lszubowi@...hat.com>
>>>
>>> In __ghes_panic() clear the block status in the APEI generic
>>> error status block for that generic hardware error source before
>>> calling panic() to prevent a second panic() in the crash kernel
>>> for exactly the same fatal error.
>>>
>>> Otherwise ghes_probe(), running in the crash kernel, would see
>>> an unhandled error in the APEI generic error status block and
>>> panic again, thereby precluding any crash dump.

I bet that was fun to watch!


>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>> index 02c6fd9..f008ba7 100644
>>> --- a/drivers/acpi/apei/ghes.c
>>> +++ b/drivers/acpi/apei/ghes.c
>>> @@ -691,6 +691,8 @@ static void __ghes_panic(struct ghes *ghes)
>>>  {
>>>  	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
>>>  
>>> +	ghes_clear_estatus(ghes);
>>> +
>>>  	/* reboot to log the error! */
>>>  	if (!panic_timeout)
>>>  		panic_timeout = ghes_panic_timeout;
>>
>> Acked-by: Borislav Petkov <bp@...e.de>
> 
> Patch applied, thanks!

Great!

Do we need to ghes_ack_error() too?

With the location cleared the new kernel will never find the records, and
firmware can never re-use that location because it wasn't ack'd. The upshot is
RAS records can't be generated for the kdump kernel. The ACPI spec talks about
use of the memory, so I don't think it's fair for it to use this to disarm a
watchdog.

I think we can live with this as the kdump kernel isn't going to handle RAS
errors for the bulk of memory anyway.
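
To make the clear-vs-ack distinction concrete, here is a toy user-space model,
not kernel code. Every name in it (model_ghes, firmware_report(),
simulate_crash(), ...) is made up for illustration, and it assumes a
GHESv2-style source where firmware waits for the OS to write the Read Ack
register before it re-uses the status block:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one error source; not the kernel's struct ghes. */
struct model_ghes {
	uint32_t block_status;	/* non-zero: an error record is pending */
	bool acked;		/* assumed stand-in for the Read Ack register */
};

/* What the kdump kernel ends up seeing after the first kernel panics. */
struct kdump_outcome {
	bool probe_panics_again;	/* probe found the stale fatal record */
	bool firmware_can_report;	/* firmware may write a new record */
};

/* Assumption: firmware only re-uses the block once the OS has acked it. */
static bool firmware_report(struct model_ghes *g, uint32_t status)
{
	if (!g->acked)
		return false;
	g->block_status = status;
	g->acked = false;
	return true;
}

/* Stand-ins for what clearing and acking do to the model state. */
static void os_clear_estatus(struct model_ghes *g) { g->block_status = 0; }
static void os_ack_error(struct model_ghes *g) { g->acked = true; }

/* Replay: fatal error pending, first kernel panics, kdump kernel probes. */
struct kdump_outcome simulate_crash(bool clear, bool ack)
{
	struct model_ghes g = { .block_status = 1, .acked = false };
	struct kdump_outcome out;

	/* panic path in the first kernel */
	if (clear)
		os_clear_estatus(&g);
	if (ack)
		os_ack_error(&g);

	/* crash kernel: probe looks for an unhandled record */
	out.probe_panics_again = (g.block_status != 0);
	/* later, firmware tries to report a new error */
	out.firmware_can_report = firmware_report(&g, 2);
	return out;
}
```

In this model simulate_crash(true, false) is the applied patch: the probe no
longer panics, but firmware_can_report is false. simulate_crash(true, true)
is the case I'm asking about, where both come out as you'd want.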


Thanks,

James