linux-kernel - Re: [PATCH] ACPI/APEI: Clear GHES block

Message-ID: <20181221185937.GL1325@zn.tnic>
Date:   Fri, 21 Dec 2018 19:59:37 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     James Morse <james.morse@....com>
Cc:     David Arcari <darcari@...hat.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Linux ACPI <linux-acpi@...r.kernel.org>,
        Lenny Szubowicz <lszubowi@...hat.com>,
        Len Brown <lenb@...nel.org>, Tony Luck <tony.luck@...el.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Alexandru Gagniuc <mr.nuke.me@...il.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ACPI/APEI: Clear GHES block_status before panic()

On Fri, Dec 21, 2018 at 06:52:20PM +0000, James Morse wrote:
> Do we need to ghes_ack_error() too?

That's GHES v2 AFAICT.
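
For reference, a minimal userspace sketch of what that acknowledgement amounts to on GHESv2: the OS reads the Read Ack register, keeps the bits selected by the Read Ack Preserve mask, sets the bits given by Read Ack Write, and writes the result back. The struct and function names below are hypothetical stand-ins, not the kernel's; the arithmetic is only loosely modelled on what ghes_ack_error() does and leaves out the actual register access.

```c
#include <stdint.h>

/* Hypothetical model of the GHESv2 read-ack fields of a HEST entry. */
struct read_ack_model {
    uint64_t preserve;    /* Read Ack Preserve: mask of bits to keep */
    uint64_t write;       /* Read Ack Write: bits to set */
    unsigned bit_offset;  /* bit offset of the field within the register */
};

/*
 * Compute the value to write back to the Read Ack register, given the
 * value just read from it. Assumption: the real code performs the read
 * and write through the APEI register accessors, omitted here.
 */
static uint64_t read_ack_value(uint64_t current, const struct read_ack_model *m)
{
    uint64_t val = current;

    val &= m->preserve << m->bit_offset;  /* keep only the preserved bits */
    val |= m->write << m->bit_offset;     /* set the acknowledge bits */
    return val;
}
```

Until that write happens, firmware has no indication that the OS is done with the error status block, which is the re-use problem raised in the quoted text below.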

> With the location cleared the new kernel will never find the records, and
> firmware can never re-use that location because it wasn't ack'd. The upshot is
> RAS records can't be generated for the kdump kernel. The ACPI spec talks about
> use of the memory, so I don't think it's fair for it to use this to disarm a
> watchdog.
> 
> I think we can live with this as the kdump kernel isn't going to handle RAS
> errors for the bulk of memory anyway.

Handling hw errors is usually better than not, but the second kernel
can't do anything better in that respect than the first, right?
If it panics, it panics - no matter the kernel. Generally.

Therefore I think the role of the second kernel should be to be as
resilient as possible to hw errors - like, not even see them :-) - dump
the memory of the first kernel as quickly as possible and reboot for
analysis.

IMHO, of course.

-- 
Regards/Gruss,
    Boris.
