linux-kernel - Re: [PATCH] ACPI: PHAT: Add Platform Health Assessment Table support


Message-ID: <CAJZ5v0jEcD_1+jHfAk9eN0YYJFbDZN2rZ97KHyH2-w6EqRN9+g@mail.gmail.com>
Date:   Mon, 21 Aug 2023 19:29:19 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     "Limonciello, Mario" <mario.limonciello@....com>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Avadhut Naik <avadnaik@....com>,
        "Wilczynski, Michal" <michal.wilczynski@...el.com>,
        Avadhut Naik <avadhut.naik@....com>, lenb@...nel.org,
        linux-acpi@...r.kernel.org, yazen.ghannam@....com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ACPI: PHAT: Add Platform Health Assessment Table support

On Mon, Aug 21, 2023 at 7:17 PM Limonciello, Mario
<mario.limonciello@....com> wrote:
>
> On 8/21/2023 12:12 PM, Rafael J. Wysocki wrote:
> <snip>
> >> I was just talking to some colleagues about PHAT recently as well.
> >>
> >> The use case that jumps out is "system randomly rebooted while I was
> >> doing XYZ".  You don't know what happened, but you keep using your
> >> system.  Then it happens again.
> >>
> >> If the reason for the random reboot is captured to dmesg you can cross
> >> reference your journal from the next boot after any random reboot and
> >> get the reason for it.  If a user reports this to a GitLab issue tracker
> >> or Bugzilla it can be helpful in establishing a pattern.
> >>
> >>>> The below location may be appropriate in that case:
> >>>> /sys/firmware/acpi/
> >>>
> >>> Yes, it may.
> >>>
> >>>> We already have FPDT and BGRT being exported from there.
> >>>
> >>> In fact, all of the ACPI tables can be retrieved verbatim from
> >>> /sys/firmware/acpi/tables/ already, so why exactly do you want the
> >>> kernel to parse PHAT in particular?
> >>>
> >>
> >> It's not to say that /sys/firmware/acpi/PHAT isn't useful, but having
> >> something internal to the kernel "automatically" parsing it and saving
> >> information to a place like the kernel log that is already captured by
> >> existing userspace tools I think is "more" useful.
> >
> > What existing user space tools do you mean?  Is there anything already
> > making use of the kernel's PHAT output?
> >
>
> I meant that tools like systemd already capture the kernel log ring
> buffer.  If you save stuff like this into the kernel log, it's going
> to be indexed and easier to grep for boots that had it.
>
> > And why can't user space simply parse PHAT by itself?
> >
> > There are multiple ACPI tables that could be dumped into the kernel
> > log, but they aren't.  Guess why.
>
> Right; there's no reason it can't be done by userspace directly.
>
> Another way to approach this problem could be to modify tools that
> excavate records from a reboot to also get PHAT.  For example
> systemd-pstore will get any kernel panics from the previous boot from
> the EFI pstore and put them into /var/lib/systemd/pstore.
>
> No reason that couldn't be done automatically for PHAT too.
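
The pstore-style idea above, applied to PHAT, could look roughly like the sketch below. It is a minimal illustration, not an existing systemd feature: the archive directory and function names are hypothetical, and it assumes Python 3.8+. It copies the raw table from /sys/firmware/acpi/tables/PHAT once per boot and names the copy after the kernel's boot ID (/proc/sys/kernel/random/boot_id, which journald also records as _BOOT_ID), so a snapshot can be matched against that boot's journal.

```python
from pathlib import Path
from typing import Optional

# Hypothetical location: nothing in systemd defines a PHAT archive directory.
ARCHIVE_DIR = Path("/var/lib/phat-archive")
PHAT_TABLE = Path("/sys/firmware/acpi/tables/PHAT")
BOOT_ID_FILE = Path("/proc/sys/kernel/random/boot_id")


def current_boot_id() -> str:
    """Kernel's random per-boot UUID with dashes removed (journald's _BOOT_ID form)."""
    return BOOT_ID_FILE.read_text().strip().replace("-", "")


def archive_phat(table: Path = PHAT_TABLE,
                 archive_dir: Path = ARCHIVE_DIR,
                 boot_id: Optional[str] = None) -> Optional[Path]:
    """Save the raw PHAT as <archive_dir>/<boot_id>.bin; return None if absent."""
    if not table.exists():
        return None  # firmware does not expose PHAT on this system
    boot_id = boot_id or current_boot_id()
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / f"{boot_id}.bin"
    if not dest.exists():  # keep only the first snapshot taken during this boot
        # Copy the table verbatim, as `cat` would; no parsing happens here.
        dest.write_bytes(table.read_bytes())
    return dest
```

Run once per boot (typically as root), a saved file can then be lined up with `journalctl -b <boot_id>`. Whether a given boot's PHAT actually describes the previous boot's failure is up to the firmware.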

I'm not sure about the connection between the PHAT dump in the kernel
log and pstore.

The PHAT dump would be from the time before the failure, so it is
unclear to me how useful it can be for diagnosing it.  However, after
a reboot one should be able to retrieve PHAT data from the table
directly and that may include some information regarding the failure.
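
Retrieving the data "from the table directly" in user space can be sketched as follows. This is a minimal example assuming Python 3.9+ and the PHAT record header layout (u16 type, u16 length, u8 revision; type 0 = firmware version data, type 1 = firmware health data) as I understand it from the ACPI specification and ACPICA's `struct acpi_phat_header`; verify offsets against the spec revision your firmware implements. It checks the standard 36-byte ACPI header and checksum, then walks the record headers without decoding their payloads.

```python
import struct

# Standard ACPI table header: Signature, Length, Revision, Checksum, OEMID,
# OEM Table ID, OEM Revision, Creator ID, Creator Revision (36 bytes).
ACPI_HEADER = struct.Struct("<4sIBB6s8sI4sI")
# Assumed PHAT record header: type (u16), length (u16), revision (u8).
PHAT_RECORD_HEADER = struct.Struct("<HHB")
RECORD_TYPES = {0: "firmware version data", 1: "firmware health data"}


def parse_phat(raw: bytes) -> list[tuple[int, int, int]]:
    """Return (type, length, revision) for every record in a raw PHAT image."""
    if len(raw) < ACPI_HEADER.size:
        raise ValueError("buffer shorter than an ACPI table header")
    sig, length, _rev, _csum, *_ = ACPI_HEADER.unpack_from(raw)
    if sig != b"PHAT":
        raise ValueError(f"unexpected signature {sig!r}")
    if length > len(raw):
        raise ValueError("table length exceeds buffer size")
    # All bytes of a valid ACPI table, checksum byte included, sum to 0 mod 256.
    if sum(raw[:length]) % 256 != 0:
        raise ValueError("bad table checksum")

    records = []
    offset = ACPI_HEADER.size
    while offset + PHAT_RECORD_HEADER.size <= length:
        rtype, rlen, rrev = PHAT_RECORD_HEADER.unpack_from(raw, offset)
        if rlen < PHAT_RECORD_HEADER.size or offset + rlen > length:
            raise ValueError(f"malformed record at offset {offset}")
        records.append((rtype, rlen, rrev))
        offset += rlen  # records are packed back to back
    return records


def print_phat(path: str = "/sys/firmware/acpi/tables/PHAT") -> None:
    """Print one line per record; reading the sysfs file usually requires root."""
    with open(path, "rb") as f:
        for rtype, rlen, rrev in parse_phat(f.read()):
            name = RECORD_TYPES.get(rtype, f"type {rtype}")
            print(f"{name}: length={rlen} revision={rrev}")
```

Decoding the health data itself (for example a reset reason) would mean interpreting each record's payload per the specification, which this sketch deliberately leaves out.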

With pstore, the assumption is that there will be some information
relevant for diagnosing the failure in the kernel log buffer, but I'm
not sure how the PHAT dump from before the failure can help here.