Message-ID: <aa7a6b97-fed2-41cb-9ea4-0f42acaaeaba@amd.com>
Date: Mon, 21 Aug 2023 21:53:58 -0500
From: "Limonciello, Mario" <mario.limonciello@....com>
To: Yazen Ghannam <yazen.ghannam@....com>
Cc: Avadhut Naik <avadnaik@....com>,
"Wilczynski, Michal" <michal.wilczynski@...el.com>,
Avadhut Naik <avadhut.naik@....com>, lenb@...nel.org,
linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org,
"Rafael J. Wysocki" <rafael@...nel.org>
Subject: Re: [PATCH] ACPI: PHAT: Add Platform Health Assessment Table support
On 8/21/2023 6:33 PM, Yazen Ghannam wrote:
> On 8/21/23 3:23 PM, Limonciello, Mario wrote:
>>
>>
>> On 8/21/2023 2:16 PM, Rafael J. Wysocki wrote:
>> <snip>
>>>>>> Is there a preferred set of tools that can be updated?
>>>>>
>>>>> I think you need to talk to distro people about this.
>>>>>
>>>>>> If not, would it make sense to develop a set of common kernel tools for
>>>>>> this?
>>>>>
>>>>> Yes, it would, but please see above in the first place.
>>>>>
>>>>>> In my experience, it seems many folks use tools from their vendors or
>>>>>> custom tools.
>>>>>
>>>>> This observation matches my own experience.
>>>>
>>>> For the sake of discussion, and from a kernel developer's point of view,
>>>> should the tools be part of a separate project? Or should the tools be
>>>> part of the kernel tree like perf, etc.? Assuming that this needs to
>>>> start from scratch and not extending an existing project.
>>>
>>> It can be both in principle, but from the practical standpoint it is
>>> more likely to get all of the people to use the same set of tools if
>>> they are included into the kernel source tree.
>>
>> Yazen,
>>
>> You generally envision tools like this being used only when there is a problem, and not as something that runs in the critical path on every boot, right?
>>
>
> Hi Mario,
>
> Generally, I think yes. But you summarized one issue earlier, and that
> is the case where a user doesn't explicitly fetch the information and it
> gets lost. This can be especially painful if the issue is difficult to
> reproduce or has a long time to failure. Of course, this is new and
> supplemental info, but every clue helps during debug.
>
> Some highlights from the ACPI spec...
>
> The PHAT is not urgent nor actionable by the OS:
> "It is not expected that the OSPM would act on the data being exposed."
>
> The info may be useful on each boot regardless of any problems:
> "The Reset Reason Health Record defines a mechanism to describe the
> cause of the last system reset or boot. The record will be created as a
> Health Record in the PHAT table. This provides a standard way for system
> firmware to inform the operating system of the cause of the last reset.
> This includes both expected and unexpected events to support insights
> across a fleet of systems by way of collecting the reset reason records
> on each boot."
>
> Note that it says "last reset", so it doesn't seem intended to keep a
> running log to be fetched later.
>
>> If so, how about doing it in a high level language with easily importable libraries like Python?
>>
>
> This sounds good to me. Anything that can handle binary files.
>
>> Then the tools can still be stored "in kernel tree" and distributed with distro "kernel tools" packages but you can more easily use them on random old kernels too since the binary via /sys/firmware/acpi/tables should be widely available.
>
> Yes, I agree. And I think we should give examples for running the tools
> as services at boot. And documentation is needed, of course.
>
> I don't exactly follow your last statement. Do you mean that new ACPI
> tables will be exposed in sysfs even without explicit kernel updates?
Yes, that's what I meant. For example, look at other tables the
kernel doesn't parse, like SLIC or MSDM. They show up there without
any kernel changes.
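
As a rough illustration of the Python idea discussed above, here is a minimal sketch that reads a raw table from /sys/firmware/acpi/tables and decodes it. The 36-byte header layout is the standard ACPI table header. The 5-byte record header (type, length, revision) is my reading of the PHAT record format and should be checked against the spec. The names parse_header, iter_records and dump_table are hypothetical, not part of any existing tool, and reading the sysfs file usually requires root.

```python
import struct
from pathlib import Path

# Standard ACPI table header: signature, length, revision, checksum,
# OEM ID, OEM table ID, OEM revision, creator ID, creator revision.
ACPI_HEADER = struct.Struct("<4sIBB6s8sI4sI")  # 36 bytes
# Assumed PHAT record header: type (u16), length (u16), revision (u8).
RECORD_HEADER = struct.Struct("<HHB")  # 5 bytes


def parse_header(table: bytes) -> dict:
    """Decode the common ACPI header and verify the table checksum."""
    if len(table) < ACPI_HEADER.size:
        raise ValueError("table is shorter than an ACPI header")
    (sig, length, rev, _csum, oem_id, oem_table_id,
     oem_rev, creator_id, creator_rev) = ACPI_HEADER.unpack_from(table)
    return {
        "signature": sig.decode("ascii", errors="replace"),
        "length": length,
        "revision": rev,
        "oem_id": oem_id.decode("ascii", errors="replace").strip(),
        "oem_table_id": oem_table_id.decode("ascii", errors="replace").strip(),
        # All bytes of a valid table sum to zero modulo 256.
        "checksum_ok": sum(table[:length]) % 256 == 0,
    }


def iter_records(table: bytes):
    """Yield (type, revision, payload) for each record after the header."""
    end = min(len(table), parse_header(table)["length"])
    offset = ACPI_HEADER.size
    while offset + RECORD_HEADER.size <= end:
        rtype, rlen, rrev = RECORD_HEADER.unpack_from(table, offset)
        # Stop on a malformed length rather than looping forever.
        if rlen < RECORD_HEADER.size or offset + rlen > end:
            break
        yield rtype, rrev, table[offset + RECORD_HEADER.size:offset + rlen]
        offset += rlen


def dump_table(path: Path) -> None:
    """Print the header and a one-line summary of each record."""
    table = path.read_bytes()
    print(parse_header(table))
    for rtype, rrev, payload in iter_records(table):
        print(f"record type={rtype} revision={rrev} payload={len(payload)} bytes")


if __name__ == "__main__":
    phat = Path("/sys/firmware/acpi/tables/PHAT")
    try:
        dump_table(phat)
    except OSError as exc:
        # Table not present on this platform, or insufficient privileges.
        print(f"cannot read {phat}: {exc}")
```

Because it only needs the raw bytes from sysfs, something like this would also run on older kernels that know nothing about PHAT, and it could be invoked from a boot-time service to save the records before they are lost.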