Message-ID: <20250227102255.6843705e@imammedo.users.ipa.redhat.com>
Date: Thu, 27 Feb 2025 10:22:55 +0100
From: Igor Mammedov <imammedo@...hat.com>
To: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
Cc: "Michael S . Tsirkin" <mst@...hat.com>, Jonathan Cameron
<Jonathan.Cameron@...wei.com>, Shiju Jose <shiju.jose@...wei.com>,
qemu-arm@...gnu.org, qemu-devel@...gnu.org, Ani Sinha
<anisinha@...hat.com>, Dongjiu Geng <gengdongjiu1@...il.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 03/14] acpi/ghes: Use HEST table offsets when
preparing GHES records
On Wed, 26 Feb 2025 17:14:06 +0100
Mauro Carvalho Chehab <mchehab+huawei@...nel.org> wrote:
> Em Tue, 25 Feb 2025 10:43:27 +0100
> Igor Mammedov <imammedo@...hat.com> escreveu:
>
> > On Fri, 21 Feb 2025 07:02:21 +0100
> > Mauro Carvalho Chehab <mchehab+huawei@...nel.org> wrote:
> >
> > > Em Mon, 3 Feb 2025 15:34:23 +0100
> > > Igor Mammedov <imammedo@...hat.com> escreveu:
> > >
> > > > On Fri, 31 Jan 2025 18:42:44 +0100
> > > > Mauro Carvalho Chehab <mchehab+huawei@...nel.org> wrote:
> > > >
> > > > > There are two pointers that are needed during error injection:
> > > > >
> > > > > 1. The start address of the CPER block to be stored;
> > > > > 2. The address of the ack.
> > > > >
> > > > > It is preferable to calculate them from the HEST table. This allows
> > > > > checking the source ID, the size of the table and the type of the
> > > > > HEST error block structures.
> > > > >
> > > > > Yet, keep the old code, as this is needed for migration purposes.
> > > > >
> > > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
> > > > > ---
> > > > > hw/acpi/ghes.c | 132 ++++++++++++++++++++++++++++++++++++-----
> > > > > include/hw/acpi/ghes.h | 1 +
> > > > > 2 files changed, 119 insertions(+), 14 deletions(-)
> > > > >
> > > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > > > index 27478f2d5674..8f284fd191a6 100644
> > > > > --- a/hw/acpi/ghes.c
> > > > > +++ b/hw/acpi/ghes.c
> > > > > @@ -41,6 +41,12 @@
> > > > > /* Address offset in Generic Address Structure(GAS) */
> > > > > #define GAS_ADDR_OFFSET 4
> > > > >
> > > > > +/*
> > > > > + * ACPI spec 1.0b
> > > > > + * 5.2.3 System Description Table Header
> > > > > + */
> > > > > +#define ACPI_DESC_HEADER_OFFSET 36
> > > > > +
> > > > > /*
> > > > > * The total size of Generic Error Data Entry
> > > > > * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> > > > > @@ -61,6 +67,25 @@
> > > > > */
> > > > > #define ACPI_GHES_GESB_SIZE 20
> > > > >
> > > > > +/*
> > > > > + * Offsets with regards to the start of the HEST table stored at
> > > > > + * ags->hest_addr_le,
> > > >
> > > > If I read this literally, then the offsets above are not what is
> > > > declared later in this patch.
> > > > I'd really drop this comment altogether as it's confusing,
> > > > and rather get variables/macro naming right
> > > >
> > > > > according with the memory layout map at
> > > > > + * docs/specs/acpi_hest_ghes.rst.
> > > > > + */
> > > >
> > > > what we need is an update to the above doc, describing the new and
> > > > old ways, in a separate patch.
> > >
> > > I can't see anything that should be changed at
> > > docs/specs/acpi_hest_ghes.rst, as this series doesn't change the
> > > firmware layout: we're still using two firmware tables:
> > >
> > > - etc/acpi/tables, with HEST on it;
> > > - etc/hardware_errors, with:
> > > - error block addresses;
> > > - read_ack registers;
> > > - CPER records.
> > >
> > > The only changes that this series introduces are related to how
> > > the error generation logic navigates between the HEST and hw_errors
> > > firmware blobs. This is not described in acpi_hest_ghes.rst, and both
> > > ways follow the ACPI specs to the letter.
> > >
> > > The only difference is that the code which populates the CPER
> > > record and the error/read offsets doesn't need to know how
> > > the HEST table generation placed the offsets, as it will basically
> > > reproduce what OSPM firmware does when handling HEST events.
> >
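For what it's worth, the navigation described above (start from the HEST
address, find the source, then pick up the error status address and the read
ack register) could be sketched roughly as below. This is not the code from
this patch: the function name and most macro names are made up for
illustration, and the GHESv2 field offsets (92-byte entry, Error Status
Address GAS at 20, Read Ack Register GAS at 64) are my reading of the ACPI 6.x
HEST layout, so they should be checked against the spec. It also assumes the
table holds only GHESv2 entries and operates on a local copy of the table
rather than on guest memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Names below are illustrative; offsets are assumed from ACPI 6.x. */
#define ACPI_DESC_HEADER_OFFSET   36  /* System Description Table Header */
#define HEST_SRC_COUNT_SIZE        4  /* "Error Source Count" field */
#define HEST_TYPE_GHES_V2         10  /* GHESv2 error source type */
#define HEST_GHES_V2_SIZE         92  /* assumed GHESv2 entry size */
#define GHES_ERR_STATUS_GAS_OFF   20  /* Error Status Address (GAS) */
#define GHES_READ_ACK_GAS_OFF     64  /* Read Ack Register (GAS) */
#define GAS_ADDR_OFFSET            4  /* address field inside a GAS */

/* Little-endian field readers, independent of host endianness. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Scan a HEST image for the GHESv2 entry with the given source ID.
 * On success, store the table-relative offsets of the two 64-bit
 * address fields and return 0. Return -1 if the source is not found,
 * the table is truncated, or a non-GHESv2 entry is encountered.
 */
int hest_find_ghes_v2(const uint8_t *hest, size_t len, uint16_t source_id,
                      size_t *err_status_off, size_t *read_ack_off)
{
    size_t off = ACPI_DESC_HEADER_OFFSET + HEST_SRC_COUNT_SIZE;
    uint32_t count, i;

    assert(hest && err_status_off && read_ack_off);

    if (len < off) {
        return -1;
    }
    count = rd32(hest + ACPI_DESC_HEADER_OFFSET);

    for (i = 0; i < count; i++, off += HEST_GHES_V2_SIZE) {
        if (len - off < HEST_GHES_V2_SIZE) {
            return -1;              /* truncated table */
        }
        if (rd16(hest + off) != HEST_TYPE_GHES_V2) {
            return -1;              /* sketch only understands GHESv2 */
        }
        if (rd16(hest + off + 2) == source_id) {
            *err_status_off = off + GHES_ERR_STATUS_GAS_OFF + GAS_ADDR_OFFSET;
            *read_ack_off = off + GHES_READ_ACK_GAS_OFF + GAS_ADDR_OFFSET;
            return 0;
        }
    }
    return -1;
}
```

The point of doing it this way is that the source ID, the table size and the
entry type all get checked while walking, instead of trusting a fixed index.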
> > Section 8 describes the old way to get the address at which to record
> > the CPER, so it needs to be amended to also describe the new approach
> > and say which way is used for which version.
> >
> > Section 11 might possibly need some massaging as well.
>
> Ok, I'll modify it and place at the end of the series. Please
> see below if the new text is ok for you.
>
> ---
>
> [PATCH] docs/specs/acpi_hest_ghes.rst: update it to reflect some changes
s/^^^/docs: hest: add new "etc/acpi_table_hest_addr" and update workflow/
>
> While the HEST layout didn't change, there are some internal
> changes related to how offsets are calculated and how memory error
> events are triggered.
>
> Update specs to reflect such changes.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
>
> diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
> index c3e9f8d9a702..f22d2eefdec7 100644
> --- a/docs/specs/acpi_hest_ghes.rst
> +++ b/docs/specs/acpi_hest_ghes.rst
> @@ -89,12 +89,21 @@ Design Details
> addresses in the "error_block_address" fields with a pointer to the
> respective "Error Status Data Block" in the "etc/hardware_errors" blob.
>
> -(8) QEMU defines a third and write-only fw_cfg blob which is called
> - "etc/hardware_errors_addr". Through that blob, the firmware can send back
> - the guest-side allocation addresses to QEMU. The "etc/hardware_errors_addr"
> - blob contains a 8-byte entry. QEMU generates a single WRITE_POINTER command
> - for the firmware. The firmware will write back the start address of
> - "etc/hardware_errors" blob to the fw_cfg file "etc/hardware_errors_addr".
> +(8) QEMU defines a third and write-only fw_cfg blob to store the location
> + where the error block offsets, read ack registers and CPER records are
> + stored.
> +
> + Up to QEMU 9.2, the location was at "etc/hardware_errors_addr", and
> + contains an offset for the beginning of "etc/hardware_errors".
> +
> + Newer versions place the location at "etc/acpi_table_hest_addr",
> + pointing to the beginning of the HEST table.
> +
> + Through that such offsets, the firmware can send back the guest-side
^^^^^^^^^^^^^^^^^^^^^^^^^ I can't parse that; I suggest just dropping the phrase
> + allocation addresses to QEMU. They contain a 8-byte entry. QEMU generates
> + a single WRITE_POINTER command for the firmware. The firmware will write
> + back the start address of either "etc/hardware_errors" or HEST table at
^^^^ drop this?
> + the corresponding address firmware.
>
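For contrast, the legacy lookup that the text above keeps for "up to QEMU 9.2"
could be described along these lines. This is only a sketch under the
assumption that "etc/hardware_errors" starts with one 8-byte
error_block_address per source, followed by one 8-byte read_ack_register per
source; the function and parameter names are hypothetical and it only computes
guest addresses, without touching guest memory.

```c
#include <stdint.h>

/*
 * Legacy path (sketch): source_id is used directly as an index into
 * "etc/hardware_errors", whose start address the firmware wrote back
 * via "etc/hardware_errors_addr".
 *
 * Assumed blob layout:
 *   [error_block_address x num_sources]
 *   [read_ack_register   x num_sources]
 *   [Error Status Data Blocks ...]
 *
 * Returns 0 and fills both addresses, or -1 if source_id is out of range.
 */
int legacy_ghes_addrs(uint64_t hw_errors_base, uint32_t num_sources,
                      uint32_t source_id,
                      uint64_t *err_block_ptr_addr, uint64_t *read_ack_addr)
{
    if (source_id >= num_sources) {
        return -1;
    }
    /* Slot holding the pointer to this source's Error Status Data Block. */
    *err_block_ptr_addr = hw_errors_base +
                          (uint64_t)source_id * sizeof(uint64_t);
    /* The read ack slots follow the whole error_block_address array. */
    *read_ack_addr = hw_errors_base +
                     ((uint64_t)num_sources + source_id) * sizeof(uint64_t);
    return 0;
}
```

Something like this is what the doc should spell out for the old machine
types, next to the HEST-based description for the new ones.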
> (9) When QEMU gets a SIGBUS from the kernel, QEMU writes CPER into corresponding
> "Error Status Data Block", guest memory, and then injects platform specific
> @@ -105,8 +114,6 @@ Design Details
> kernel, on receiving notification, guest APEI driver could read the CPER error
> and take appropriate action.
>
> -(11) kvm_arch_on_sigbus_vcpu() uses source_id as index in "etc/hardware_errors" to
> - find out "Error Status Data Block" entry corresponding to error source. So supported
> - source_id values should be assigned here and not be changed afterwards to make sure
> - that guest will write error into expected "Error Status Data Block" even if guest was
> - migrated to a newer QEMU.
> +(11) kvm_arch_on_sigbus_vcpu() report RAS errors via a SEA notifications,
> + when a SIGBUS event is triggered. The logic to convert a SEA notification
> + into a source ID is defined inside ghes.c source file.
that's cheating and not documentation by any means
>
>
>