Message-ID: <20250226171406.19c2de6b@sal.lan>
Date: Wed, 26 Feb 2025 17:14:06 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Igor Mammedov <imammedo@...hat.com>
Cc: "Michael S . Tsirkin" <mst@...hat.com>, Jonathan Cameron
<Jonathan.Cameron@...wei.com>, Shiju Jose <shiju.jose@...wei.com>,
qemu-arm@...gnu.org, qemu-devel@...gnu.org, Ani Sinha
<anisinha@...hat.com>, Dongjiu Geng <gengdongjiu1@...il.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 03/14] acpi/ghes: Use HEST table offsets when
preparing GHES records
On Tue, 25 Feb 2025 10:43:27 +0100
Igor Mammedov <imammedo@...hat.com> wrote:
> On Fri, 21 Feb 2025 07:02:21 +0100
> Mauro Carvalho Chehab <mchehab+huawei@...nel.org> wrote:
>
> > Em Mon, 3 Feb 2025 15:34:23 +0100
> > Igor Mammedov <imammedo@...hat.com> escreveu:
> >
> > > On Fri, 31 Jan 2025 18:42:44 +0100
> > > Mauro Carvalho Chehab <mchehab+huawei@...nel.org> wrote:
> > >
> > > > There are two pointers that are needed during error injection:
> > > >
> > > > 1. The start address of the CPER block to be stored;
> > > > 2. The address of the ack.
> > > >
> > > > It is preferable to calculate them from the HEST table. This allows
> > > > checking the source ID, the size of the table and the type of the
> > > > HEST error block structures.
> > > >
> > > > Yet, keep the old code, as this is needed for migration purposes.
> > > >
> > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
> > > > ---
> > > > hw/acpi/ghes.c | 132 ++++++++++++++++++++++++++++++++++++-----
> > > > include/hw/acpi/ghes.h | 1 +
> > > > 2 files changed, 119 insertions(+), 14 deletions(-)
> > > >
> > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > > > index 27478f2d5674..8f284fd191a6 100644
> > > > --- a/hw/acpi/ghes.c
> > > > +++ b/hw/acpi/ghes.c
> > > > @@ -41,6 +41,12 @@
> > > > /* Address offset in Generic Address Structure(GAS) */
> > > > #define GAS_ADDR_OFFSET 4
> > > >
> > > > +/*
> > > > + * ACPI spec 1.0b
> > > > + * 5.2.3 System Description Table Header
> > > > + */
> > > > +#define ACPI_DESC_HEADER_OFFSET 36
> > > > +
> > > > /*
> > > > * The total size of Generic Error Data Entry
> > > > * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> > > > @@ -61,6 +67,25 @@
> > > > */
> > > > #define ACPI_GHES_GESB_SIZE 20
> > > >
> > > > +/*
> > > > + * Offsets with regards to the start of the HEST table stored at
> > > > + * ags->hest_addr_le,
> > >
> > > If I read this literary, then offsets above are not what
> > > declared later in this patch.
> > > I'd really drop this comment altogether as it's confusing,
> > > and rather get variables/macro naming right
> > >
> > > > according with the memory layout map at
> > > > + * docs/specs/acpi_hest_ghes.rst.
> > > > + */
> > >
> > > what we need is update to above doc, describing new and old ways.
> > > a separate patch.
> >
> > I can't see anything that should be changed at
> > docs/specs/acpi_hest_ghes.rst, as this series doesn't change the
> > firmware layout: we're still using two firmware tables:
> >
> > - etc/acpi/tables, with HEST on it;
> > - etc/hardware_errors, with:
> > - error block addresses;
> > - read_ack registers;
> > - CPER records.
> >
> > The only changes that this series introduce are related to how
> > the error generation logic navigates between HEST and hw_errors
> > firmware. This is not described at acpi_hest_ghes.rst, and both
> > ways follow ACPI specs to the letter.
> >
> > The only difference is that the code which populates the CPER
> > record and the error/read offsets doesn't require to know how
> > the HEST table generation placed offsets, as it will basically
> > reproduce what OSPM firmware does when handling HEST events.
>
> section 8 describes old way to get to address to record old CPER,
> so it needs to amended to also describe a new approach and say
> which way is used for which version.
>
> possibly section 11 might need some messaging as well.
OK, I'll modify it and place the change at the end of the series. Please
check below whether the new text is OK for you.
---
[PATCH] docs/specs/acpi_hest_ghes.rst: update it to reflect some changes
While the HEST layout didn't change, there are some internal
changes related to how offsets are calculated and how memory error
events are triggered.
Update specs to reflect such changes.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
index c3e9f8d9a702..f22d2eefdec7 100644
--- a/docs/specs/acpi_hest_ghes.rst
+++ b/docs/specs/acpi_hest_ghes.rst
@@ -89,12 +89,21 @@ Design Details
addresses in the "error_block_address" fields with a pointer to the
respective "Error Status Data Block" in the "etc/hardware_errors" blob.
-(8) QEMU defines a third and write-only fw_cfg blob which is called
- "etc/hardware_errors_addr". Through that blob, the firmware can send back
- the guest-side allocation addresses to QEMU. The "etc/hardware_errors_addr"
- blob contains a 8-byte entry. QEMU generates a single WRITE_POINTER command
- for the firmware. The firmware will write back the start address of
- "etc/hardware_errors" blob to the fw_cfg file "etc/hardware_errors_addr".
+(8) QEMU defines a third and write-only fw_cfg blob to store the location
+ where the error block offsets, read ack registers and CPER records are
+ stored.
+
+ Up to QEMU 9.2, the location was at "etc/hardware_errors_addr", and
+ contained an offset for the beginning of "etc/hardware_errors".
+
+ Newer versions place the location at "etc/acpi_table_hest_addr",
+ pointing to the beginning of the HEST table.
+
+ Through such a blob, the firmware can send back the guest-side
+ allocation addresses to QEMU. It contains an 8-byte entry. QEMU generates
+ a single WRITE_POINTER command for the firmware. The firmware will write
+ back the start address of either "etc/hardware_errors" or the HEST table
+ to the corresponding fw_cfg file.
(9) When QEMU gets a SIGBUS from the kernel, QEMU writes CPER into corresponding
"Error Status Data Block", guest memory, and then injects platform specific
@@ -105,8 +114,6 @@ Design Details
kernel, on receiving notification, guest APEI driver could read the CPER error
and take appropriate action.
-(11) kvm_arch_on_sigbus_vcpu() uses source_id as index in "etc/hardware_errors" to
- find out "Error Status Data Block" entry corresponding to error source. So supported
- source_id values should be assigned here and not be changed afterwards to make sure
- that guest will write error into expected "Error Status Data Block" even if guest was
- migrated to a newer QEMU.
+(11) kvm_arch_on_sigbus_vcpu() reports RAS errors via a SEA notification
+ when a SIGBUS event is triggered. The logic to convert a SEA notification
+ into a source ID is defined in the ghes.c source file.