[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250123192350.6db42016@foz.lan>
Date: Thu, 23 Jan 2025 19:23:50 +0100
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: Igor Mammedov <imammedo@...hat.com>, "Michael S . Tsirkin"
<mst@...hat.com>, Shiju Jose <shiju.jose@...wei.com>,
<qemu-arm@...gnu.org>, <qemu-devel@...gnu.org>, Ani Sinha
<anisinha@...hat.com>, Dongjiu Geng <gengdongjiu1@...il.com>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 03/11] acpi/ghes: Use HEST table offsets when preparing
GHES records
Em Thu, 23 Jan 2025 10:29:19 +0000
Jonathan Cameron <Jonathan.Cameron@...wei.com> escreveu:
> On Wed, 22 Jan 2025 16:46:20 +0100
> Mauro Carvalho Chehab <mchehab+huawei@...nel.org> wrote:
>
> > There are two pointers that are needed during error injection:
> >
> > 1. The start address of the CPER block to be stored;
> > 2. The address of the ack, which needs a reset before next error.
> >
> > It is preferable to calculate them from the HEST table. This allows
> > checking the source ID, the size of the table and the type of the
> > HEST error block structures.
> >
> > Yet, keep the old code, as this is needed for migration purposes.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
> Generally looks good. A few bits that I think could be made
> easier to follow for anyone with the spec open in front of them.
>
> Thanks,
>
> Jonathan
>
> > ---
> > hw/acpi/ghes.c | 98 ++++++++++++++++++++++++++++++++++++++++++++------
> > 1 file changed, 88 insertions(+), 10 deletions(-)
> >
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index 34e3364d3fd8..b46b563bcaf8 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -61,6 +61,23 @@
> > */
> > #define ACPI_GHES_GESB_SIZE 20
> >
> > +/*
> > + * Offsets with regards to the start of the HEST table stored at
> > + * ags->hest_addr_le, according with the memory layout map at
> > + * docs/specs/acpi_hest_ghes.rst.
> > + */
> > +
> > +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
>
> Local multiline comment style seems to be always
> /*
> * ACPI 6.2:...
>
> So perhaps good to copy that.
>
> > + * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
> > + */
> > +#define HEST_GHES_V2_TABLE_SIZE 92
> > +#define GHES_ACK_OFFSET (64 +
)
>
> Using a GAS offset here to me obscures what is going on. I'd
> explicitly handle the GAS where you are reading this.
> We probably should sanity check the type as there are
> some crazy options that might turn up one day.
See below.
> Maybe worth using spec term of
> GHES_READ_ACK_...
>
> Obviously it's a question of who you are for whether it is read or
> write, but maybe still worth using that term for easy checking
> against the specification.
>
> > +
> > +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source
> same on comment format.
>
> > + * Table 18-380: 'Error Status Address' field
> > + */
> > +#define GHES_ERR_ST_ADDR_OFFSET (20 + GAS_ADDR_OFFSET)
> Maybe STS or spell out status? I found ST a bit confusing below.
Giving names is not an easy task... Removing _ST doesn't sound
right, as everything is GHES_ERR. STS sounds weird to me as well.
Maybe we could name them both as something like:
GHES_ERR_STATUS_ADDR_OFF
GHES_READ_ACK_ADDR_OFF
> > +
> > /*
> > * Values for error_severity field
> > */
> > @@ -212,14 +229,6 @@ static void build_ghes_error_table(GArray *hardware_errors, BIOSLinker *linker,
> > {
> > int i, error_status_block_offset;
> >
> > - /*
> > - * TODO: Current version supports only one source.
> > - * A further patch will drop this check, after adding a proper migration
> > - * code, as, for the code to work, we need to store a bios pointer to the
> > - * HEST table.
> > - */
> > - assert(num_sources == 1);
> > -
> > /* Build error_block_address */
> > for (i = 0; i < num_sources; i++) {
> > build_append_int_noprefix(hardware_errors, 0, sizeof(uint64_t));
> > @@ -419,6 +428,70 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
> > *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
> > }
> >
> > +static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
> > + uint64_t *cper_addr,
> > + uint64_t *read_ack_start_addr,
> > + Error **errp)
> > +{
> > + uint64_t hest_err_block_addr, hest_read_ack_addr;
> > + uint64_t err_source_struct, error_block_addr;
> > + uint32_t num_sources, i;
> > +
> > + if (!hest_addr) {
> I guess it is a question of matching local code, but I'd be tempted
> to name this hest_body_addr as it isn't the start of the table but
> rather the bit after the header.
This is named after hest_addr_le, which in turn was named after ghes_hw_le.
Besides, I guess such name was suggested on a past review. From my side,
I'm OK with any name you/Igor pick.
>
> > + return;
> > + }
> > +
> > + cpu_physical_memory_read(hest_addr, &num_sources, sizeof(num_sources));
>
> The hest_addr naming thing confused me a tiny bit here because obviously num_sources
> isn't the first thing in the table in the spec!
>
> > + num_sources = le32_to_cpu(num_sources);
> > +
> > + err_source_struct = hest_addr + sizeof(num_sources);
> > +
> > + /*
> > + * Currently, HEST Error source navigates only for GHESv2 tables
> > + */
> > +
> > + for (i = 0; i < num_sources; i++) {
> > + uint64_t addr = err_source_struct;
> > + uint16_t type, src_id;
> > +
> > + cpu_physical_memory_read(addr, &type, sizeof(type));
> > + type = le16_to_cpu(type);
> > +
> > + /* For now, we only know the size of GHESv2 table */
> > + if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
> > + error_setg(errp, "HEST: type %d not supported.", type);
> > + return;
>
> It's a pity we can't just skip them, but absence of a size field
> makes that tricky... Can add that later I guess along with sizes
> for each defined type including figuring out the variable length
> ones like IA-32 machine check. I guess this is why the whole ordering
> constraint for new types was added. Can't find the old ones if
> we don't know the size of the new ones, hence need new definitions
> at the end.
Yes. The variable sizes makes it harder to parse with current GHES
types. It sounds they'll fix it for the next types, as the size of
the record will be stored for types above 11.
So, at the end, we'll need to add a much more complex logic if/when
we add non-GHES records.
>
> Anyhow, I'm fine with this but maybe a little more description in the comment
> would avoid someone going down same rat hole I just did.
>
>
> > + }
> > +
> > + /* Compare CPER source address at the GHESv2 structure */
> > + addr += sizeof(type);
> > + cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
> > +
> > + if (src_id == source_id) {
> > + break;
> > + }
> > +
> > + err_source_struct += HEST_GHES_V2_TABLE_SIZE;
> > + }
> > + if (i == num_sources) {
> > + error_setg(errp, "HEST: Source %d not found.", source_id);
> > + return;
> > + }
> > +
> > + /* Navigate though table address pointers */
> > + hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
>
> So this is a bit confusing. I'd pull the GAS offset down here rather
> than putting it in the define. That way we can clearly see you
> are grabbing the address field. As above, should we check the type
> is 0x00? There are many fun things it could be but here I think
> we just want it to be memory space.
In short:
The type was already ensured when HEST table is built. I can't see
any need to add extra checks. If you want this to be better documented,
we could just do:
hest_err_block_addr = err_source_struct + GHES_ERR_STATUS_ADDR_OFF + GAS_OFFSET;
It follows a longer rationale:
If I understood well, and after some discussions we had today via chat,
you basically want to add an additional check logic during error inject
to check if the memory type filled at build_ghes_v2() here:
/* Build Generic Hardware Error Source version 2 (GHESv2) */
static void build_ghes_v2(GArray *table_data,
BIOSLinker *linker,
const AcpiNotificationSourceId *notif_src,
uint16_t index, int num_sources)
{
...
/* Error Status Address */
build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
...
/*
* Read Ack Register
* ACPI 6.1: 18.3.2.8 Generic Hardware Error Source
* version 2 (GHESv2 - Type 10)
*/
build_append_gas(table_data, AML_AS_SYSTEM_MEMORY, 0x40, 0,
4 /* QWord access */, 0);
...
}
was not modified and still remains AML_AS_SYSTEM_MEMORY, as otherwise the
code at get_ghes_source_offsets() won't be able to use
cpu_physical_memory_read/cpu_physical_memory_write. To make it right,
IMO we would need to add something like this to aml-build.c:
int verify_gas_addr_space(uint64_t addr, AmlAddressSpace as)
{
uint64_t gas_header;
cpu_physical_memory_read(addr, &gas_header, 4);
gas_header = cpu_to_le64(0);
if ((gas_header & 0xff) != as)
return 1;
return 0;
}
and at ghes.c do something like:
// Using current names just to better illustrate the changes
#define GHES_ACK_OFFSET 64 // don't add GAS_ADDR_OFFSET here
#define GHES_ERR_ST_ADDR_OFFSET 20 // don't add GAS_ADDR_OFFSET here
static void get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr,
uint64_t *cper_addr,
uint64_t *read_ack_start_addr,
Error **errp)
{
/* Navigate though table address pointers */
hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
/* EXTRA CHECK LOGIC: Verify if build_ghes_v2() did his job */
if (verify_gas_addr_space(hest_err_block_addr, AML_AS_SYSTEM_MEMORY)} {
error_setg(errp, "HEST firmware is not using system memory!!!");
return;
}
hest_err_block_addr += GAS_ADDR_OFFSET;
hest_read_ack_addr = err_source_struct + GHES_ACK_OFFSET;
/* EXTRA CHECK LOGIC: Verify if build_ghes_v2() did his job */
if (verify_gas_addr_space(hest_read_ack_addr, AML_AS_SYSTEM_MEMORY)} {
error_setg(errp, "HEST firmware is not using system memory!!!");
return;
}
hest_read_ack_addr += GAS_ADDR_OFFSET;
cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
sizeof(error_block_addr));
cpu_physical_memory_read(error_block_addr, cper_addr,
sizeof(*cper_addr));
cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
sizeof(*read_ack_start_addr));
}
IMO, this is overkill:
- I can't see how such check will ever be triggered in practice;
- I can't see any reason why the HEST firmware would ever be stored
on a non-system memory: firmware should always be at
AML_AS_SYSTEM_MEMORY;
- As those offsets are related to fw_cfg, any change there would
require changing the firmware binding logic. Plus, they'll very
likely also require changes at BIOS code itself, as it would need
to know how to store firmware files on some other memory type;
- Any change like that will certainly require adding backport support,
as QEMU would need to check if BIOS would support different types
of memory to store firmware instead of system memory. Also, QEMU 9.1
is only compatible with firmware stored on system's memory.
So, I don't see any benefit of doing that.
> > + hest_read_ack_addr = err_source_struct + GHES_ACK_OFFSET;
>
> Perhaps move this down to above where it is used?
> Same thing about GAS address offset being better found down here
> rather than hidden in GHES_ACK_OFFSET.
>
> > +
> > + cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> > + sizeof(error_block_addr));
> > +
> > + cpu_physical_memory_read(error_block_addr, cper_addr,
> > + sizeof(*cper_addr));
> > +
> > + cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
> > + sizeof(*read_ack_start_addr));
> > +}
> > +
> > void ghes_record_cper_errors(const void *cper, size_t len,
> > uint16_t source_id, Error **errp)
> > {
> > @@ -439,8 +512,13 @@ void ghes_record_cper_errors(const void *cper, size_t len,
> > }
> > ags = &acpi_ged_state->ghes_state;
> >
> > - get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
> > - &cper_addr, &read_ack_register_addr);
> > + if (!ags->hest_addr_le) {
> > + get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
> > + &cper_addr, &read_ack_register_addr);
> > + } else {
> > + get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
> > + &cper_addr, &read_ack_register_addr, errp);
> > + }
> >
> > if (!cper_addr) {
> > error_setg(errp, "can not find Generic Error Status Block");
>
Thanks,
Mauro
Powered by blists - more mailing lists