[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20240801163412.48740ddb@foz.lan>
Date: Thu, 1 Aug 2024 16:34:12 +0200
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Markus Armbruster <armbru@...hat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@...wei.com>, Shiju Jose
<shiju.jose@...wei.com>, "Michael S. Tsirkin" <mst@...hat.com>, Ani Sinha
<anisinha@...hat.com>, Dongjiu Geng <gengdongjiu1@...il.com>, Eric Blake
<eblake@...hat.com>, Igor Mammedov <imammedo@...hat.com>, Michael Roth
<michael.roth@....com>, Paolo Bonzini <pbonzini@...hat.com>, Peter Maydell
<peter.maydell@...aro.org>, linux-kernel@...r.kernel.org,
qemu-arm@...gnu.org, qemu-devel@...gnu.org
Subject: Re: [PATCH v3 4/7] acpi/ghes: Add a logic to handle block addresses
and FW first ARM processor error injection
Em Mon, 29 Jul 2024 16:32:41 +0200
Markus Armbruster <armbru@...hat.com> escreveu:
> Yes, as this CPER record is defined only for arm. There are three other
> > processor error info:
> > - for x86;
> > - for ia32;
> > - for "generic cpu".
> >
> > They have different structures, with different fields.
>
> A generic inject-error command feels nicer, but coding its arguments in
> the schema could be more trouble than it's worth. I'm not asking you to
> try.
>
> A target-specific command like this one should be conditional. Try
> this:
>
> { 'command': 'arm-inject-error',
> 'data': { 'errortypes': ['ArmProcessorErrorType'] },
> 'features': [ 'unstable' ],
> 'if': 'TARGET_ARM' }
>
> No need to provide a qmp_arm_inject_error() stub then.
I tried it, but it generates lots of poison errors. Basically, QAPI
generation includes poison.h, making it to complain about on
non-ARM builds.
Anyway, the new version I'm about to submit is not dependent on
ARM anymore (as it is a generic GHES error injection that can be used
by any arch).
Still, as I added a Kconfig symbol for it, I still needed a stub.
It would be cool not needing it, but on the other hand it doesn't
hurt much.
> >> If we encode the the error to inject as an enum value, adding more will
> >> be hard.
> >>
> >> If we wrap the enum in a struct
> >>
> >> { 'struct': 'ArmProcessorError',
> >> 'data': { 'type': 'ArmProcessorErrorType' } }
> >>
> >> we can later extend it like
> >>
> >> { 'union': 'ArmProcessorError',
> >> 'base: { 'type': 'ArmProcessorErrorType' }
> >> 'data': {
> >> 'bus-error': 'ArmProcessorBusErrorData' } }
> >>
> >> { 'struct': 'ArmProcessorBusErrorData',
> >> 'data': ... }
> >
> > I don't see this working as one might expect. See, the ARM error
> > information data can be repeated from 1 to 255 times. It is given
> > by this struct (see patch 7):
> >
> > { 'struct': 'ArmProcessorErrorInformation',
> > 'data': { '*validation': ['ArmPeiValidationBits'],
> > 'type': ['ArmProcessorErrorType'],
> > '*multiple-error': 'uint16',
> > '*flags': ['ArmProcessorFlags'],
> > '*error-info': 'uint64',
> > '*virt-addr': 'uint64',
> > '*phy-addr': 'uint64'}
> > }
> >
> > According with the UEFI spec, the type is always be present.
> > The other fields are marked as valid or not via the field
> > "validation". So, there's one bit indicating what is valid between
> > the fields at the PEI structure, e. g.:
> >
> > - multiple-error: multiple occurrences of the error;
> > - flags;
> > - error-info: error information;
> > - virt-addr: virtual address;
> > - phy-addr: physical address.
> >
> > There are also other fields that are global for the entire record,
> > also marked as valid or not via another bitmask.
> >
> > The contents of almost all those fields are independent of the error
> > type. The only field which content is affected by the error type is
> > "error-info", and the definition of such field is not fully specified.
> >
> > So, currently, UEFI spec only defines it when:
> >
> > 1. the error type has just one bit set;
> > 2. the error type is either cache, TLB or bus error[1].
> > If type is micro-arch-specific error, the spec doesn't tell how this
> > field if filled.
> >
> > To make the API simple (yet powerful), I opted to not enforce any encoding
> > for error-info: let userspace fill it as required and use some default
> > that would make sense, if this is not passed via QMP.
> >
> > [1] See https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information
>
> I asked because designing for extensibility is good practice.
>
> It's not a hard requirement here, because feature 'unstable' gives us
> lincense to change the interface incompatibly.
IMO keeping it as unstable makes sense, as this QAPI is specific for
error injection, which is hardly a feature widely used. Also, with the
script approach, the actual CPER record generation happens on a script.
If we provide it together with QEMU, if the QAPI ever changes, the
changes inside the script will happen altogether. So, IMO, no need to
make it stable.
Thanks,
Mauro
Powered by blists - more mailing lists