[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250809175519.74b08ea9@foz.lan>
Date: Sat, 9 Aug 2025 17:55:19 +0200
From: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc: Daniel Ferguson <danielf@...amperecomputing.com>, Ard Biesheuvel
<ardb@...nel.org>, Jonathan Corbet <corbet@....net>, "Rafael J. Wysocki"
<rafael@...nel.org>, Len Brown <lenb@...nel.org>, "James Morse"
<james.morse@....com>, Tony Luck <tony.luck@...el.com>, "Borislav Petkov"
<bp@...en8.de>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-acpi@...r.kernel.org>,
<linux-efi@...r.kernel.org>, <linux-edac@...r.kernel.org>, Jason Tian
<jason@...amperecomputing.com>, Shengwei Luo <luoshengwei@...wei.com>,
Shiju Jose <shiju.jose@...wei.com>
Subject: Re: [PATCH v4 1/5] RAS: Report all ARM processor CPER information
to userspace
Em Fri, 8 Aug 2025 16:22:09 +0100
Jonathan Cameron <Jonathan.Cameron@...wei.com> escreveu:
> On Tue, 05 Aug 2025 11:35:38 -0700
> Daniel Ferguson <danielf@...amperecomputing.com> wrote:
>
> > From: Jason Tian <jason@...amperecomputing.com>
> >
> > The ARM processor CPER record was added in UEFI v2.6 and remained
> > unchanged up to v2.10.
> >
> > Yet, the original arm_event trace code added by
> >
> > e9279e83ad1f ("trace, ras: add ARM processor error trace event")
> >
> > is incomplete, as it only traces some fields of UAPI 2.6 table N.16, not
> > exporting any information from tables N.17 to N.29 of the record.
> >
> > This is not enough for the user to be able to figure out what has
> > exactly happened or to take appropriate action.
> >
> > According to the UEFI v2.9 specification chapter N2.4.4, the ARM
> > processor error section includes:
> >
> > - several (ERR_INFO_NUM) ARM processor error information structures
> > (Tables N.17 to N.20);
> > - several (CONTEXT_INFO_NUM) ARM processor context information
> > structures (Tables N.21 to N.29);
> > - several vendor specific error information structures. The
> > size is given by Section Length minus the size of the other
> > fields.
> >
> > In addition, it also exports two fields that are parsed by the GHES
> > driver when firmware reports it, e.g.:
> >
> > - error severity
> > - CPU logical index
> >
> > Report all of these information to userspace via a the ARM tracepoint so
> > that userspace can properly record the error and take decisions related
> > to CPU core isolation according to error severity and other info.
> >
> > The updated ARM trace event now contains the following fields:
> >
> > ====================================== =============================
> > UEFI field on table N.16 ARM Processor trace fields
> > ====================================== =============================
> > Validation handled when filling data for
> > affinity MPIDR and running
> > state.
> > ERR_INFO_NUM pei_len
> > CONTEXT_INFO_NUM ctx_len
> > Section Length indirectly reported by
> > pei_len, ctx_len and oem_len
> > Error affinity level affinity
> > MPIDR_EL1 mpidr
> > MIDR_EL1 midr
> > Running State running_state
> > PSCI State psci_state
> > Processor Error Information Structure pei_err - count at pei_len
> > Processor Context ctx_err- count at ctx_len
> > Vendor Specific Error Info oem - count at oem_len
> > ====================================== =============================
> >
> > It should be noted that decoding of tables N.17 to N.29, if needed, will
> > be handled in userspace. That gives more flexibility, as there won't be
> > any need to flood the kernel with micro-architecture specific error
> > decoding.
> >
> > Also, decoding the other fields require a complex logic, and should be
> > done for each of the several values inside the record field. So, let
> > userspace daemons like rasdaemon decode them, parsing such tables and
> > having vendor-specific micro-architecture-specific decoders.
> >
> > [mchehab: modified description, solved merge conflicts and fixed coding style]
> >
> > Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
> >
>
> Fixes tag is part of the main tag block so no blank line here.
> There are at least some scripts running on the kernel tree that trip
> up on this (and one that moans at the submitter ;)
>
> I'd also add something to explain the SoB sequence for the curious.
>
> > Co-developed-by: Jason Tian <jason@...amperecomputing.com>
>
> Jason's the Author, so shouldn't have a Co-dev tag.
> There is some info on this in
> https://docs.kernel.org/process/submitting-patches.html
My understanding is that all co-authors shall have co-developed-by
and SoB. Anyway, doesn't matter much in practice, I guess.
>
> > Signed-off-by: Jason Tian <jason@...amperecomputing.com>
> > Co-developed-by: Shengwei Luo <luoshengwei@...wei.com>
> > Signed-off-by: Shengwei Luo <luoshengwei@...wei.com>
> > Co-developed-by: Daniel Ferguson <danielf@...amperecomputing.com>
> > Signed-off-by: Daniel Ferguson <danielf@...amperecomputing.com>
>
> As person submitting I'd normally expect your SoB last.
>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
>
> I guess this is because Mauro posted an earlier version in which
> case this is arguably correct, but likely to confuse.
> For cases like this I add comments.
If the patch is identical, and it is just a resubmission,
I would keep the original order.
Otherwise, if Daniel did some changes at the code (except for a
trivial rebase stuff), better to move his co-developed-by/SoB to
the end, eventually adding:
[Daniel: <did something>] before the custody chain block.
Thanks,
Mauro
Powered by blists - more mailing lists