[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a302543c-7a99-4ea4-9559-e4cf4ea79b5e@os.amperecomputing.com>
Date: Mon, 11 Aug 2025 15:37:18 -0700
From: Daniel Ferguson <danielf@...amperecomputing.com>
To: Jonathan Cameron <Jonathan.Cameron@...wei.com>,
Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
Cc: Ard Biesheuvel <ardb@...nel.org>, Jonathan Corbet <corbet@....net>,
"Rafael J. Wysocki" <rafael@...nel.org>, Len Brown <lenb@...nel.org>,
James Morse <james.morse@....com>, Tony Luck <tony.luck@...el.com>,
Borislav Petkov <bp@...en8.de>, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org,
linux-efi@...r.kernel.org, linux-edac@...r.kernel.org,
Jason Tian <jason@...amperecomputing.com>,
Shengwei Luo <luoshengwei@...wei.com>, Shiju Jose <shiju.jose@...wei.com>
Subject: Re: [PATCH v4 1/5] RAS: Report all ARM processor CPER information to
userspace
On 8/11/2025 3:52 AM, Jonathan Cameron wrote:
> On Sat, 9 Aug 2025 17:55:19 +0200
> Mauro Carvalho Chehab <mchehab+huawei@...nel.org> wrote:
>
>> Em Fri, 8 Aug 2025 16:22:09 +0100
>> Jonathan Cameron <Jonathan.Cameron@...wei.com> escreveu:
>>
>>> On Tue, 05 Aug 2025 11:35:38 -0700
>>> Daniel Ferguson <danielf@...amperecomputing.com> wrote:
>>>
>>>> From: Jason Tian <jason@...amperecomputing.com>
>>>>
>>>> The ARM processor CPER record was added in UEFI v2.6 and remained
>>>> unchanged up to v2.10.
>>>>
>>>> Yet, the original arm_event trace code added by
>>>>
>>>> e9279e83ad1f ("trace, ras: add ARM processor error trace event")
>>>>
>>>> is incomplete, as it only traces some fields of UAPI 2.6 table N.16, not
>>>> exporting any information from tables N.17 to N.29 of the record.
>>>>
>>>> This is not enough for the user to be able to figure out what has
>>>> exactly happened or to take appropriate action.
>>>>
>>>> According to the UEFI v2.9 specification chapter N2.4.4, the ARM
>>>> processor error section includes:
>>>>
>>>> - several (ERR_INFO_NUM) ARM processor error information structures
>>>> (Tables N.17 to N.20);
>>>> - several (CONTEXT_INFO_NUM) ARM processor context information
>>>> structures (Tables N.21 to N.29);
>>>> - several vendor specific error information structures. The
>>>> size is given by Section Length minus the size of the other
>>>> fields.
>>>>
>>>> In addition, it also exports two fields that are parsed by the GHES
>>>> driver when firmware reports it, e.g.:
>>>>
>>>> - error severity
>>>> - CPU logical index
>>>>
>>>> Report all of these information to userspace via a the ARM tracepoint so
>>>> that userspace can properly record the error and take decisions related
>>>> to CPU core isolation according to error severity and other info.
>>>>
>>>> The updated ARM trace event now contains the following fields:
>>>>
>>>> ====================================== =============================
>>>> UEFI field on table N.16 ARM Processor trace fields
>>>> ====================================== =============================
>>>> Validation handled when filling data for
>>>> affinity MPIDR and running
>>>> state.
>>>> ERR_INFO_NUM pei_len
>>>> CONTEXT_INFO_NUM ctx_len
>>>> Section Length indirectly reported by
>>>> pei_len, ctx_len and oem_len
>>>> Error affinity level affinity
>>>> MPIDR_EL1 mpidr
>>>> MIDR_EL1 midr
>>>> Running State running_state
>>>> PSCI State psci_state
>>>> Processor Error Information Structure pei_err - count at pei_len
>>>> Processor Context ctx_err- count at ctx_len
>>>> Vendor Specific Error Info oem - count at oem_len
>>>> ====================================== =============================
>>>>
>>>> It should be noted that decoding of tables N.17 to N.29, if needed, will
>>>> be handled in userspace. That gives more flexibility, as there won't be
>>>> any need to flood the kernel with micro-architecture specific error
>>>> decoding.
>>>>
>>>> Also, decoding the other fields require a complex logic, and should be
>>>> done for each of the several values inside the record field. So, let
>>>> userspace daemons like rasdaemon decode them, parsing such tables and
>>>> having vendor-specific micro-architecture-specific decoders.
>>>>
>>>> [mchehab: modified description, solved merge conflicts and fixed coding style]
>>>>
>>>> Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
>>>>
>>>
>>> Fixes tag is part of the main tag block so no blank line here.
>>> There are at least some scripts running on the kernel tree that trip
>>> up on this (and one that moans at the submitter ;)
>>>
>>> I'd also add something to explain the SoB sequence for the curious.
>>>
>>>> Co-developed-by: Jason Tian <jason@...amperecomputing.com>
>>>
>>> Jason's the Author, so shouldn't have a Co-dev tag.
>>> There is some info on this in
>>> https://docs.kernel.org/process/submitting-patches.html
>>
>> My understanding is that all co-authors shall have co-developed-by
>> and SoB. Anyway, doesn't matter much in practice, I guess.
>
> Nope. In the description the docs say "in addition to the author
> attribute in the From: tag" There are also examples where there
> isn't a Co-dev for the From author including the subtle question of
> ordering if someone else posts that patch.
>
> I have a vague recollection one of the scripts checking linux-next
> might moan about this.
>
>>
>>>
>>>> Signed-off-by: Jason Tian <jason@...amperecomputing.com>
>>>> Co-developed-by: Shengwei Luo <luoshengwei@...wei.com>
>>>> Signed-off-by: Shengwei Luo <luoshengwei@...wei.com>
>>>> Co-developed-by: Daniel Ferguson <danielf@...amperecomputing.com>
>>>> Signed-off-by: Daniel Ferguson <danielf@...amperecomputing.com>
>>>
>>> As person submitting I'd normally expect your SoB last.
>>>
>>>> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
>>>
>>> I guess this is because Mauro posted an earlier version in which
>>> case this is arguably correct, but likely to confuse.
>>> For cases like this I add comments.
>>
>> If the patch is identical, and it is just a resubmission,
>> I would keep the original order.
>>
>> Otherwise, if Daniel did some changes at the code (except for a
>> trivial rebase stuff), better to move his co-developed-by/SoB to
>> the end, eventually adding:
>>
>> [Daniel: <did something>] before the custody chain block.
>
> Docs are clear that sender of the patch must be last SoB.
> That's also checked by the some of the tag block check scripts
> I think. There's an example of exactly this case in the in the
> submitting-patches.rst file (last one in the section that talks
> about Co-developed-by.
>
> For meaning I don't care that much, but keeping to the rigid
> rules in that doc makes it easier for scripts to check for the
> more important stuff like whether all necessary SoB are there.
>
> Jonathan
>
Just to make sure I get this right...
I will move my SoB and my Co-developed-by after any other SoB's in the tag
block. As a result, I do not need to provide an explanation in this case,
because there is nothing curious remaining ? Or should I still add comments
indicating why I rearranged the tag block ?
Based on your feedback, I'm intending to organize the tag block to look like this:
Signed-off-by: Jason Tian <jason@...amperecomputing.com>
Co-developed-by: Shengwei Luo <luoshengwei@...wei.com>
Signed-off-by: Shengwei Luo <luoshengwei@...wei.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@...nel.org>
Co-developed-by: Daniel Ferguson <danielf@...amperecomputing.com>
Signed-off-by: Daniel Ferguson <danielf@...amperecomputing.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
Tested-by: Shiju Jose <shiju.jose@...wei.com>
Acked-by: Borislav Petkov (AMD) <bp@...en8.de>
Fixes: e9279e83ad1f ("trace, ras: add ARM processor error trace event")
Link:
https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section
Is that satisfactory?
Thank you,
Daniel
>>
>> Thanks,
>> Mauro
>>
>
Powered by blists - more mailing lists