lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4a7ee633-037f-9064-7e95-9fdcdbfb5fe0@amd.com>
Date:   Tue, 11 Oct 2022 16:13:07 -0700
From:   Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
To:     Jonathan Cameron <Jonathan.Cameron@...wei.com>
Cc:     linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
        Alison Schofield <alison.schofield@...el.com>,
        Vishal Verma <vishal.l.verma@...el.com>,
        Ira Weiny <ira.weiny@...el.com>,
        Ben Widawsky <bwidawsk@...nel.org>,
        Dan Williams <dan.j.williams@...el.com>,
        Robert Richter <rrichter@....com>,
        Yazen Ghannam <yazen.ghannam@....com>,
        Terry Bowman <terry.bowman@....com>
Subject: Re: [PATCH 1/2] efi/cper, cxl: Decode CXL Protocol Error Section

On 10/10/2022 7:24 AM, Jonathan Cameron wrote:
> On Fri, 7 Oct 2022 21:17:13 +0000
> Smita Koralahalli <Smita.KoralahalliChannabasappa@....com> wrote:
>
>> Add support for decoding CXL Protocol Error Section as defined in UEFI 2.9
>> Section N.2.13.
>>
>> Do the section decoding in a new cper_cxl.c file. This new file will be
>> used in the future for more CXL CPERs decode support. Add this to the
>> existing UEFI_CPER config.
>>
>> Decode only the fields that are useful to parse the error.
>>
>> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
> Hi Smita,
>
> A few comments inline, but this looks pretty good to me though lots to
> build on top of this (trace events, any in kernel handling necessary etc).
>
> Jonathan

Hi Jonathan,

Thanks for the review comments. Please find my responses/answers inline.

>
>> +#include <linux/cper.h>
>> +#include "cper_cxl.h"
>> +
>> +#define PROT_ERR_VALID_AGENT_TYPE		BIT_ULL(0)
>> +#define PROT_ERR_VALID_AGENT_ADDRESS		BIT_ULL(1)
>> +#define PROT_ERR_VALID_DEVICE_ID		BIT_ULL(2)
>> +#define PROT_ERR_VALID_SERIAL_NUMBER		BIT_ULL(3)
>> +#define PROT_ERR_VALID_CAPABILITY		BIT_ULL(4)
>> +#define PROT_ERR_VALID_DVSEC			BIT_ULL(5)
> Bit 6 has been defined by 2.10 so would be good to add that here as well.
> As below, could be a follow up patch.

Yes, as you may have noticed, I have added this in the next patch of the 
series.

>
>> +
>> +static const char * const prot_err_agent_type_strs[] = {
>> +	"Restricted CXL Device",
>> +	"Restricted CXL Host DP",
> Values up to 7 now defined, so good to list the ones we know...
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fuefi.org%2Fspecs%2FUEFI%2F2.10%2FApx_N_Common_Platform_Error_Record.html%23compute-express-link-cxl-protocol-error-section&amp;data=05%7C01%7CSmita.KoralahalliChannabasappa%40amd.com%7Cbbc0e9642a314f875d9b08daaacb283f%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638010086801792139%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=ZM0zOEVtdxsQtvAxHYrTEW%2FnIBttTK48v2x%2F5VdfQ0U%3D&amp;reserved=0
> even if they aren't going to be generated by your firmware.

Yes, I'm sorry that I missed it. I started my development with the old 
spec..
and did not pay heed to go back and check the latest spec again.

Will add the remaining agent types in the next revision.
Do you recommend adding all of them?
>
> I've hacked qemu to poke out appropriate records in the past
> (for CCIX a while back) and it would be easy enough to add
> them for these cases if needed (obviously that's a dirty hack
> but it works for testing these flows ;)

Yes that would be of a great help to verify all the fields I think.

With our hardware support, I have just verified for CXL 1.1
Downstream Port.

So, I haven't really gotten a chance to verify how Device ID and
Device Serial Number would look like..

>
> I guess I can send a follow up patch if you aren't happy jumping
> directly to the 2.10 version of these structures.

I think, I can add them in the next revision. But I don't think I can 
test them
though :(

>
>> +};
>> +
>> +enum {
>> +	RCD,
>> +	RCH_DP,
>> +};
>> +
>> +void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_err)
>> +{
>> +	if (prot_err->valid_bits & PROT_ERR_VALID_AGENT_TYPE)
>> +		pr_info("%s agent_type: %d, %s\n", pfx, prot_err->agent_type,
>> +			prot_err->agent_type < ARRAY_SIZE(prot_err_agent_type_strs)
>> +			? prot_err_agent_type_strs[prot_err->agent_type]
>> +			: "unknown");
>> +
>> +	if (prot_err->valid_bits & PROT_ERR_VALID_AGENT_ADDRESS) {
>> +		switch (prot_err->agent_type) {
>> +		case RCD:
>> +			pr_info("%s agent_address: %04x:%02x:%02x.%x\n",
>> +				pfx, prot_err->segment, prot_err->bus,
>> +				prot_err->device, prot_err->function);
>> +			break;
>> +		case RCH_DP:
>> +			pr_info("%s rcrb_base_address: 0x%016llx\n", pfx,
>> +				prot_err->agent_addr);
>> +			break;
>> +		default:
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (prot_err->valid_bits & PROT_ERR_VALID_DEVICE_ID) {
>> +		const __u8 *class_code;
>> +
>> +		switch (prot_err->agent_type) {
>> +		case RCD:
>> +			pr_info("%s slot: %d\n", pfx,
>> +				prot_err->device_id.slot >> CPER_PCIE_SLOT_SHIFT);
>> +			pr_info("%s vendor_id: 0x%04x, device_id: 0x%04x\n",
>> +				pfx, prot_err->device_id.vendor_id,
>> +				prot_err->device_id.device_id);
>> +			pr_info("%s sub_vendor_id: 0x%04x, sub_device_id: 0x%04x\n",
>> +				pfx, prot_err->device_id.sub_vendor_id,
>> +				prot_err->device_id.sub_device_id);
>> +			class_code = prot_err->device_id.class_code;
>> +			pr_info("%s class_code: %02x%02x\n", pfx,
>> +				class_code[1], class_code[0]);
>> +			break;
>> +		default:
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER) {
>> +		switch (prot_err->agent_type) {
>> +		case RCD:
>> +			pr_info("%s lower_dw: 0x%08x, upper_dw: 0x%08x\n", pfx,
>> +				prot_err->dev_serial_num.lower_dw,
>> +				prot_err->dev_serial_num.upper_dw);
>> +			break;
>> +		default:
>> +			break;
>> +		}
>> +	}
> Nice to pretty print the cap structure and appropriate DVSEC and Error logs as well
> if valid, but that could be a follow up patch.

Hmm, I have added the Error log decoding support in the next patch.

But, I did not include cap structure and DVSEC though as I initially 
thought it might not be
that important to parse the error. Do you recommend adding them?

>
>> +}
>> diff --git a/drivers/firmware/efi/cper_cxl.h b/drivers/firmware/efi/cper_cxl.h
>> new file mode 100644
>> index 000000000000..f924d96e4bb2
>> --- /dev/null
>> +++ b/drivers/firmware/efi/cper_cxl.h
>> @@ -0,0 +1,58 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * UEFI Common Platform Error Record (CPER) support for CXL Section.
>> + *
>> + * Copyright (C) 2022 Advanced Micro Devices, Inc.
>> + *
>> + * Author: Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>
>> + */
>> +
>> +#ifndef LINUX_CPER_CXL_H
>> +#define LINUX_CPER_CXL_H
>> +
>> +/* CXL Protocol Error Section */
>> +#define CPER_SEC_CXL_PROT_ERR						\
>> +	GUID_INIT(0x80B9EFB4, 0x52B5, 0x4DE3, 0xA7, 0x77, 0x68, 0x78,	\
>> +		  0x4B, 0x77, 0x10, 0x48)
>> +
>> +#pragma pack(1)
>> +
>> +/* Compute Express Link Protocol Error Section, UEFI v2.9 sec N.2.13 */
> Given 2.10 is out an expands on a few corners of this. Perhaps update
> an switch the reference over to that to cover that.
> I haven't checked, but assume the additional Agent Types were added in 2.10.

Right, will fix the comment to refer to 2.10.

>
>> +struct cper_sec_prot_err {
>> +	u64			valid_bits;
>> +	u8			agent_type;
>> +	u8			reserved[7];
>> +	union {
>> +		u64		agent_addr;
> Perhaps useful to add a few comments to say when the different union
> elements are relevant.  Perhaps also name the field as per the spec
> which would give the overall union the agent_address.
> I admit that is a little confusing with the union element having
> the same name for when it's treated as an address.
> Perhaps call the union element rcrb_base_addr?

Okay, probably something like this?

     union {
                     u64                rcrb_base_addr;
                     struct {
                                     u8    function;
                                     u8    device;
                                     u8    bus;
                                     u16  segment;
                                     u8    reserved_1[3];
                     };
     } agent_addr;

And change printing from prot_err->segment to prot_err->agent_addr.segment
and so on.. Please ignore my indentation/alignments here..

>
>> +		struct {
>> +			u8	function;
>> +			u8	device;
>> +			u8	bus;
>> +			u16	segment;
>> +			u8	reserved_1[3];
>> +		};
>> +	};
>> +	struct {
>> +		u16		vendor_id;	
>> +		u16		device_id;
>> +		u16		sub_vendor_id;
>> +		u16		sub_device_id;
> This is always fun.  Far as I can tell not all PCI elements
> have subsystem IDs - so who knows what goes in these..
> Also, there is wonderfully no such thing as a PCI subsystem device ID.
> It's just called subsystem ID in the PCI spec.

Thanks for correcting this. Will fix.

>
>> +		u8		class_code[2];
> Why treat class code as two u8s?  If doing so, shall
> we name them?  base_class_code, sub_class_code?

Just followed the PCI CPER decoding here.. Will name them if that makes
more sense..

>
>> +		u16		slot;
>> +		u8		reserved_1[4];
>> +	}			device_id;
> I would not align device-id structure name with the contents.
> I'm not personally a big fan of aligning structure elements
> because it often goes wrong, but if you want to do it, drop
> the alignment of device_id by a few levels.

Thats true actually. Will just drop the alignment for all elements in next
revision.

>
> 	} device_id;
>
>> +	struct {
>> +		u32		lower_dw;
>> +		u32		upper_dw;
>> +	}			dev_serial_num;
>> +	u8			capability[60];
>> +	u16			dvsec_len;
>> +	u16			err_len;
>> +	u8			reserved_2[4];
> You could add a [] array on the end to make it clear there are more elements.
> That's not a perfect solution though given there are two different variable length
> fields on the end.  They aren't that variable (as defined by the structures
> in the CXL spec) however the complexity comes from the fact that they may not
> be valid / lengths will be 0 (I assume lengths will be 0
> if not valid anyway, the spec doesn't seem to say either way...)

Hmm, I probably got the idea to do this way by referring to "efi/cper-x86.c"
(N.2.4.2 IA32/X64 Processor Error Section in UEFI spec) and probably at few
places under the APEI tables.

Also to note, defining the new variable array member the parsing needs 
to be changed
accordingly in the next patch.  Add dvsec length to this new variable array
member to get to the Error Log field. Am I right?

>
> Either way, adding a variable array element gives a good way to address the first
> one, and provides a good location for a comment on what else might be here.
>
>> +};
>> +
>> +#pragma pack()
> I would push the structure definition down into the c file and provide a forwards
> definition only in the header.
>
> struct cper_sec_prot_err;

Okay will do that!

Thanks,
Smita

>
>
>> +
>> +void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_err);
>> +
>> +#endif //__CPER_CXL_


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ