lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <180fcfd623c64cdb86cdc9059f749af0@huawei.com>
Date: Tue, 26 Nov 2024 11:51:23 +0000
From: Shiju Jose <shiju.jose@...wei.com>
To: "dave.jiang@...el.com" <dave.jiang@...el.com>, "dan.j.williams@...el.com"
	<dan.j.williams@...el.com>, Jonathan Cameron <jonathan.cameron@...wei.com>,
	"alison.schofield@...el.com" <alison.schofield@...el.com>,
	"nifan.cxl@...il.com" <nifan.cxl@...il.com>, "vishal.l.verma@...el.com"
	<vishal.l.verma@...el.com>, "ira.weiny@...el.com" <ira.weiny@...el.com>,
	"dave@...olabs.net" <dave@...olabs.net>, "linux-cxl@...r.kernel.org"
	<linux-cxl@...r.kernel.org>
CC: "rostedt@...dmis.org" <rostedt@...dmis.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Linuxarm
	<linuxarm@...wei.com>, tanxiaofei <tanxiaofei@...wei.com>, "Zengtao (B)"
	<prime.zeng@...ilicon.com>
Subject: RE: [PATCH v4 3/6] cxl/events: Update General Media Event Record to
 CXL spec rev 3.1

CC  Steven Rostedt.

Hi Steven,

We are encountering a parsing error ("FAILED TO PARSE") from libtraceevent  when it
tries to parse some of the CXL trace events for the user-space tool rasdaemon. 
This issue appeared after new fields were added to the trace events. 
It was found that the issue does not occur when all or some of the decoded strings
for the event's data and flags are removed from the TP_printk() function in the kernel,
and only the values are printed instead.
https://elixir.bootlin.com/linux/v6.12/source/drivers/cxl/core/trace.h
https://lore.kernel.org/lkml/20241120093745.1847-1-shiju.jose@huawei.com/

Below is the information from the debugging in libtraceevent:
The failure occurs in the following functions and locations within libtraceevent:
File: src/event-parse.c
Function: event_read_format()
ret = event_read_fields(event->tep, event, &event->format.fields); if (ret < 0)
    return ret;

Function: event_read_fields()
if (test_type_token(type, token, TEP_EVENT_ITEM, "field"))
    goto fail;

Can you recognize if there are any limitations or issues  that would prevent
libtraceevent from parsing the trace event in the condition described above?

Thanks,
Shiju

>-----Original Message-----
>From: Shiju Jose <shiju.jose@...wei.com>
>Sent: 20 November 2024 09:38
>To: dave.jiang@...el.com; dan.j.williams@...el.com; Jonathan Cameron
><jonathan.cameron@...wei.com>; alison.schofield@...el.com;
>nifan.cxl@...il.com; vishal.l.verma@...el.com; ira.weiny@...el.com;
>dave@...olabs.net; linux-cxl@...r.kernel.org
>Cc: linux-kernel@...r.kernel.org; Linuxarm <linuxarm@...wei.com>;
>tanxiaofei <tanxiaofei@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>;
>Shiju Jose <shiju.jose@...wei.com>
>Subject: [PATCH v4 3/6] cxl/events: Update General Media Event Record to CXL
>spec rev 3.1
>
>From: Shiju Jose <shiju.jose@...wei.com>
>
>CXL spec rev 3.1 section 8.2.9.2.1.1 Table 8-45, General Media Event Record has
>updated with following new fields and new types for Memory Event Type and
>Transaction Type fields.
>1. Advanced Programmable Corrected Memory Error Threshold Event Flags 2.
>Corrected Memory Error Count at Event 3. Memory Event Sub-Type
>
>The format of component identifier has changed (CXL spec 3.1 section
>8.2.9.2.1 Table 8-44).
>
>Update the general media event record and general media trace event for the
>above spec changes. The new fields are inserted in logical places.
>
>Example trace log of cxl_general_media trace event,
>
>cxl_general_media: memdev=mem0 host=0000:0f:00.0 serial=3 log=Fatal : \
>time=45104947948 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=128 \
>flags='0x1' handle=1 related_handle=0 maint_op_class=2 \
>maint_op_sub_class=4 : dpa=0x30d40 dpa_flags=0x0 \
>descriptor='UNCORRECTABLE_EVENT|THRESHOLD_EVENT|POISON_LIST_OVER
>FLOW' \ type='TE State Violation' sub_type=0x2 transaction_type=0x4 channel=3
>\
>rank=33 device=0x5 validity_flags=0x1f \
>comp_id=03 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \
>pldm_entity_id=74 c5 08 9a 1a 0b pldm_resource_id=fc d2 7e 2f \
>hpa=0xffffffffffffffff region= region_uuid=00000000-0000-0000-0000-
>000000000000 \
>cme_threshold_ev_flags=0x3 cme_count=0x78
>
>The number of decoded strings in TP_printk() caused  parsing error when
>libtraceevent in userspace parses the CXL general media trace event for
>rasdaemon. It was found that long decoded strings of field values in the
>TP_printk() caused the issue. As a solution, decoding of some fields in the
>TP_printk() were removed to accommodate the new fields.
>Decoding of all these fields is added in the userspace tool rasdaemon.
>
>Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
>Reviewed-by: Davidlohr Bueso <dave@...olabs.net>
>Signed-off-by: Shiju Jose <shiju.jose@...wei.com>
>---
> drivers/cxl/core/trace.h | 58 +++++++++++++++++++++++++++++-----------
> include/cxl/event.h      |  7 +++--
> 2 files changed, 48 insertions(+), 17 deletions(-)
>
>diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index
>8e9d80e34a28..77055d66b56e 100644
>--- a/drivers/cxl/core/trace.h
>+++ b/drivers/cxl/core/trace.h
>@@ -287,7 +287,7 @@ TRACE_EVENT(cxl_generic_event,
>
> /*
>  * General Media Event Record - GMER
>- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
>+ * CXL rev 3.1 Section 8.2.9.2.1.1; Table 8-45
>  */
> #define CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT		BIT(0)
> #define CXL_GMER_EVT_DESC_THRESHOLD_EVENT		BIT(1)
>@@ -301,10 +301,18 @@ TRACE_EVENT(cxl_generic_event,
> #define CXL_GMER_MEM_EVT_TYPE_ECC_ERROR			0x00
> #define CXL_GMER_MEM_EVT_TYPE_INV_ADDR			0x01
> #define CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR		0x02
>-#define show_gmer_mem_event_type(type)	__print_symbolic(type,
>		\
>-	{ CXL_GMER_MEM_EVT_TYPE_ECC_ERROR,		"ECC Error" },
>		\
>-	{ CXL_GMER_MEM_EVT_TYPE_INV_ADDR,		"Invalid
>Address" },	\
>-	{ CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,	"Data Path
>Error" }	\
>+#define CXL_GMER_MEM_EVT_TYPE_TE_STATE_VIOLATION	0x03
>+#define CXL_GMER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR	0x04
>+#define CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE	0x05
>+#define CXL_GMER_MEM_EVT_TYPE_CKID_VIOLATION		0x06
>+#define show_gmer_mem_event_type(type)	__print_symbolic(type,
>			\
>+	{ CXL_GMER_MEM_EVT_TYPE_ECC_ERROR,		"ECC Error" },
>			\
>+	{ CXL_GMER_MEM_EVT_TYPE_INV_ADDR,		"Invalid
>Address" },		\
>+	{ CXL_GMER_MEM_EVT_TYPE_DATA_PATH_ERROR,	"Data Path
>Error" },		\
>+	{ CXL_GMER_MEM_EVT_TYPE_TE_STATE_VIOLATION,	"TE State
>Violation" },		\
>+	{ CXL_GMER_MEM_EVT_TYPE_SCRUB_MEDIA_ECC_ERROR,	"Scrub
>Media ECC Error" },	\
>+	{ CXL_GMER_MEM_EVT_TYPE_AP_CME_COUNTER_EXPIRE,	"Adv
>Prog CME Counter Expiration" },	\
>+	{ CXL_GMER_MEM_EVT_TYPE_CKID_VIOLATION,		"CKID
>Violation" }		\
> )
>
> #define CXL_GMER_TRANS_UNKNOWN				0x00
>@@ -314,6 +322,8 @@ TRACE_EVENT(cxl_generic_event,
> #define CXL_GMER_TRANS_HOST_INJECT_POISON		0x04
> #define CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB		0x05
> #define CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT	0x06
>+#define CXL_GMER_TRANS_INTERNAL_MEDIA_ECS		0x07
>+#define CXL_GMER_TRANS_MEDIA_INITIALIZATION		0x08
> #define show_trans_type(type)	__print_symbolic(type,
>		\
> 	{ CXL_GMER_TRANS_UNKNOWN,			"Unknown" },
>			\
> 	{ CXL_GMER_TRANS_HOST_READ,			"Host Read" },
>			\
>@@ -321,18 +331,22 @@ TRACE_EVENT(cxl_generic_event,
> 	{ CXL_GMER_TRANS_HOST_SCAN_MEDIA,		"Host Scan
>Media" },		\
> 	{ CXL_GMER_TRANS_HOST_INJECT_POISON,		"Host Inject
>Poison" },		\
> 	{ CXL_GMER_TRANS_INTERNAL_MEDIA_SCRUB,		"Internal
>Media Scrub" },	\
>-	{ CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT,
>	"Internal Media Management" }	\
>+	{ CXL_GMER_TRANS_INTERNAL_MEDIA_MANAGEMENT,
>	"Internal Media Management" },	\
>+	{ CXL_GMER_TRANS_INTERNAL_MEDIA_ECS,		"Internal
>Media Error Check Scrub" },	\
>+	{ CXL_GMER_TRANS_MEDIA_INITIALIZATION,		"Media
>Initialization" }	\
> )
>
> #define CXL_GMER_VALID_CHANNEL				BIT(0)
> #define CXL_GMER_VALID_RANK				BIT(1)
> #define CXL_GMER_VALID_DEVICE				BIT(2)
> #define CXL_GMER_VALID_COMPONENT			BIT(3)
>+#define CXL_GMER_VALID_COMPONENT_ID_FORMAT		BIT(4)
> #define show_valid_flags(flags)	__print_flags(flags, "|",		   \
> 	{ CXL_GMER_VALID_CHANNEL,			"CHANNEL"	}, \
> 	{ CXL_GMER_VALID_RANK,				"RANK"
>	}, \
> 	{ CXL_GMER_VALID_DEVICE,			"DEVICE"	}, \
>-	{ CXL_GMER_VALID_COMPONENT,			"COMPONENT"
>	}  \
>+	{ CXL_GMER_VALID_COMPONENT,			"COMPONENT"
>	}, \
>+	{ CXL_GMER_VALID_COMPONENT_ID_FORMAT,
>	"COMPONENT PLDM FORMAT"	} \
> )
>
> TRACE_EVENT(cxl_general_media,
>@@ -348,6 +362,7 @@ TRACE_EVENT(cxl_general_media,
> 		__field(u64, dpa)
> 		__field(u8, descriptor)
> 		__field(u8, type)
>+		__field(u8, sub_type)
> 		__field(u8, transaction_type)
> 		__field(u8, channel)
> 		__field(u32, device)
>@@ -359,6 +374,8 @@ TRACE_EVENT(cxl_general_media,
> 		__field(u8, rank)
> 		__field(u8, dpa_flags)
> 		__string(region_name, cxlr ? dev_name(&cxlr->dev) : "")
>+		__field(u8, cme_threshold_ev_flags)
>+		__field(u32, cme_count)
> 	),
>
> 	TP_fast_assign(
>@@ -372,6 +389,7 @@ TRACE_EVENT(cxl_general_media,
> 		__entry->dpa &= CXL_DPA_MASK;
> 		__entry->descriptor = rec->media_hdr.descriptor;
> 		__entry->type = rec->media_hdr.type;
>+		__entry->sub_type = rec->sub_type;
> 		__entry->transaction_type = rec->media_hdr.transaction_type;
> 		__entry->channel = rec->media_hdr.channel;
> 		__entry->rank = rec->media_hdr.rank;
>@@ -380,6 +398,8 @@ TRACE_EVENT(cxl_general_media,
> 			CXL_EVENT_GEN_MED_COMP_ID_SIZE);
> 		__entry->validity_flags = get_unaligned_le16(&rec-
>>media_hdr.validity_flags);
> 		__entry->hpa = hpa;
>+		__entry->cme_threshold_ev_flags = rec-
>>cme_threshold_ev_flags;
>+		__entry->cme_count = get_unaligned_le24(rec->cme_count);
> 		if (cxlr) {
> 			__assign_str(region_name);
> 			uuid_copy(&__entry->region_uuid, &cxlr-
>>params.uuid); @@ -389,18 +409,26 @@ TRACE_EVENT(cxl_general_media,
> 		}
> 	),
>
>-	CXL_EVT_TP_printk("dpa=%llx dpa_flags='%s' " \
>-		"descriptor='%s' type='%s' transaction_type='%s' channel=%u
>rank=%u " \
>-		"device=%x comp_id=%s validity_flags='%s' " \
>-		"hpa=%llx region=%s region_uuid=%pUb",
>-		__entry->dpa, show_dpa_flags(__entry->dpa_flags),
>+	CXL_EVT_TP_printk("dpa=0x%llx dpa_flags=0x%x " \
>+		"descriptor='%s' type='%s' sub_type=0x%x " \
>+		"transaction_type=0x%x channel=%u rank=%u " \
>+		"device=0x%x validity_flags=0x%x " \
>+		"comp_id=%s pldm_entity_id=%s pldm_resource_id=%s " \
>+		"hpa=0x%llx region=%s region_uuid=%pUb " \
>+		"cme_threshold_ev_flags=0x%x cme_count=0x%x ",
>+		__entry->dpa, __entry->dpa_flags,
> 		show_event_desc_flags(__entry->descriptor),
> 		show_gmer_mem_event_type(__entry->type),
>-		show_trans_type(__entry->transaction_type),
>+		__entry->sub_type, __entry->transaction_type,
> 		__entry->channel, __entry->rank, __entry->device,
>+		__entry->validity_flags,
> 		__print_hex(__entry->comp_id,
>CXL_EVENT_GEN_MED_COMP_ID_SIZE),
>-		show_valid_flags(__entry->validity_flags),
>-		__entry->hpa, __get_str(region_name), &__entry->region_uuid
>+		show_pldm_entity_id(__entry->validity_flags,
>CXL_GMER_VALID_COMPONENT,
>+
>CXL_GMER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id),
>+		show_pldm_resource_id(__entry->validity_flags,
>CXL_GMER_VALID_COMPONENT,
>+
>CXL_GMER_VALID_COMPONENT_ID_FORMAT, __entry->comp_id),
>+		__entry->hpa, __get_str(region_name), &__entry->region_uuid,
>+		__entry->cme_threshold_ev_flags, __entry->cme_count
> 	)
> );
>
>diff --git a/include/cxl/event.h b/include/cxl/event.h index
>e1d485ad376b..2b07adf39010 100644
>--- a/include/cxl/event.h
>+++ b/include/cxl/event.h
>@@ -45,14 +45,17 @@ struct cxl_event_generic {
>
> /*
>  * General Media Event Record
>- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
>+ * CXL rev 3.1 Section 8.2.9.2.1.1; Table 8-45
>  */
> #define CXL_EVENT_GEN_MED_COMP_ID_SIZE	0x10
> struct cxl_event_gen_media {
> 	struct cxl_event_media_hdr media_hdr;
> 	u8 device[3];
> 	u8 component_id[CXL_EVENT_GEN_MED_COMP_ID_SIZE];
>-	u8 reserved[46];
>+	u8 cme_threshold_ev_flags;
>+	u8 cme_count[3];
>+	u8 sub_type;
>+	u8 reserved[41];
> } __packed;
>
> /*
>--
>2.43.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ