lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a036291f-2483-4bf0-9ac1-db310f8115aa@linux.alibaba.com>
Date: Thu, 22 May 2025 17:41:31 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Cc: rostedt@...dmis.org, Lukas Wunner <lukas@...ner.de>,
 linux-pci@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
 linux-edac@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
 helgaas@...nel.org, bhelgaas@...gle.com, tony.luck@...el.com, bp@...en8.de,
 mhiramat@...nel.org, mathieu.desnoyers@...icios.com, oleg@...hat.com,
 naveen@...nel.org, davem@...emloft.net, anil.s.keshavamurthy@...el.com,
 mark.rutland@....com, peterz@...radead.org, tianruidong@...ux.alibaba.com
Subject: Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug
 event



在 2025/5/20 18:07, Ilpo Järvinen 写道:
> On Tue, 20 May 2025, Shuai Xue wrote:
> 
>> Hi, Ilpo,
>>
>> 在 2025/5/20 01:10, Ilpo Järvinen 写道:
>>> On Mon, 12 May 2025, Shuai Xue wrote:
>>>
>>>> Hotplug events are critical indicators for analyzing hardware health,
>>>> particularly in AI supercomputers where surprise link downs can
>>>> significantly impact system performance and reliability.
>>>>
>>>> To this end, define a new TRACING_SYSTEM named pci, add a generic RAS
>>>> tracepoint for hotplug event to help healthy check, and generate
>>>> tracepoints for pcie hotplug event. Add enum pci_hotplug_event in
>>>> include/uapi/linux/pci.h so applications like rasdaemon can register
>>>> tracepoint event handlers for it.
>>>>
>>>> The output like below:
>>>>
>>>> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
>>>> $ cat /sys/kernel/debug/tracing/trace_pipe
>>>>       <...>-206     [001] .....    40.373870: pci_hp_event: 0000:00:02.0
>>>> slot:10, event:Link Down
>>>>
>>>>       <...>-206     [001] .....    40.374871: pci_hp_event: 0000:00:02.0
>>>> slot:10, event:Card not present
>>>>
>>>> Suggested-by: Lukas Wunner <lukas@...ner.de>
>>>> Suggested-by: Steven Rostedt <rostedt@...dmis.org>
>>>> Signed-off-by: Shuai Xue <xueshuai@...ux.alibaba.com>
>>>> Reviewed-by: Lukas Wunner <lukas@...ner.de>
>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@...wei.com>
>>>> ---
>>>> changes since v7:
>>>> - replace the TRACE_INCLUDE_PATH to avoid macro conflict per Steven
>>>> - pick up Reviewed-by from Lukas Wunner
>>>> ---
>>>>    drivers/pci/hotplug/Makefile      |  3 ++
>>>>    drivers/pci/hotplug/pciehp_ctrl.c | 33 ++++++++++++---
>>>>    drivers/pci/hotplug/trace.h       | 68 +++++++++++++++++++++++++++++++
>>>>    include/uapi/linux/pci.h          |  7 ++++
>>>>    4 files changed, 105 insertions(+), 6 deletions(-)
>>>>    create mode 100644 drivers/pci/hotplug/trace.h
>>>>
>>>> diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
>>>> index 40aaf31fe338..a1a9d1e98962 100644
>>>> --- a/drivers/pci/hotplug/Makefile
>>>> +++ b/drivers/pci/hotplug/Makefile
>>>> @@ -3,6 +3,9 @@
>>>>    # Makefile for the Linux kernel pci hotplug controller drivers.
>>>>    #
>>>>    +# define_trace.h needs to know how to find our header
>>>> +CFLAGS_pciehp_ctrl.o				:= -I$(src)
>>>> +
>>>>    obj-$(CONFIG_HOTPLUG_PCI)		+= pci_hotplug.o
>>>>    obj-$(CONFIG_HOTPLUG_PCI_COMPAQ)	+= cpqphp.o
>>>>    obj-$(CONFIG_HOTPLUG_PCI_IBM)		+= ibmphp.o
>>>> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c
>>>> b/drivers/pci/hotplug/pciehp_ctrl.c
>>>> index d603a7aa7483..f9beb4d3a9b8 100644
>>>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>>>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>>>> @@ -23,6 +23,9 @@
>>>>    #include "../pci.h"
>>>>    #include "pciehp.h"
>>>>    +#define CREATE_TRACE_POINTS
>>>> +#include "trace.h"
>>>> +
>>>>    /* The following routines constitute the bulk of the
>>>>       hotplug controller logic
>>>>     */
>>>> @@ -244,12 +247,20 @@ void pciehp_handle_presence_or_link_change(struct
>>>> controller *ctrl, u32 events)
>>>>    	case ON_STATE:
>>>>    		ctrl->state = POWEROFF_STATE;
>>>>    		mutex_unlock(&ctrl->state_lock);
>>>> -		if (events & PCI_EXP_SLTSTA_DLLSC)
>>>> +		if (events & PCI_EXP_SLTSTA_DLLSC) {
>>>>    			ctrl_info(ctrl, "Slot(%s): Link Down\n",
>>>>    				  slot_name(ctrl));
>>>> -		if (events & PCI_EXP_SLTSTA_PDC)
>>>> +			trace_pci_hp_event(pci_name(ctrl->pcie->port),
>>>> +					   slot_name(ctrl),
>>>> +					   PCI_HOTPLUG_LINK_DOWN);
>>>> +		}
>>>> +		if (events & PCI_EXP_SLTSTA_PDC) {
>>>>    			ctrl_info(ctrl, "Slot(%s): Card not present\n",
>>>>    				  slot_name(ctrl));
>>>> +			trace_pci_hp_event(pci_name(ctrl->pcie->port),
>>>> +					   slot_name(ctrl),
>>>> +					   PCI_HOTPLUG_CARD_NOT_PRESENT);
>>>> +		}
>>>>    		pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
>>>>    		break;
>>>>    	default:
>>>> @@ -269,6 +280,9 @@ void pciehp_handle_presence_or_link_change(struct
>>>> controller *ctrl, u32 events)
>>>>    					      INDICATOR_NOOP);
>>>>    			ctrl_info(ctrl, "Slot(%s): Card not present\n",
>>>>    				  slot_name(ctrl));
>>>> +			trace_pci_hp_event(pci_name(ctrl->pcie->port),
>>>> +					   slot_name(ctrl),
>>>> +					   PCI_HOTPLUG_CARD_NOT_PRESENT);
>>>>    		}
>>>>    		mutex_unlock(&ctrl->state_lock);
>>>>    		return;
>>>> @@ -281,12 +295,19 @@ void pciehp_handle_presence_or_link_change(struct
>>>> controller *ctrl, u32 events)
>>>>    	case OFF_STATE:
>>>>    		ctrl->state = POWERON_STATE;
>>>>    		mutex_unlock(&ctrl->state_lock);
>>>> -		if (present)
>>>> +		if (present) {
>>>>    			ctrl_info(ctrl, "Slot(%s): Card present\n",
>>>>    				  slot_name(ctrl));
>>>> -		if (link_active)
>>>> -			ctrl_info(ctrl, "Slot(%s): Link Up\n",
>>>> -				  slot_name(ctrl));
>>>> +			trace_pci_hp_event(pci_name(ctrl->pcie->port),
>>>> +					   slot_name(ctrl),
>>>> +					   PCI_HOTPLUG_CARD_PRESENT);
>>>> +		}
>>>> +		if (link_active) {
>>>> +			ctrl_info(ctrl, "Slot(%s): Link Up\n",
>>>> slot_name(ctrl));
>>>> +			trace_pci_hp_event(pci_name(ctrl->pcie->port),
>>>> +					   slot_name(ctrl),
>>>> +					   PCI_HOTPLUG_LINK_UP);
>>>> +		}
>>>>    		ctrl->request_result = pciehp_enable_slot(ctrl);
>>>>    		break;
>>>>    	default:
>>>> diff --git a/drivers/pci/hotplug/trace.h b/drivers/pci/hotplug/trace.h
>>>> new file mode 100644
>>>> index 000000000000..21329c198019
>>>> --- /dev/null
>>>> +++ b/drivers/pci/hotplug/trace.h
>>>> @@ -0,0 +1,68 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>>> +#if !defined(_TRACE_HW_EVENT_PCI_HP_H) ||
>>>> defined(TRACE_HEADER_MULTI_READ)
>>>> +#define _TRACE_HW_EVENT_PCI_HP_H
>>>> +
>>>> +#include <linux/tracepoint.h>
>>>> +
>>>> +#undef TRACE_SYSTEM
>>>> +#define TRACE_SYSTEM pci
>>>> +
>>>> +#define PCI_HOTPLUG_EVENT					\
>>>> +	EM(PCI_HOTPLUG_LINK_UP,			"Link Up")	\
>>>> +	EM(PCI_HOTPLUG_LINK_DOWN,		"Link Down")	\
>>>> +	EM(PCI_HOTPLUG_CARD_PRESENT,		"Card present")	\
>>>> +	EMe(PCI_HOTPLUG_CARD_NOT_PRESENT,	"Card not present")
>>>
>>> Hi,
>>>
>>> While I was thinking of adding tracing into PCIe BW controller (bwctrl),
>>> I ended up thinking that perhaps it would make more sense to have PCIe
>>> Link related tracepoints which would cover both hotplug and bwctrl so that
>>> also Link Speed changes would be reported through the same trace event.
>>>
>>> Downgraded speed may indicate there's something wrong with the card and
>>> the Link Speed does have performance impact too for those who are pushing
>>> BW boundaries such as AI systems.
>>
>> Agreed!
>>
>>>
>>> So my suggestion is:
>>>
>>> - Add "Link Speed changed" to the event types.
>>> - Add Link Speed and Width into the event format (and probably also Flit
>>>     mode as PCIe gen6 is coming).
>>
>>
>> How about bellow event format:
>>
>> +	TP_STRUCT__entry(
>> +		__string(	port_name,	port_name	)
>> +		__field(	unsigned char,	cur_bus_speed	)
>> +		__field(	unsigned char,	max_bus_speed	)
> 
> Add also the Link Width.

Got it.
> 
>> +		__field(	unsigned char,	flit_mode	)
>> +	),
>>
>> And add the event to pcie_update_link_speed():
>>
>> @@ -796,6 +799,10 @@ void pcie_update_link_speed(struct pci_bus *bus)
>>          pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
>>          pcie_capability_read_word(bridge, PCI_EXP_LNKSTA2, &linksta2);
>>          __pcie_update_link_speed(bus, linksta, linksta2);
>> +
>> +       trace_pci_link_event(pci_name(bridge), bus->cur_bus_speed,
>> +                                          bus->max_bus_speed,
> 
> I don't think outputting the internal values of enum pci_bus_speed is a
> good idea. Maybe these could be printed as a string (with
> pci_speed_string()) or encoded with trace interface specific values.

I see, a human readable string is better.
> 
> Perhaps it would make sense to check if the speed really changed before
> sending that event, but there are good sides in both approaches as I
> know some platforms assert LBMS more than once during a single Link Speed
> change.
> 
>> +                                          PCI_HOTPLUG_LINK_SPEED_CHANGED);
>>
>> But I don't find link speed changed in hotplug driver
> 
> pciehp_check_link_status() calls __pcie_update_link_speed().

Thanks.

> 
>> , and the format of "Link Speed changed" is a bit different from
>> "pci_hp_event".
> 
> The difference is only because when the Link is down, there's no Link
> Speed (obviously). Whenever a new device is hotplugged and it comes up,
> there's also Link Speed for it which can be included into the trace event.
> 
> I think the trace event should have some special value for the fields that
> are N/A due to Link being off. While it would be possible to create
> separate events for speed changes and hotplug, I don't see any pros in
> that approach over just having the N/A fields marked as such when the Link
> is Down.
> 
> Perhaps it would even make sense to add PCIE_SPEED_LINK_DOWN into
> bus->cur_bus_speed when hotplug finds the card is gone (I'm not entirely
> sure how bwctrl or pcie_cooling driver would cope with that though, they
> might need minor tweaking to support it, and there are a few other drivers
> that use that field).
> 
>> Do we really need a PCI_HOTPLUG_EVENT? May PCI_LINK_EVENT is more
>> appropriate?
> 
> Ah, right, I forgot to mention it would make sense to rename it to
> PCI_LINK_EVENT.
> 

Got it.

Thanks.
Shuai

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ