lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fc0ded97-8643-4faa-a606-732bcd4ce4a1@linux.alibaba.com>
Date: Sat, 19 Jul 2025 13:23:28 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: Matthew W Carlis <mattc@...estorage.com>, lukas@...ner.de,
 Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
 anil.s.keshavamurthy@...el.com, bhelgaas@...gle.com, bp@...en8.de,
 davem@...emloft.net, linux-edac@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org,
 linux-trace-kernel@...r.kernel.org, mark.rutland@....com,
 mathieu.desnoyers@...icios.com, mhiramat@...nel.org, naveen@...nel.org,
 oleg@...hat.com, peterz@...radead.org, rostedt@...dmis.org,
 tianruidong@...ux.alibaba.com, tony.luck@...el.com
Subject: Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug
 event



在 2025/7/19 00:35, Bjorn Helgaas 写道:
> On Fri, Jul 18, 2025 at 01:29:18PM +0800, Shuai Xue wrote:
>> 在 2025/7/18 11:46, Matthew W Carlis 写道:
>>> On Thu, Jul 17, 2025 Bjorn Helgaas wrote
>>>> So I think your idea of adding current link speed/width to the "Link
>>>> Up" event is still on the table, and that does sound useful to me.
>>>
>>> We're already reading the link status register here to check DLLA so
>>> it would be nice. I guess if everything is healthy we're probably already
>>> at the maximum speed by this point.
>>>
>>>> In the future we might add another tracepoint when we enumerate the
>>>> device and know the Vendor/Device ID.
>>>
>>> I think we might have someone who would be interested in doing it.
>>
>> IIUC, the current hotplug event (or presence event) is enough for Matthew.
>> and we would like a new tracepoing for link speed change which reports
>> speeds.
>>
>> For hotplug event, I plan to send a new version to
>>
>> 1. address Bjorn' concerns about event strings by removing its spaces.
>>
>> #define PCI_HOTPLUG_EVENT							\
>> 	EM(PCI_HOTPLUG_LINK_UP,			"PCI_HOTPLUG_LINK_UP")		\
>> 	EM(PCI_HOTPLUG_LINK_DOWN,		"PCI_HOTPLUG_LINK_DOWN")	\
>> 	EM(PCI_HOTPLUG_CARD_PRESENT,		"PCI_HOTPLUG_CARD_PRESENT")	\
>> 	EMe(PCI_HOTPLUG_CARD_NOT_PRESENT,	"PCI_HOTPLUG_CARD_NOT_PRESENT")
>>
>> 2. address Ilpo comments by moving pci_hp_event to a common place
>> (include/trace/events/pci.h) so that the new comming can also use it.
>>
>> For link speed change event (perhaps named as pci_link_event),
>> I plan to send a seperate patch, which provides:
>>
>> 	TP_STRUCT__entry(
>> 		__string(	port_name,	port_name	)
>> 		__field(	unsigned char,	cur_bus_speed	)
>> 		__field(	unsigned char,	max_bus_speed	)
>>   		__field(	unsigned char,	width		)
>>   		__field(	unsigned int,	flit_mode	)
>> 		__field(	unsigned char,	reason		)
>> 		),
>>
>> The reason field is from Lukas ideas which indicates why the link speed
>> changed, e.g. "hotplug", "autonomous", "thermal", "retrain", etc.
>>
>> Are you happy with above changes?
> 
> Seems good to me.
> 
> What do you plan for PCI_HOTPLUG_LINK_UP?  It would be nice to have
> the link info there since that's sort of a link speed change itself.
> 
> Bjorn

Hi, Bjorn,

IMMO, link information is best included in the pci_link_event tracepoint,
rather than in the hotplug event itself.

For example, with the a device hotpluged, the trace output looks like this:

$ cat /sys/kernel/debug/tracing/trace_pipe

            <...>-120     [002] .....   104.864051: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_CARD_PRESENT

            <...>-120     [002] .....   104.864081: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_LINK_UP

    irq/57-pciehp-120     [002] .....   104.990434: pci_link_event: 0000:00:03.0 cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, reason:5

    irq/57-pciehp-120     [002] .....   104.992377: pci_link_event: 0000:00:03.0 cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, reason:0

Here, reason 5 corresponds to PCIe hotplug, and reason 0 indicates a PCIe link retrain.

This approach keeps the separation of concerns clear: hotplug events are
tracked by pci_hp_event, while all link-related information, including
speed changes and the reason for the change, is handled by
pci_link_event.

Thanks.
Shuai

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ