[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fcfc51c0-6a1f-435b-844b-4daba132f7b6@linux.alibaba.com>
Date: Tue, 22 Jul 2025 10:43:29 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Cc: Matthew W Carlis <mattc@...estorage.com>, helgaas@...nel.org,
Lukas Wunner <lukas@...ner.de>, anil.s.keshavamurthy@...el.com,
bhelgaas@...gle.com, bp@...en8.de, davem@...emloft.net,
linux-edac@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
linux-pci@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
mark.rutland@....com, mathieu.desnoyers@...icios.com, mhiramat@...nel.org,
naveen@...nel.org, oleg@...hat.com, peterz@...radead.org,
rostedt@...dmis.org, tianruidong@...ux.alibaba.com, tony.luck@...el.com
Subject: Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoinggt for
hotplug event
在 2025/7/21 18:18, Ilpo Järvinen 写道:
> On Fri, 18 Jul 2025, Shuai Xue wrote:
>> 在 2025/7/18 11:46, Matthew W Carlis 写道:
>>> On Thu, Jul 17, 2025 Bjorn Helgaas wrote
>>>> So I think your idea of adding current link speed/width to the "Link
>>>> Up" event is still on the table, and that does sound useful to me.
>>>
>>> We're already reading the link status register here to check DLLA so
>>> it would be nice. I guess if everything is healthy we're probably already
>>> at the maximum speed by this point.
>>>
>>>> In the future we might add another tracepoint when we enumerate the
>>>> device and know the Vendor/Device ID.
>>>
>>> I think we might have someone who would be interested in doing it.
>>
>>
>> Hi, all,
>>
>> IIUC, the current hotplug event (or presence event) is enough for Matthew.
>> and we would like a new tracepoing for link speed change which reports
>> speeds.
>>
>> For hotplug event, I plan to send a new version to
>>
>> 1. address Bjorn' concerns about event strings by removing its spaces.
>>
>> #define PCI_HOTPLUG_EVENT
>> \
>> EM(PCI_HOTPLUG_LINK_UP, "PCI_HOTPLUG_LINK_UP")
>> \
>> EM(PCI_HOTPLUG_LINK_DOWN, "PCI_HOTPLUG_LINK_DOWN")
>> \
>> EM(PCI_HOTPLUG_CARD_PRESENT, "PCI_HOTPLUG_CARD_PRESENT")
>> \
>> EMe(PCI_HOTPLUG_CARD_NOT_PRESENT,
>> "PCI_HOTPLUG_CARD_NOT_PRESENT")
>>
>> 2. address Ilpo comments by moving pci_hp_event to a common place
>> (include/trace/events/pci.h) so that the new comming can also use it.
>
> Ah, I only now noticed you've decided to re-place them. Please disregard
> my other comment about this being still open/undecided item.
>
>> For link speed change event (perhaps named as pci_link_event),
>> I plan to send a seperate patch, which provides:
>>
>> TP_STRUCT__entry(
>> __string( port_name, port_name )
>> __field( unsigned char, cur_bus_speed )
>> __field( unsigned char, max_bus_speed )
>> __field( unsigned char, width )
>> __field( unsigned int, flit_mode )
>> __field( unsigned char, reason )
>> ),
>>
>> The reason field is from Lukas ideas which indicates why the link speed
>> changed, e.g. "hotplug", "autonomous", "thermal", "retrain", etc.
>>
>> Are you happy with above changes?
>
> Since you're probably quite far with the pcie link event patch too given
> above, could you take a look at the LNKSTA flags representation in my
> patch and incorporate those as well as there seems to always lot of
> uncertainty about those flags when investigating the LBMS/bwctrl related
> issues so it seems prudent to explicitly include them into the traceevent
> output:
>
> https://lore.kernel.org/linux-pci/7c289bba-3133-0989-6333-41fc41fe3504@linux.intel.com/
>
>
Sure, Thank you for the feedback.
I like the LNKSTA flags, LNKSTA flags provides better genericity
compared to the custom reason field I initially proposed. But it may
cause confusion when used in pcie_retrain_link(). However, I've
identified a potential issue when this approach is applied in
pcie_retrain_link() scenarios.
Consider the following trace output when a device hotpluged:
$ cat /sys/kernel/debug/tracing/trace_pipe
$ cat /sys/kernel/debug/tracing/trace_pipe
<...>-118 [002] ..... 28.414220: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_CARD_PRESENT
<...>-118 [002] ..... 28.414273: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_LINK_UP
irq/57-pciehp-118 [002] ..... 28.540189: pcie_link_event: 0000:00:03.0 type:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
irq/57-pciehp-118 [002] ..... 28.544999: pcie_link_event: 0000:00:03.0 type:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
The problem is that both trace events show status:DLLLA (Data Link Layer
Link Active), which is the direct reading from PCI_EXP_LNKSTA. However,
this doesn't accurately reflect the underlying context:
- First DLLLA: Triggered by board_added() - link establishment after
card insertion
- Second DLLLA: Triggered by pcie_retrain_link() - link retraining
completion
( I trace the events in pcie_update_link_speed() )
In the second case, the more relevant status would be PCI_EXP_LNKSTA_LT
(Link Training) to indicate that link retraining was performed, even
though the final register state shows DLLLA.
Question: Should we explicitly report the contextual status (e.g.,
PCI_EXP_LNKSTA_LT for retraining scenarios) rather than always reading
the current register field? This would provide more meaningful trace
information for debugging link state transitions.
Additionally, I'd appreciate your thoughts on the overall tracepoint
format shown above. Does this structure provide sufficient information
for hotplug and link analysis while maintaining readability?
Thanks.
Shuai
Powered by blists - more mailing lists