Message-ID: <faccb715-8d9f-4761-855a-0fb8be2ebad4@linux.alibaba.com>
Date: Sun, 10 Nov 2024 18:12:09 +0800
From: Shuai Xue <xueshuai@...ux.alibaba.com>
To: Lukas Wunner <lukas@...ner.de>
Cc: linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-edac@...r.kernel.org, bhelgaas@...gle.com, tony.luck@...el.com,
bp@...en8.de
Subject: Re: [RFC PATCH] PCI: pciehp: Generate a RAS tracepoint for hotplug
event
On 2024/11/10 01:52, Lukas Wunner wrote:
> On Fri, Nov 08, 2024 at 11:09:39AM +0800, Shuai Xue wrote:
>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>> @@ -19,6 +19,7 @@
>> #include <linux/types.h>
>> #include <linux/pm_runtime.h>
>> #include <linux/pci.h>
>> +#include <ras/ras_event.h>
>> #include "pciehp.h"
>
> Hm, why does the TRACE_EVENT() definition have to live in ras_event.h?
> Why not, say, in pciehp.h?
IMHO, it is a RAS-related event, so I added it to ras_event.h, alongside
similar events such as aer_event and memory_failure_event.
I can move it to pciehp.h if the maintainers prefer that location.
>
>> @@ -245,6 +246,8 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>> if (events & PCI_EXP_SLTSTA_PDC)
>> ctrl_info(ctrl, "Slot(%s): Card not present\n",
>> slot_name(ctrl));
>> + trace_pciehp_event(dev_name(&ctrl->pcie->port->dev),
>> + slot_name(ctrl), ON_STATE, events);
>> pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
>> break;
>> default:
>
> I'd suggest using pci_name() instead of dev_name() as it's a little shorter.
Will use pci_name() instead.
>
> Passing ON_STATE here isn't always accurate because there's
> "case BLINKINGOFF_STATE" with a fallthrough preceding the
> above code block.
Yes, you are right, I missed the above fallthrough case.
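To make the fallthrough problem concrete, here is a minimal userspace sketch. It is not the pciehp code: the state names are borrowed from pciehp, but the enum values, fake_trace() and both handler functions are hypothetical stand-ins used only for illustration.

```c
#include <assert.h>

/* State names borrowed from pciehp; the numeric values are illustrative. */
enum slot_state {
	OFF_STATE,
	BLINKINGON_STATE,
	BLINKINGOFF_STATE,
	POWERON_STATE,
	POWEROFF_STATE,
	ON_STATE,
};

/* Hypothetical stand-in for the tracepoint: remembers what it was handed. */
static enum slot_state traced_state = OFF_STATE;

static void fake_trace(enum slot_state state)
{
	traced_state = state;
}

/* Mirrors the RFC: the traced state is hard-coded to ON_STATE. */
static void handle_removal_hardcoded(enum slot_state cur)
{
	switch (cur) {
	case BLINKINGOFF_STATE:
		/* fall through */
	case ON_STATE:
		fake_trace(ON_STATE);	/* wrong when cur == BLINKINGOFF_STATE */
		break;
	default:
		break;
	}
}

/* Alternative: trace the state the slot was really in. */
static void handle_removal_actual(enum slot_state cur)
{
	switch (cur) {
	case BLINKINGOFF_STATE:
		/* fall through */
	case ON_STATE:
		fake_trace(cur);
		break;
	default:
		break;
	}
}

/* Small self-check showing the difference between the two variants. */
static void self_check(void)
{
	handle_removal_hardcoded(BLINKINGOFF_STATE);
	assert(traced_state == ON_STATE);		/* misreported */

	handle_removal_actual(BLINKINGOFF_STATE);
	assert(traced_state == BLINKINGOFF_STATE);	/* accurate */
}
```

Calling self_check() passes silently; the first assertion documents the misreport, the second the accurate value.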
>
> Wouldn't it be more readable to just log the event that occurred
> as a string, e.g. "Surprise Removal" (and "Insertion" or "Hot Add"
> for the other trace event you're introducing) instead of the state?
>
> Otherwise you see "ON_STATE" in the log but that's actually the
> *old* value so you have to mentally convert this to "previously ON,
> so now must be transitioning to OFF".
I see your point. "Surprise Removal" or "Insertion" names the actual state
transition, whereas the state is only the old value. However, I am concerned
that a string would be harder for user space tools like rasdaemon to parse.
How about adding a new enum for the state transition? For example:
	enum pciehp_trans_type {
		PCIEHP_SAFE_REMOVAL,
		PCIEHP_SURPRISE_REMOVAL,
		PCIEHP_HOT_ADD,
		...
	};
The tracepoint would then carry the state transition as an int, which
rasdaemon can parse easily:
	trace_pciehp_event(pci_name(ctrl->pcie->port),
			   slot_name(ctrl), PCIEHP_SAFE_REMOVAL, events);
TP_printk would print the symbolic name of the state transition:
	TRACE_EVENT(pciehp_event,
		TP_PROTO(const char *port_name,
			 const char *slot,
			 const int trans_state,
			 const u32 events),
		TP_ARGS(port_name, slot, trans_state, events),
		TP_STRUCT__entry(
			__string(	port_name,	port_name	)
			__string(	slot,		slot		)
			__field(	int,		trans_state	)
			__field(	u32,		events		)
		),
		TP_fast_assign(
			__assign_str(port_name);
			__assign_str(slot);
			__entry->trans_state = trans_state;
			__entry->events = events;
		),
		/* state is printed by name, so it needs %s, not %d */
		TP_printk("%s slot:%s, state:%s, events:%u",
			__get_str(port_name),
			__get_str(slot),
			__print_symbolic(__entry->trans_state,
				{ PCIEHP_SAFE_REMOVAL,		"Safe Removal" },
				{ PCIEHP_SURPRISE_REMOVAL,	"Surprise Removal" },
				{ PCIEHP_HOT_ADD,		"Hot Add" }),
			__entry->events)
	);
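On the consumer side, decoding the integer could look roughly like the sketch below. This is only an illustration of the idea, not rasdaemon code: the enum values (0, 1, 2), the label strings and the pciehp_trans_name() helper are assumptions, and a real tool would take the values from the kernel header or the tracepoint format file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed to match the kernel-side enum proposed above; values are hypothetical. */
enum pciehp_trans_type {
	PCIEHP_SAFE_REMOVAL,
	PCIEHP_SURPRISE_REMOVAL,
	PCIEHP_HOT_ADD,
};

struct trans_name {
	int value;
	const char *name;
};

/* Same value/label pairs that __print_symbolic() would be given. */
static const struct trans_name trans_names[] = {
	{ PCIEHP_SAFE_REMOVAL,		"Safe Removal" },
	{ PCIEHP_SURPRISE_REMOVAL,	"Surprise Removal" },
	{ PCIEHP_HOT_ADD,		"Hot Add" },
};

/* Hypothetical helper: map a raw trans_state field to a readable label. */
static const char *pciehp_trans_name(int trans_state)
{
	size_t i;

	for (i = 0; i < sizeof(trans_names) / sizeof(trans_names[0]); i++) {
		if (trans_names[i].value == trans_state)
			return trans_names[i].name;
	}
	return "Unknown";
}

/* Small self-check of the lookup, including an out-of-range value. */
static void self_check(void)
{
	assert(strcmp(pciehp_trans_name(PCIEHP_HOT_ADD), "Hot Add") == 0);
	assert(strcmp(pciehp_trans_name(42), "Unknown") == 0);
}
```

With this approach the tool compares integers and only turns them into text for display, so it does not depend on the exact wording printed by TP_printk.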
>
> I'm fine with adding trace points to pciehp, I just want to make sure
> we do it in a way that's easy to parse for admins.
Thank you for the positive feedback :)
>
> Thanks,
>
> Lukas
Best Regards,
Shuai