[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87cysjgyun.fsf@waldekranz.com>
Date: Mon, 26 Feb 2024 14:05:20 +0100
From: Tobias Waldekranz <tobias@...dekranz.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: davem@...emloft.net, kuba@...nel.org, roopa@...dia.com,
razor@...ckwall.org, bridge@...ts.linux.dev, netdev@...r.kernel.org,
jiri@...nulli.us, ivecera@...hat.com, mhiramat@...nel.org,
linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH v3 net-next 4/4] net: switchdev: Add tracepoints
On fre, feb 23, 2024 at 10:38, Steven Rostedt <rostedt@...dmis.org> wrote:
> On Fri, 23 Feb 2024 12:44:53 +0100
> Tobias Waldekranz <tobias@...dekranz.com> wrote:
>
>> Add a basic set of tracepoints:
>>
>> - switchdev_defer: Fires whenever an operation is enqueued to the
>> switchdev workqueue for deferred delivery.
>>
>> - switchdev_call_{atomic,blocking}: Fires whenever a notification is
>> sent to the corresponding switchdev notifier chain.
>>
>> - switchdev_call_replay: Fires whenever a notification is sent to a
>> specific driver's notifier block, in response to a replay request.
>>
>> Signed-off-by: Tobias Waldekranz <tobias@...dekranz.com>
>> ---
>> include/trace/events/switchdev.h | 74 ++++++++++++++++++++++++++++++++
>> net/switchdev/switchdev.c | 71 +++++++++++++++++++++++++-----
>> 2 files changed, 135 insertions(+), 10 deletions(-)
>> create mode 100644 include/trace/events/switchdev.h
>>
>> diff --git a/include/trace/events/switchdev.h b/include/trace/events/switchdev.h
>> new file mode 100644
>> index 000000000000..dcaf6870d017
>> --- /dev/null
>> +++ b/include/trace/events/switchdev.h
>> @@ -0,0 +1,74 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM switchdev
>> +
>> +#if !defined(_TRACE_SWITCHDEV_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_SWITCHDEV_H
>> +
>> +#include <linux/tracepoint.h>
>> +#include <net/switchdev.h>
>> +
>> +#define SWITCHDEV_TRACE_MSG_MAX 128
>
> 128 bytes is awfully big to waste on the ring buffer. What's the average
> size of a string?
I would say the typical message is around 60-80 bytes. Some common examples:
PORT_OBJ_ADD PORT_VLAN(flags 0x0 orig br0) vid 1 flags 0x27
PORT_OBJ_DEL HOST_MDB(flags 0x0 orig br0) vid 100 addr 33:33:ff:ff:00:09
The worst case I can think of currently is 95 characters:
VXLAN_FDB_ADD_TO_DEVICE vid 1000 addr de:ad:be:ef:00:01 added_by_user is_local locked offloaded
>> +
>> +DECLARE_EVENT_CLASS(switchdev_call,
>> + TP_PROTO(unsigned long val,
>> + const struct switchdev_notifier_info *info,
>> + int err),
>> +
>> + TP_ARGS(val, info, err),
>> +
>> + TP_STRUCT__entry(
>> + __field(unsigned long, val)
>> + __string(dev, info->dev ? netdev_name(info->dev) : "(null)")
>> + __field(const struct switchdev_notifier_info *, info)
>> + __field(int, err)
>> + __array(char, msg, SWITCHDEV_TRACE_MSG_MAX)
>> + ),
>> +
>> + TP_fast_assign(
>> + __entry->val = val;
>> + __assign_str(dev, info->dev ? netdev_name(info->dev) : "(null)");
>> + __entry->info = info;
>> + __entry->err = err;
>> + switchdev_notifier_str(val, info, __entry->msg, SWITCHDEV_TRACE_MSG_MAX);
>
> Is it possible to just store the information in the trace event and then
> call the above function in the read stage? There's helpers to pass strings
> around (namely a struct trace_seq *p).
I'm a complete novice when it comes to tracepoint internals. Am I right
in assuming that TP_printk may execute at a much later time than
TP_fast_assign? E.g., the object referenced by __entry->info may very
well have been freed by that time? That at least seems to be what my
naive refactor to replace __entry->msg with __get_buf() suggests :)
If so, the layout of the switchdev_notifier_* structs makes it a bit
cumbersome to clone the notification in the assign phase, as the size of
a specific notification (e.g., switchdev_notifier_port_obj_info) is not
known by the common notification (switchdev_notifier_info). In the case
of switchdev objects, the problem repeats a second time. E.g., the size
of switchdev_obj_port_vlan is not known by switchdev_obj.
> It would require a plugin for libtraceevent if you want to expose it to
> user space tools for trace-cmd and perf though.
>
> Another possibility is if this event will not race with other events on he
> same CPU, you could create a per-cpu buffer, write into that, and then use
> __string() and __assign_str() to save it, as traces happen with preemption
> disabled.
But bottom halves are still enabled I suppose? Notifications can be
generated both from process context (e.g., users configuring the bridge
with iproute2), and from bridge packet processing (e.g., adding new
neighbors to the FDB). So I don't think that would work in this case.
> -- Steve
Thanks for taking the time!
>> + ),
>> +
>> + TP_printk("dev %s %s -> %d", __get_str(dev), __entry->msg, __entry->err)
>> +);
>> +
>> +DEFINE_EVENT(switchdev_call, switchdev_defer,
>> + TP_PROTO(unsigned long val,
>> + const struct switchdev_notifier_info *info,
>> + int err),
>> +
>> + TP_ARGS(val, info, err)
>> +);
>> +
>> +DEFINE_EVENT(switchdev_call, switchdev_call_atomic,
>> + TP_PROTO(unsigned long val,
>> + const struct switchdev_notifier_info *info,
>> + int err),
>> +
>> + TP_ARGS(val, info, err)
>> +);
>> +
>> +DEFINE_EVENT(switchdev_call, switchdev_call_blocking,
>> + TP_PROTO(unsigned long val,
>> + const struct switchdev_notifier_info *info,
>> + int err),
>> +
>> + TP_ARGS(val, info, err)
>> +);
>> +
>> +DEFINE_EVENT(switchdev_call, switchdev_call_replay,
>> + TP_PROTO(unsigned long val,
>> + const struct switchdev_notifier_info *info,
>> + int err),
>> +
>> + TP_ARGS(val, info, err)
>> +);
>> +
>> +#endif /* _TRACE_SWITCHDEV_H */
>> +
Powered by blists - more mailing lists