[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <96bd71d6-2978-435f-99f8-c31097487cac@redhat.com>
Date: Mon, 29 Apr 2024 07:33:59 +0200
From: Adrian Moreno <amorenoz@...hat.com>
To: Ido Schimmel <idosch@...sch.org>
Cc: netdev@...r.kernel.org, aconole@...hat.com, echaudro@...hat.com,
horms@...nel.org, i.maximets@....org, Yotam Gigi <yotam.gi@...il.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next 4/8] net: psample: add tracepoint
On 4/25/24 17:25, Ido Schimmel wrote:
> On Thu, Apr 25, 2024 at 10:06:20AM +0200, Adrian Moreno wrote:
>>
>>
>> On 4/25/24 09:18, Ido Schimmel wrote:
>>> On Wed, Apr 24, 2024 at 03:50:51PM +0200, Adrian Moreno wrote:
>>>> Currently there are no widely-available tools to dump the metadata and
>>>> group information when a packet is sampled, making it difficult to
>>>> troubleshoot related issues.
>>>>
>>>> This makes psample use the event tracing framework to log the sampling
>>>> of a packet so that it's easier to quickly identify the source
>>>> (i.e: group) and context (i.e: metadata) of a packet being sampled.
>>>>
>>>> This patch creates some checkpatch splats, but the style of the
>>>> tracepoint definition mimics that of other modules so it seems
>>>> acceptable.
>>>
>>> I don't see a good reason to add this tracepoint (which we won't be able
>>> to remove) when you can easily do that with bpftrace which by now should
>>> be widely available:
>>>
>>> #!/usr/bin/bpftrace
>>>
>>> kfunc:psample_sample_packet
>>> {
>>> $ts_us = nsecs() / 1000;
>>> $secs = $ts_us / 1000000;
>>> $us = $ts_us % 1000000;
>>> $group = args.group;
>>> $skb = args.skb;
>>> $md = args.md;
>>>
>>> printf("%-16s %-6d %6llu.%6llu group_num = %u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%rx\n",
>>> comm, pid, $secs, $us, $group->group_num, $group->refcount, $group->seq,
>>> $skb, $skb->len, $skb->data_len, args.sample_rate,
>>> $md->in_ifindex, $md->out_ifindex,
>>> buf($md->user_cookie, $md->user_cookie_len));
>>> }
>>>
>>> Example output:
>>>
>>> mausezahn 984 3299.200626 group_num = 1 refcount=1 seq=13775 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
>>> \xde\xad\xbe\xef
>>> mausezahn 984 3299.281424 group_num = 1 refcount=1 seq=13776 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
>>> \xde\xad\xbe\xef
>>>
>>> Note that it prints the cookie itself unlike the tracepoint which only
>>> prints the hashed pointer.
>>>
>>
>> I agree that bpftrace can do the work relying on kfuncs/kprobes. But I guess
>> that also true for many other tracepoints out there, right?
>
> Maybe, but this particular tracepoint is not buried deep inside some
> complex function with manipulated data being passed as arguments.
> Instead, this tracepoint is placed at the very beginning of the function
> and takes the function arguments as its own arguments. The tracepoint
> can be easily replaced with fentry/kprobes like I've shown with the
> example above.
>
>> For development and labs bpftrace is perfectly fine, but using kfuncs and
>> requiring recompilation is harder in production systems compared with using
>> smaller CO-RE tools.
>
> I used bpftrace because it is very easy to write, but I could have done
> the same with libbpf. I have a bunch of such tools that I wrote over the
> years that I compiled once on my laptop and which I copy to various
> machines where I need them.
>
My worry is that if tools are built around a particular kprobe/kfunc they will
break if the function name or its arguments change, where as a tracepoint give
them a bit more stability across kernel versions. This breakage might not be a
huge problem for bpftrace since the user can change the script at runtime, but
libbpf programs will need recompilation or some kind of version-detection mechanism.
Given the observability-oriented nature of psample I can very much see tools
like this being built (I myself plan to write one for OVS repo) and my concern
is having their stability depend on a function name or arguments not changing
across versions.
>> If OVS starts using psample heavily and users need to troubleshoot or merely
>> observe packets as they are sampled in a more efficient way, they are likely
>> to use ebpf for that. I think making it a bit easier (as in, providing a
>> sligthly more stable tracepoint) is worth considering.
>
> I'm not saying that it's not worth considering, I'm simply saying that
> it should be done after gathering operational experience with existing
> mechanisms. It's possible you will conclude that this tracepoint is not
> actually needed.
>
> Also, there are some disadvantages in using tracepoints compared to
> fentry:
>
> https://github.com/Mellanox/mlxsw/commit/e996fd583eff1c43aacb9c79e55f5add12402d7d
> https://lore.kernel.org/all/CAEf4BzbhvD_f=y3SDAiFqNvuErcnXt4fErMRSfanjYQg5=7GJg@mail.gmail.com/#t
>
> Not saying that's the case here, but worth considering / being aware.
>
>> Can you please expand on your concerns about the tracepoint? It's on the
>> main internal function of the module so, even though the function name or
>> its arguments might change, it doesn't seem probable that it'll disappear
>> altogether. Why else would we want to remove the tracepoint?
>
> It's not really concerns, but dissatisfaction. It's my impression (might
> be wrong) that this series commits to adding new interfaces without
> first seriously evaluating existing ones. This is true for this patch
> and patch #2 that adds a new netlink command instead of using
> SO_ATTACH_FILTER like existing applications are doing to achieve the
> same goal.
>
> I guess some will disagree, but wanted to voice my opinion nonetheless.
>
That's a fair point and I appreciate the feedback.
For patch #2, I can concede that it's just making applications slightly simpler
without providing any further stability guarantees. I'm OK removing it.
And, I fail to convince you of the usefulness of the tracepoint, I can remove it
as well.
Thanks.
Powered by blists - more mailing lists