linux-kernel - Re: [PATCH net-next 4/8] net: psample: add tracepoint

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <96bd71d6-2978-435f-99f8-c31097487cac@redhat.com>
Date: Mon, 29 Apr 2024 07:33:59 +0200
From: Adrian Moreno <amorenoz@...hat.com>
To: Ido Schimmel <idosch@...sch.org>
Cc: netdev@...r.kernel.org, aconole@...hat.com, echaudro@...hat.com,
 horms@...nel.org, i.maximets@....org, Yotam Gigi <yotam.gi@...il.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next 4/8] net: psample: add tracepoint



On 4/25/24 17:25, Ido Schimmel wrote:
> On Thu, Apr 25, 2024 at 10:06:20AM +0200, Adrian Moreno wrote:
>>
>>
>> On 4/25/24 09:18, Ido Schimmel wrote:
>>> On Wed, Apr 24, 2024 at 03:50:51PM +0200, Adrian Moreno wrote:
>>>> Currently there are no widely-available tools to dump the metadata and
>>>> group information when a packet is sampled, making it difficult to
>>>> troubleshoot related issues.
>>>>
>>>> This makes psample use the event tracing framework to log the sampling
>>>> of a packet so that it's easier to quickly identify the source
>>>> (i.e: group) and context (i.e: metadata) of a packet being sampled.
>>>>
>>>> This patch creates some checkpatch splats, but the style of the
>>>> tracepoint definition mimics that of other modules so it seems
>>>> acceptable.
>>>
>>> I don't see a good reason to add this tracepoint (which we won't be able
>>> to remove) when you can easily do that with bpftrace which by now should
>>> be widely available:
>>>
>>> #!/usr/bin/bpftrace
>>>
>>> kfunc:psample_sample_packet
>>> {
>>>           $ts_us = nsecs() / 1000;
>>>           $secs = $ts_us / 1000000;
>>>           $us = $ts_us % 1000000;
>>>           $group = args.group;
>>>           $skb = args.skb;
>>>           $md = args.md;
>>>
>>>           printf("%-16s %-6d %6llu.%6llu group_num = %u refcount=%u seq=%u skbaddr=%p len=%u data_len=%u sample_rate=%u in_ifindex=%d out_ifindex=%d user_cookie=%rx\n",
>>>                  comm, pid, $secs, $us, $group->group_num, $group->refcount, $group->seq,
>>>                  $skb, $skb->len, $skb->data_len, args.sample_rate,
>>>                  $md->in_ifindex, $md->out_ifindex,
>>>                  buf($md->user_cookie, $md->user_cookie_len));
>>> }
>>>
>>> Example output:
>>>
>>> mausezahn        984      3299.200626 group_num = 1 refcount=1 seq=13775 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
>>> \xde\xad\xbe\xef
>>> mausezahn        984      3299.281424 group_num = 1 refcount=1 seq=13776 skbaddr=0xffffa21143fd4000 len=42 data_len=0 sample_rate=10 in_ifindex=0 out_ifindex=20 user_cookie=
>>> \xde\xad\xbe\xef
>>>
>>> Note that it prints the cookie itself unlike the tracepoint which only
>>> prints the hashed pointer.
>>>
>>
>> I agree that bpftrace can do the work relying on kfuncs/kprobes. But I guess
>> that also true for many other tracepoints out there, right?
> 
> Maybe, but this particular tracepoint is not buried deep inside some
> complex function with manipulated data being passed as arguments.
> Instead, this tracepoint is placed at the very beginning of the function
> and takes the function arguments as its own arguments. The tracepoint
> can be easily replaced with fentry/kprobes like I've shown with the
> example above.
> 
>> For development and labs bpftrace is perfectly fine, but using kfuncs and
>> requiring recompilation is harder in production systems compared with using
>> smaller CO-RE tools.
> 
> I used bpftrace because it is very easy to write, but I could have done
> the same with libbpf. I have a bunch of such tools that I wrote over the
> years that I compiled once on my laptop and which I copy to various
> machines where I need them.
>

My worry is that if tools are built around a particular kprobe/kfunc they will 
break if the function name or its arguments change, where as a tracepoint give 
them a bit more stability across kernel versions. This breakage might not be a 
huge problem for bpftrace since the user can change the script at runtime, but 
libbpf programs will need recompilation or some kind of version-detection mechanism.

Given the observability-oriented nature of psample I can very much see tools 
like this being built (I myself plan to write one for OVS repo) and my concern 
is having their stability depend on a function name or arguments not changing 
across versions.


>> If OVS starts using psample heavily and users need to troubleshoot or merely
>> observe packets as they are sampled in a more efficient way, they are likely
>> to use ebpf for that. I think making it a bit easier (as in, providing a
>> sligthly more stable tracepoint) is worth considering.
> 
> I'm not saying that it's not worth considering, I'm simply saying that
> it should be done after gathering operational experience with existing
> mechanisms. It's possible you will conclude that this tracepoint is not
> actually needed.
> 
> Also, there are some disadvantages in using tracepoints compared to
> fentry:
> 
> https://github.com/Mellanox/mlxsw/commit/e996fd583eff1c43aacb9c79e55f5add12402d7d
> https://lore.kernel.org/all/CAEf4BzbhvD_f=y3SDAiFqNvuErcnXt4fErMRSfanjYQg5=7GJg@mail.gmail.com/#t
> 
> Not saying that's the case here, but worth considering / being aware.
> 
>> Can you please expand on your concerns about the tracepoint? It's on the
>> main internal function of the module so, even though the function name or
>> its arguments might change, it doesn't seem probable that it'll disappear
>> altogether. Why else would we want to remove the tracepoint?
> 
> It's not really concerns, but dissatisfaction. It's my impression (might
> be wrong) that this series commits to adding new interfaces without
> first seriously evaluating existing ones. This is true for this patch
> and patch #2 that adds a new netlink command instead of using
> SO_ATTACH_FILTER like existing applications are doing to achieve the
> same goal.
> 
> I guess some will disagree, but wanted to voice my opinion nonetheless.
> 

That's a fair point and I appreciate the feedback.

For patch #2, I can concede that it's just making applications slightly simpler 
without providing any further stability guarantees. I'm OK removing it.

And, I fail to convince you of the usefulness of the tracepoint, I can remove it 
as well.

Thanks.