netdev - Re: [PATCH bpf-next v8 10/12] bpf: make TCP tx timestamp bpf extension work

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0a8e7b84-bab6-4852-8616-577d9b561f4c@linux.dev>
Date: Wed, 5 Feb 2025 16:47:04 -0800
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Jason Xing <kerneljasonxing@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net,
 edumazet@...gle.com, pabeni@...hat.com, dsahern@...nel.org,
 willemdebruijn.kernel@...il.com, willemb@...gle.com, ast@...nel.org,
 daniel@...earbox.net, andrii@...nel.org, eddyz87@...il.com, song@...nel.org,
 yonghong.song@...ux.dev, john.fastabend@...il.com, kpsingh@...nel.org,
 sdf@...ichev.me, haoluo@...gle.com, jolsa@...nel.org, horms@...nel.org,
 bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v8 10/12] bpf: make TCP tx timestamp bpf
 extension work

On 2/5/25 4:12 PM, Jason Xing wrote:
> On Thu, Feb 6, 2025 at 5:57 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>>
>> On 2/4/25 5:57 PM, Jakub Kicinski wrote:
>>> On Wed,  5 Feb 2025 02:30:22 +0800 Jason Xing wrote:
>>>> +    if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
>>>> +        SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
>>>> +            struct skb_shared_info *shinfo = skb_shinfo(skb);
>>>> +            struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
>>>> +
>>>> +            tcb->txstamp_ack_bpf = 1;
>>>> +            shinfo->tx_flags |= SKBTX_BPF;
>>>> +            shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>> +    }
>>>
>>> If BPF program is attached we'll timestamp all skbs? Am I reading this
>>> right?
>>
>> If the attached bpf program explicitly turns on the SK_BPF_CB_TX_TIMESTAMPING
>> bit of a sock, then all skbs of this sock will be tx timestamp-ed.
> 
> Martin, I'm afraid it's not like what you expect. Only the last
> portion of the sendmsg will enter the above function which means if
> the size of sendmsg is large, only the last skb will be set SKBTX_BPF
> and be timestamped.

Sure. The last skb of a large msg and more skb of small msg (or MSG_EOR).

My point is, only attaching a bpf alone is not enough. The 
SK_BPF_CB_TX_TIMESTAMPING still needs to be turned on.

> 
>>
>>>
>>> Wouldn't it be better to let BPF_SOCK_OPS_TS_SND_CB return whether it's
>>> interested in tracing current packet all the way thru the stack?
>>
>> I like this idea. It can give the BPF prog a chance to do skb sampling on a
>> particular socket.
>>
>> The return value of BPF_SOCK_OPS_TS_SND_CB (or any cgroup BPF prog return value)
>> already has another usage, which its return value is currently enforced by the
>> verifier. It is better not to convolute it further.
>>
>> I don't prefer to add more use cases to skops->reply either, which is an union
>> of args[4], such that later progs (in the cgrp prog array) may lose the args value.
>>
>> Jason, instead of always setting SKBTX_BPF and txstamp_ack_bpf in the kernel, a
>> new BPF kfunc can be added so that the BPF prog can call it to selectively set
>> SKBTX_BPF and txstamp_ack_bpf in some skb.
> 
> Agreed because at netdev 0x19 I have an explicit plan to share the
> experience from our company about how to trace all the skbs which were
> completed through a kernel module. It's how we use in production
> especially for debug or diagnose use.

This is fine. The bpf prog can still do that by calling the kfunc. I don't see 
why move the bit setting into kfunc makes the whole set won't work.

> I'm not knowledgeable enough about BPF, so I'd like to know if there
> are some functions that I can take as good examples?
> 
> I think it's a standalone and good feature, can I handle it after this series?

Unfortunately, no. Once the default is on, this cannot be changed.

I think Jakub's suggestion to allow bpf prog selectively choose skb to timestamp 
is useful, so I suggested a way to do it.