netdev - Re: [PATCH bpf-next v8 10/12] bpf: make TCP tx timestamp bpf extension work

Message-ID: <CAL+tcoDPXyvaYFZHe6FBN_+HtMkY3s0hBfRH=o1m+4ZTiFGRJw@mail.gmail.com>
Date: Thu, 6 Feb 2025 08:42:02 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net, edumazet@...gle.com, 
	pabeni@...hat.com, dsahern@...nel.org, willemdebruijn.kernel@...il.com, 
	willemb@...gle.com, ast@...nel.org, daniel@...earbox.net, andrii@...nel.org, 
	eddyz87@...il.com, song@...nel.org, yonghong.song@...ux.dev, 
	john.fastabend@...il.com, kpsingh@...nel.org, sdf@...ichev.me, 
	haoluo@...gle.com, jolsa@...nel.org, horms@...nel.org, bpf@...r.kernel.org, 
	netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v8 10/12] bpf: make TCP tx timestamp bpf
 extension work

On Thu, Feb 6, 2025 at 8:12 AM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> On Thu, Feb 6, 2025 at 5:57 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> >
> > On 2/4/25 5:57 PM, Jakub Kicinski wrote:
> > > On Wed,  5 Feb 2025 02:30:22 +0800 Jason Xing wrote:
> > >> +    if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) &&
> > >> +        SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) {
> > >> +            struct skb_shared_info *shinfo = skb_shinfo(skb);
> > >> +            struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
> > >> +
> > >> +            tcb->txstamp_ack_bpf = 1;
> > >> +            shinfo->tx_flags |= SKBTX_BPF;
> > >> +            shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> > >> +    }
> > >
> > > If BPF program is attached we'll timestamp all skbs? Am I reading this
> > > right?
> >
> > If the attached bpf program explicitly turns on the SK_BPF_CB_TX_TIMESTAMPING
> > bit of a sock, then all skbs of this sock will be tx timestamp-ed.
>
> Martin, I'm afraid it's not like what you expect. Only the last
> portion of the sendmsg will enter the above function which means if
> the size of sendmsg is large, only the last skb will be set SKBTX_BPF
> and be timestamped.
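
To make the point above concrete, here is a minimal userspace model in
C, not kernel code. It assumes one sendmsg() is split into fixed-size
chunks (the real stack sizes skbs differently), and every name in it
(model_skb, model_sendmsg, MODEL_SKBTX_BPF) is made up for
illustration. It only shows that the last skb alone gets the flag, with
tskey computed as seq + len - 1 as in the patch hunk:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for SKBTX_BPF; the value here is arbitrary. */
#define MODEL_SKBTX_BPF 0x1

/* Simplified skb: only the fields this sketch needs. */
struct model_skb {
	uint32_t seq;      /* first sequence number carried by this skb */
	uint32_t len;      /* payload bytes in this skb */
	uint8_t  tx_flags; /* models shinfo->tx_flags */
	uint32_t tskey;    /* models shinfo->tskey */
};

/*
 * Model one sendmsg() of `total` bytes split into skbs of at most
 * `chunk` bytes (an assumption; real skb sizing is more involved).
 * Only the final skb is flagged. Returns the number of skbs written.
 */
static size_t model_sendmsg(uint32_t start_seq, uint32_t total,
			    uint32_t chunk, struct model_skb *out,
			    size_t cap)
{
	size_t n = 0;
	uint32_t seq = start_seq;

	assert(out != NULL);
	if (chunk == 0)
		return 0;

	while (total > 0 && n < cap) {
		uint32_t len = total < chunk ? total : chunk;

		out[n].seq = seq;
		out[n].len = len;
		out[n].tx_flags = 0;
		out[n].tskey = 0;
		seq += len;
		total -= len;
		n++;
	}

	/* Flag only the tail skb, and only if the whole send fit in `out`. */
	if (n > 0 && total == 0) {
		struct model_skb *last = &out[n - 1];

		last->tx_flags |= MODEL_SKBTX_BPF;
		last->tskey = last->seq + last->len - 1;
	}
	return n;
}
```

With start_seq 1000, a 3500-byte send and 1460-byte chunks, the model
produces three skbs and only the third is flagged, with tskey 4499,
i.e. the sequence number of the last byte of the send.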

A long time ago, SO_TIMESTAMPING was mostly used to identify which
layer a latency issue occurs in, especially to rule out the many cases
caused by the application itself[1].

Thanks to bpf, we can pay more attention to kernel behaviour, even to
the tiny delays introduced by flow control, say, BQL or the fair queue
in Qdisc, which this bpf extension could observe (for sure, it will
need more work, not now).

[1]
https://netdevconf.info/0x17/sessions/talk/so_timestamping-powering-fleetwide-rpc-monitoring.html
quoting Willem: "With SO_TIMESTAMPING, bugs that are otherwise
incorrectly assumed to be network issues can be attributed to the
kernel. It can isolate transmission, reception and even scheduling
sources."

>
> >
> > >
> > > Wouldn't it be better to let BPF_SOCK_OPS_TS_SND_CB return whether it's
> > > interested in tracing current packet all the way thru the stack?
> >
> > I like this idea. It can give the BPF prog a chance to do skb sampling on a
> > particular socket.
> >
> > The return value of BPF_SOCK_OPS_TS_SND_CB (or any cgroup BPF prog return value)
> > already has another usage, which its return value is currently enforced by the
> > verifier. It is better not to convolute it further.
> >
> > I don't prefer to add more use cases to skops->reply either, which is an union
> > of args[4], such that later progs (in the cgrp prog array) may lose the args value.
> >
> > Jason, instead of always setting SKBTX_BPF and txstamp_ack_bpf in the kernel, a
> > new BPF kfunc can be added so that the BPF prog can call it to selectively set
> > SKBTX_BPF and txstamp_ack_bpf in some skb.
>
> Agreed because at netdev 0x19 I have an explicit plan to share the
> experience from our company about how to trace all the skbs which were
> completed through a kernel module. It's how we use in production
> especially for debug or diagnose use.
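
As a rough illustration of the selective approach discussed above,
here is a userspace model in C of a callback that opts in only some
skbs. It is a sketch under assumptions, not the proposed kernel
interface: sample_enable_tx_tstamp() is a hypothetical stand-in for
the new kfunc, and sample_skb, sampler and SAMPLE_SKBTX_BPF are
invented names. The 1-in-N policy is just one possible sampling rule:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for SKBTX_BPF; the value here is arbitrary. */
#define SAMPLE_SKBTX_BPF 0x1

/* Simplified skb: the two bits the kernel would otherwise always set. */
struct sample_skb {
	uint8_t tx_flags;        /* models shinfo->tx_flags */
	bool    txstamp_ack_bpf; /* models tcb->txstamp_ack_bpf */
};

/* Hypothetical stand-in for the kfunc a BPF prog would call. */
static void sample_enable_tx_tstamp(struct sample_skb *skb)
{
	skb->tx_flags |= SAMPLE_SKBTX_BPF;
	skb->txstamp_ack_bpf = true;
}

/* Per-socket sampling state kept by the (modelled) BPF prog. */
struct sampler {
	uint32_t every; /* tag one skb out of `every`; 0 disables tagging */
	uint32_t count; /* skbs seen so far */
};

/*
 * Models the prog's decision for one skb: the first skb of each group
 * of `every` is tagged, the rest are left untouched.
 * Returns true if this skb was tagged.
 */
static bool sample_ts_cb(struct sampler *s, struct sample_skb *skb)
{
	assert(s != NULL && skb != NULL);

	if (s->every == 0)
		return false;
	if (s->count++ % s->every != 0)
		return false;

	sample_enable_tx_tstamp(skb);
	return true;
}
```

With every = 3 and six skbs passed in order, the model tags the first
and the fourth and leaves the other four alone, so the decision stays
in the prog instead of being applied unconditionally.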

I'm not sure if you can see this link[2] because Jamal is still
working on publishing it officially. We can wait if it's temporarily
not accessible to you.

[2]: https://0x19.netdevconf.info/paper/5?cap=05arRrN3AEg11M

Thanks,
Jason