[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoD5Owoa+xajSzCWHHN+AUWrmT-xu_Nm8SK2_obED_iWBA@mail.gmail.com>
Date: Thu, 31 Oct 2024 11:27:03 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, dsahern@...nel.org, willemdebruijn.kernel@...il.com,
willemb@...gle.com, ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
eddyz87@...il.com, song@...nel.org, yonghong.song@...ux.dev,
john.fastabend@...il.com, kpsingh@...nel.org, sdf@...ichev.me,
haoluo@...gle.com, jolsa@...nel.org, shuah@...nel.org, ykolal@...com,
bpf@...r.kernel.org, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v3 10/14] net-timestamp: add basic support with
tskey offset
On Thu, Oct 31, 2024 at 10:41 AM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> On Thu, Oct 31, 2024 at 9:17 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> >
> > On 10/29/24 11:50 PM, Jason Xing wrote:
> > > On Wed, Oct 30, 2024 at 1:42 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> > >>
> > >> On 10/28/24 4:05 AM, Jason Xing wrote:
> > >>> +/* Used to track the tskey for bpf extension
> > >>> + *
> > >>> + * @sk_tskey: bpf extension can use it only when no application uses.
> > >>> + * Application can use it directly regardless of bpf extension.
> > >>> + *
> > >>> + * There are three strategies:
> > >>> + * 1) If we've already set through setsockopt() and here we're going to set
> > >>> + * OPT_ID for bpf use, we will not re-initialize the @sk_tskey and will
> > >>> + * keep the record of delta between the current "key" and previous key.
> > >>> + * 2) If we've already set through bpf_setsockopt() and here we're going to
> > >>> + * set for application use, we will record the delta first and then
> > >>> + * override/initialize the @sk_tskey.
> > >>> + * 3) other cases, which means only either of them takes effect, so initialize
> > >>> + * everything simplely.
> > >>> + */
> > >>> +static long int sock_calculate_tskey_offset(struct sock *sk, int val, int bpf_type)
> > >>> +{
> > >>> + u32 tskey;
> > >>> +
> > >>> + if (sk_is_tcp(sk)) {
> > >>> + if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))
> > >>> + return -EINVAL;
> > >>> +
> > >>> + if (val & SOF_TIMESTAMPING_OPT_ID_TCP)
> > >>> + tskey = tcp_sk(sk)->write_seq;
> > >>> + else
> > >>> + tskey = tcp_sk(sk)->snd_una;
> > >>> + } else {
> > >>> + tskey = 0;
> > >>> + }
> > >>> +
> > >>> + if (bpf_type && (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> > >>> + sk->sk_tskey_bpf_offset = tskey - atomic_read(&sk->sk_tskey);
> > >>> + return 0;
> > >>> + } else if (!bpf_type && (sk->sk_tsflags_bpf & SOF_TIMESTAMPING_OPT_ID)) {
> > >>> + sk->sk_tskey_bpf_offset = atomic_read(&sk->sk_tskey) - tskey;
> > >>> + } else {
> > >>> + sk->sk_tskey_bpf_offset = 0;
> > >>> + }
> > >>> +
> > >>> + return tskey;
> > >>> +}
> > >>
> > >> Before diving into this route, the bpf prog can peek into the tcp seq no in the
> > >> skb. It can also look at the sk->sk_tskey for UDP socket. Can you explain why
> > >> those are not enough information for the bpf prog?
> > >
> > > Well, it does make sense. It seems we don't need to implement tskey
> > > for this bpf feature...
> > >
> > > Due to lack of enough knowledge of bpf, could you provide more hints
> > > that I can follow to write a bpf program to print more information
> > > from the skb? Like in the last patch of this series, in
> > > tools/testing/selftests/bpf/prog_tests/so_timestamping.c, do we have a
> > > feasible way to do that?
> >
> > The bpf-prog@...dmsg() will be run to capture a timestamp for sendmsg().
> > When running the bpf-prog@...dmsg(), the skb can be set to the "struct
> > bpf_sock_ops_kern sock_ops;" which is passed to the sockops prog. Take a look at
> > bpf_skops_write_hdr_opt().
>
> Thanks. I see the skb field in struct bpf_sock_ops_kern.
>
> >
> > bpf prog cannot directly access the skops->skb now. It is because the sockops
> > prog sees the uapi "struct bpf_sock_ops" instead of "struct
> > bpf_sock_ops(_kern)". The conversion is done in sock_ops_convert_ctx_access. It
> > is an old way before BTF. I don't want to extend the uapi "struct bpf_sock_ops".
>
> Oh, so it seems we cannot use this way, right?
>
> I also noticed a use case that allow users to get the information from one skb:
> "int BPF_PROG(trace_netif_receive_skb, struct sk_buff *skb)" in
> tools/testing/selftests/bpf/progs/netif_receive_skb.c
> But it requires us to add the tracepoint in __skb_tstamp_tx() first.
> Two months ago, I was planning to use a tracepoint for some people who
> find it difficult to deploy bpf.
>
> >
> > Instead, use bpf_cast_to_kern_ctx((struct bpf_sock_ops *)skops_ctx) to get a
> > trusted "struct bpf_sock_ops(_kern) *skops" pointer. Then it can access the
> > skops->skb.
>
> Let me spend some time on it. Thanks.
>
> > afaik, the tcb->seq should be available already during sendmsg. it
> > should be able to get it from TCP_SKB_CB(skb)->seq with the bpf_core_cast. Take
> > a look at the existing examples of bpf_core_cast.
> >
> > The same goes for the skb->data. It can use the bpf_dynptr_from_skb(). It is not
> > available to skops program now but should be easy to expose.
>
> I wonder what the use of skb->data is here.
>
> >
> > The bpf prog wants to calculate the delay between [sendmsg, SCHED], [SCHED,
> > SND], [SND, ACK]. It is why (at least in my mental model) a key is needed to
> > co-relate the sendmsg, SCHED, SND, and ACK timestamp. The tcp seqno could be
> > served as that key.
> >
> > All that said, while looking at tcp_tx_timestamp() again, there is always
> > "shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;". shinfo->tskey can be
> > used directly as-is by the bpf prog. I think now I am missing why the bpf prog
> > needs the sk_tskey in the sk?
>
> As you said, tcp seqno could be treated as the key, but it leaks the
> information in TCP layer to users. Please see the commit:
> commit 4ed2d765dfaccff5ebdac68e2064b59125033a3b
> Author: Willem de Bruijn <willemb@...gle.com>
> Date: Mon Aug 4 22:11:49 2014 -0400
>
> net-timestamp: TCP timestamping
> ...
> - To avoid leaking the absolute seqno to userspace, the offset
> returned in ee_data must always be relative. It is an offset between
> an skb and sk field.
>
> It has to be computed in the kernel before reporting to the user space, I think.
Well, I'm thinking since the BPF program can only be used by _admin_,
we will not take any risk even if the raw seq is exported to the BPF
program.
Willem, I would like to know your opinions about this point (about
whether we can export the raw seqno or not). Thanks.
Thanks,
Jason
Powered by blists - more mailing lists