netdev - Re: [PATCH net-next v2 04/12] net-timestamp: add static key to control the whole bpf extension

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67166a0997028_42f03294e7@willemb.c.googlers.com.notmuch>
Date: Mon, 21 Oct 2024 10:49:45 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jason Xing <kerneljasonxing@...il.com>, 
 Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: davem@...emloft.net, 
 edumazet@...gle.com, 
 kuba@...nel.org, 
 pabeni@...hat.com, 
 dsahern@...nel.org, 
 willemb@...gle.com, 
 ast@...nel.org, 
 daniel@...earbox.net, 
 andrii@...nel.org, 
 martin.lau@...ux.dev, 
 eddyz87@...il.com, 
 song@...nel.org, 
 yonghong.song@...ux.dev, 
 john.fastabend@...il.com, 
 kpsingh@...nel.org, 
 sdf@...ichev.me, 
 haoluo@...gle.com, 
 jolsa@...nel.org, 
 bpf@...r.kernel.org, 
 netdev@...r.kernel.org, 
 Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v2 04/12] net-timestamp: add static key to
 control the whole bpf extension

Jason Xing wrote:
> On Mon, Oct 21, 2024 at 5:52 AM Willem de Bruijn
> <willemdebruijn.kernel@...il.com> wrote:
> >
> > Jason Xing wrote:
> > > From: Jason Xing <kernelxing@...cent.com>
> > >
> > > Willem suggested that we use a static key to control. The advantage
> > > is that we will not affect the existing applications at all if we
> > > don't load BPF program.
> > >
> > > In this patch, except the static key, I also add one logic that is
> > > used to test if the socket has enabled its tsflags in order to
> > > support bpf logic to allow both cases to happen at the same time.
> > > Or else, the skb carring related timestamp flag doesn't know which
> > > way of printing is desirable.
> > >
> > > One thing important is this patch allows print from both applications
> > > and bpf program at the same time. Now we have three kinds of print:
> > > 1) only BPF program prints
> > > 2) only application program prints
> > > 3) both can print without side effect
> > >
> > > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> >
> > Getting back to this thread. It is long, instead of responding to
> > multiple messages, let me combine them in a single response.
> 
> Thank you so much!
> 
> >
> >
> > * On future extensions:
> >
> > +1 that the UDP case, and datagrams more broadly, must have a clear
> > development path, before we can merge TCP.
> >
> > Similarly, hardware timestamps need not be supported from the start,
> > but must clearly be supportable.
> 
> Agreed. Using the standalone sk_tsflags_bpf and tskey_bpf and removing
> the TCP bpf test logic(say, BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG)
> could work well for both protos. Let me give it a try first.

Great, thanks.

> >
> >
> > * On queueing packets to userspace:
> >
> > > > the current behavior is to just queue to the sk_error_queue as long
> > > > as there is "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it
> > > > is regardless of the sk_tsflags. "
> >
> > > Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
> > > SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
> > > read the skb from the errqueue but are not able to parse the
> > > timestamps
> 
> Above is what I tried to explain how the application timestamping
> feature works, not what I tried to implement for the BPF extension.
> 
> >
> > Before queuing a packet to userspace on the error queue, the relevant
> > reporting flag is always tested. sock_recv_timestamp has:
> >
> >         /*
> >          * generate control messages if
> >          * - receive time stamping in software requested
> >          * - software time stamp available and wanted
> >          * - hardware time stamps available and wanted
> >          */
> >         if (sock_flag(sk, SOCK_RCVTSTAMP) ||
> >             (tsflags & SOF_TIMESTAMPING_RX_SOFTWARE) ||
> >             (kt && tsflags & SOF_TIMESTAMPING_SOFTWARE) ||
> >             (hwtstamps->hwtstamp &&
> >              (tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)))
> >                 __sock_recv_timestamp(msg, sk, skb);
> >
> > Otherwise applications could get error messages queued, and
> > epoll/poll/select would unexpectedly behave differently.
> 
> Right. And I have no intention to use the SOF_TIMESTAMPING_SOFTWARE
> flag for BPF.

Can you elaborate on this? This sounds like it would go against the
intent to have the two versions of the API (application and BPF) be
equivalent.
 
> >
> > > SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
> > > features including cmsg mode. But it will not be used in bpf mode.
> >
> > For simplicity, the two uses of the API are best kept identical. If
> > there is a technical reason why BPF has to diverge from established
> > behavior, this needs to be explicitly called out in the commit
> > message.
> >
> > Also, if you want to extend the API for BPF in the future, good to
> > call this out now and ideally extensions will apply to both, to
> > maintain a uniform API.
> 
> As you said, I also agree on "two uses of the API are best kept identical".
> 
> >
> >
> > * On extra measurement points, at sendmsg or tcp_write_xmit:
> >
> > The first is interesting. For application timestamping, this was
> > never needed, as the application can just call clock_gettime before
> > sendmsg.
> 
> Yes, we could add it after we finish the current series. I'm going to
> write it down on my todo list.
> 
> >
> > In general, additional measurement points are not only useful if the
> > interval between is not constant. So far, we have seen no need for
> > any additional points.
> 
> Taking a snapshot of tcp_write_xmit() could be useful especially when
> the skb is not transmitted due to nagle algorithm.
> 
> >
> >
> > * On skb state:
> >
> > > > For now, is there thing we can explore to share in the skb_shared_info?
> >
> > skb_shinfo space is at a premium. I don't think we can justify two
> > extra fields just for this use case.
> >
> > > My initial thought is just to reuse these fields in skb. It can work
> > > without interfering one another.
> >
> > I'm skeptical that two methods can work at the same time. If they are
> > started at different times, their sk_tskey will be different, for one.
> 
> Right, sk_tskey is the only special one that I will take care of.
> Others like tx_flags or txstamp_ack from struct tcp_skb_cb can be
> reused.
> 
> >
> > There may be workarounds. Maybe BPF can store its state in some BPF
> > specific field, indeed. Or perhaps it can store per-sk shadow state
> > that resolves the conflict. For instance, the offset between sk_tskey
> > and bpf_tskey.
> 
> Things could get complicated in the future if we want to unified the
> final tskey value for all the cases. Since 1) the value of
> shinfo->tskey depends on skb seq and len, 2) the final tskey output is
> the diff between sk_tskey and shinfo->tskey, can I add a bpf_tskey in
> struct sock and related output logic for bpf without caring if it's
> the same as sk_tskey.

I think we can add fields to struct sock without too much concern.
Adding fields to sk_buff or skb_shared_info would be more difficult.

> That said, the outputs from two methods differ. Do you think it is
> acceptable? It could be simpler and easier if we keep them identical.

Since we can only have one skb_shared_info.tskey, if both user and bpf
request OPT_ID, starting at different times, then we will have two
bases against which to compute the difference. Having two fields in
struct sock should suffice.