lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoDKuomdJVYo6FVNH60NojYRNp3zjNyt8Tr8Od1Fp2BfWA@mail.gmail.com>
Date: Thu, 17 Oct 2024 10:28:01 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: Jakub Sitnicki <jakub@...udflare.com>, davem@...emloft.net, edumazet@...gle.com, 
	kuba@...nel.org, pabeni@...hat.com, dsahern@...nel.org, 
	willemdebruijn.kernel@...il.com, willemb@...gle.com, ast@...nel.org, 
	daniel@...earbox.net, andrii@...nel.org, eddyz87@...il.com, song@...nel.org, 
	yonghong.song@...ux.dev, john.fastabend@...il.com, kpsingh@...nel.org, 
	sdf@...ichev.me, haoluo@...gle.com, jolsa@...nel.org, bpf@...r.kernel.org, 
	netdev@...r.kernel.org, Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v2 04/12] net-timestamp: add static key to
 control the whole bpf extension

On Thu, Oct 17, 2024 at 8:48 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>
> On 10/16/24 3:36 AM, Jason Xing wrote:
> >>> If the skb carries the timestamp, there are three cases:
> >>> 1) non-bpf case and users uses setsockopt()
> >>> 2) cmsg case
> >>> 3) bpf case
>
> These should have tests in the selftests/bpf/ sooner than later. (More below).
>
> >>>
> >>> #1 and #2 are already handled well before this patch. I only need to
> >>> test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or
> >>> else it could be #1 or #2, then we will let the old way print
> >>> timestamps in __skb_tstamp_tx().
> >>
> >> hmm... I am still not sure I fully understand...but I think I may start getting it.
> >
> > Sorry, my bad. I gave the wrong answer...
> >
> > It should be:
> > Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should
>
> You meant adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags()?

Yep.

>
> Before any bpf changes, if I read __skb_tstamp_tx() correctly, the current
> behavior is to just queue to the sk_error_queue as long as there is
> "SOF_TIMESTAMPING_TX_*" set in the skb's tx_flags and it is regardless of the
> sk_tsflags. This will eventually get ignored when user read it from the error
> queue because the SOF_TIMESTAMPING_SOFTWARE is not set in sk_tsflags?

Totally correct. SOF_TIMESTAMPING_SOFTWARE is a report flag while
SOF_TIMESTAMPING_TX_* are generation flags. Without former, users can
read the skb from the errqueue but are not able to parse the
timestamps. Please see
tcp_recvmsg()->inet_recv_error()->ip_recv_error()->sock_recv_timestamp()->__sock_recv_timestamp():
if ((tsflags & SOF_TIMESTAMPING_SOFTWARE...
       ktime_to_timespec64_cond(skb->tstamp, tss.ts + 0))

> I suspect
> the user space will still read something from the error queue unless there is
> SOF_TIMESTAMPING_OPT_TSONLY but it won't have the tstamp cmsg.

No, please see above.

>
> Adding SOF_TIMESTAMPING_SOFTWARE test to the sk_tstamp_tx_flags() will stop it
> from even queuing to the error queue? I think it is ok but I am not sure if
> anyone is depending on the above behavior.

SOF_TIMESTAMPING_SOFTWARE is only used in traditional SO_TIMESTAMPING
features including cmsg mode. But it will not be used in bpf mode. So
the test statement is enough to divided those three cases into two
groups.

>
> > work fine. If it has the flag, we could use skb_tstamp_tx_output() to
> > print based on patch [4/12]; if not, we will use
> > bpf_skb_tstamp_tx_output() to print.
> >
> > If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags
> > always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag
> > (please see Documentation/networking/timestamping.rst). We can see a
> > good example on how to use in
> > tools/testing/selftests/net/txtimestamp.c:
> > do_test()
> > {
> >          sock_opt = SOF_TIMESTAMPING_SOFTWARE |
> >          ...
> >          if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
> >                                (char *) &sock_opt, sizeof(sock_opt)))
> > }
> >
> >>
> >> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once
> >> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many.
> >> It has to be able to set and clear.
> >
> > I cannot find a good time to clear all the sockets which are set
> > through the BPF program. If we detach the BPF program, it will not
> > print of course. Does it really matter if we don't clear the
> > sk_tsflags_bpf?
>
> Yes, it matters. The same reason goes for why the existing bpf prog can clear
> the tp->bpf_sock_ops_cb_flags. Yes, detach will automatically not taking the
> timestamp. For sockops program that stays forever, not all usages expect to do
> timestamping for the whole lifetime of the connection. If there is a way for the
> prog to turn it on, it should have a way for the prog to turn it off.

I see what you meant here. If we don't clear sk_tsflags_bpf, after we
detach the bpf program, it will do nothing in __skb_tstamp_tx() and
return earlier. It is almost equal to the effect of turning off. It is
why I don't handle clearing the flag.

>
> What is the concern of allowing the bpf prog to disable something that it has
> enabled before?

Let me give one instance:
If one socket is established and stays idle, how can the bpf program
clear the tsflags from that socket? I have no idea.

>
> While we are on bpf_sock_ops_cb_flags, the
> BPF_SOCK_OPS_TX_TIMESTAMPING_OPT_CB_FLAG addition is mostly a dup of whatever in
> the new sk_tsflags_bpf. It is something we need to clean up later when we decide
> what interface to use for bpf timestamping.

I'm not sure if I understand correctly. I mimicked the use of
BPF_SOCK_OPS_RTO_CB_FLAG. Do you mean we can remove the use of
bpf_sock_ops_cb_flags_set() in BPF program?

>
> >
> >>
> >> Does it also mean either the bpf or the user space can enable the timetstamping
> >> but not both? I don't think we can assume this also. It will be hard to deploy
> >> the bpf prog in production to collect continuous data. The user space may have
> >> some timestamping enabled but the bpf may want to do its parallel investigation
> >> also. The user space may rollout timestamping in the future and suddenly break
> >> the bpf prog.
> >
> > Well, IIUC, it's also the basic idea from the current series which
> > allows both happening at the same time. Let us put it in a simple way,
> > I hope that if the app uses the SO_TIMESTAMPING feature already, then
> > one admin deploys the BPF program, that app should be traced both in
> > bpf and non-bpf ways.
> >
> > But Willem doesn't agree about this approach[1] because of hard to debug.
> >
> > [1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/
> > Regarding to this link, I have a few more words to say: the socket
> > could be set through bpf_setsockopt() in different phases not like
> > setsockopt(), so in some cases we cannot make setsockopt hard failed.
> >
> > After rethinking this point more, I still reckon that letting BPF
> > program trace timestamping parallelly without caring whether the
> > socket is set to the SO_TIMESTAMPING feature through setsockopt()
>
> I am afraid having both work in parallel is needed. Otherwise, it will be very
> hard to deploy a bpf prog to run continuously in scale. Being able to collect
> timestamp without worrying about application changes/updates/downgrades is
> important. e.g. App changes from no time stamping to time stamping

Sorry, I didn't make myself clear. Yesterday, I said I agreed with you
:) So let me keep the current logic of printing (see the
__skb_tstamp_tx() function in patch [04/12]) in the next version. Then
I don't need to add some test statements to distinguish which way of
printing.

>
> Please help to add selftests to show how the above cases (1), (2), (3), and
> other tsflags/txflags sharing cases will work. This should not be delayed until
> the discussion is done. It is needed sooner or later to prove both bpf and
> non-bpf ways can work at the same time. It will help the reviewer and also help
> to think about the design with a real use case in bpf prog.

Got it. But I'm not sure where I should put those test cases? Could
you help me point out a good example that I can follow?

>
> The example in patch 0 only prints the reported tstamp, can you share how it
> will be used to investigate issue?

No problem. Please see chapter 3 about "goal" in
https://netdev.bots.linux.dev/netconf/2024/jason_xing.pdf.

> Is it also useful to know when the skb is
> written to the kernel during sendmsg()?

You are right. Before this patch, normally applications will record an
accurate timestamp before do sendmsg().

After you remind me of this, I feel that we can add the timestamp
print in the future for bpf use.

>
> Regarding the bpf_setsockopt() can be called in different phase,
> bpf_setsockopt() is not limited to sockops program. e.g. it can also be called
> from a bpf-tcp-cc (congestion control). Not a tcp-cc expert but I won't be
> surprised people will try to trigger some on-and-off timestamping from
> bpf-tcp-cc to measure some delay.
>
>
> More about bpf_setsockopt() in different phase, understand that UDP is not your
> priority. However, it needs to have some clarity on how UDP will work and how to
> enable it. UDP usually has no connect/established phase.

For now, I don't expect an extension for UDP because it will bring too
much extra work. Could we discuss this later? I mean, after we finish
the basic bpf extension :)

>
> Regarding the SOF_TIMESTAMPING_* support, can you list out what else you are
> planning to support in the future. You mentioned the SOF_TIMESTAMPING_TX_ACK in
> another thread. What else?

In this patch series, I support
SOF_TIMESTAMPING_TX_SCHED|SOF_TIMESTAMPING_TX_ACK|SOF_TIMESTAMPING_TX_SOFTWARE,
which you've already noticed from the BPF example in patch [0/12].
They all come from the original design of SO_TIMESTAMPING feature.

The question you proposed is what I am willing to implement in the
future, like adding one hook in tcp_write_xmit()? It's part of my
plans to extend in the future, not be included in this series.

Thanks for your review.

Thanks,
Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ