[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoD-fzq7dSwkM4nRE8vF-y=+RO1y8X=95+D8Gv3QXTRWCA@mail.gmail.com>
Date: Wed, 16 Oct 2024 18:36:12 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: Jakub Sitnicki <jakub@...udflare.com>, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com, dsahern@...nel.org,
willemdebruijn.kernel@...il.com, willemb@...gle.com, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org, eddyz87@...il.com, song@...nel.org,
yonghong.song@...ux.dev, john.fastabend@...il.com, kpsingh@...nel.org,
sdf@...ichev.me, haoluo@...gle.com, jolsa@...nel.org, bpf@...r.kernel.org,
netdev@...r.kernel.org, Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v2 04/12] net-timestamp: add static key to
control the whole bpf extension
On Wed, Oct 16, 2024 at 4:31 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>
> On 10/16/24 12:54 AM, Jason Xing wrote:
> > On Wed, Oct 16, 2024 at 3:01 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> >>
> >> On 10/15/24 11:30 PM, Jason Xing wrote:
> >>> On Wed, Oct 16, 2024 at 2:13 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> >>>>
> >>>> On 10/15/24 6:32 PM, Jason Xing wrote:
> >>>>> On Wed, Oct 16, 2024 at 9:04 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> >>>>>>
> >>>>>> On Wed, Oct 16, 2024 at 8:10 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> >>>>>>>
> >>>>>>> On 10/11/24 9:06 PM, Jason Xing wrote:
> >>>>>>>> From: Jason Xing <kernelxing@...cent.com>
> >>>>>>>>
> >>>>>>>> Willem suggested that we use a static key to control. The advantage
> >>>>>>>> is that we will not affect the existing applications at all if we
> >>>>>>>> don't load BPF program.
> >>>>>>>>
> >>>>>>>> In this patch, except the static key, I also add one logic that is
> >>>>>>>> used to test if the socket has enabled its tsflags in order to
> >>>>>>>> support bpf logic to allow both cases to happen at the same time.
> >>>>>>>> Or else, the skb carring related timestamp flag doesn't know which
> >>>>>>>> way of printing is desirable.
> >>>>>>>>
> >>>>>>>> One thing important is this patch allows print from both applications
> >>>>>>>> and bpf program at the same time. Now we have three kinds of print:
> >>>>>>>> 1) only BPF program prints
> >>>>>>>> 2) only application program prints
> >>>>>>>> 3) both can print without side effect
> >>>>>>>>
> >>>>>>>> Signed-off-by: Jason Xing <kernelxing@...cent.com>
> >>>>>>>> ---
> >>>>>>>> include/net/sock.h | 1 +
> >>>>>>>> net/core/filter.c | 3 +++
> >>>>>>>> net/core/skbuff.c | 38 ++++++++++++++++++++++++++++++++++++++
> >>>>>>>> 3 files changed, 42 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/include/net/sock.h b/include/net/sock.h
> >>>>>>>> index 66ecd78f1dfe..b7c51b95c92d 100644
> >>>>>>>> --- a/include/net/sock.h
> >>>>>>>> +++ b/include/net/sock.h
> >>>>>>>> @@ -2889,6 +2889,7 @@ static inline bool sk_dev_equal_l3scope(struct sock *sk, int dif)
> >>>>>>>> void sock_def_readable(struct sock *sk);
> >>>>>>>>
> >>>>>>>> int sock_bindtoindex(struct sock *sk, int ifindex, bool lock_sk);
> >>>>>>>> +DECLARE_STATIC_KEY_FALSE(bpf_tstamp_control);
> >>>>>>>> void sock_set_timestamp(struct sock *sk, int optname, bool valbool);
> >>>>>>>> int sock_get_timestamping(struct so_timestamping *timestamping,
> >>>>>>>> sockptr_t optval, unsigned int optlen);
> >>>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
> >>>>>>>> index 996426095bd9..08135f538c99 100644
> >>>>>>>> --- a/net/core/filter.c
> >>>>>>>> +++ b/net/core/filter.c
> >>>>>>>> @@ -5204,6 +5204,8 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
> >>>>>>>> .arg1_type = ARG_PTR_TO_CTX,
> >>>>>>>> };
> >>>>>>>>
> >>>>>>>> +DEFINE_STATIC_KEY_FALSE(bpf_tstamp_control);
> >>>>>>>> +
> >>>>>>>> static int bpf_sock_set_timestamping(struct sock *sk,
> >>>>>>>> struct so_timestamping *timestamping)
> >>>>>>>> {
> >>>>>>>> @@ -5217,6 +5219,7 @@ static int bpf_sock_set_timestamping(struct sock *sk,
> >>>>>>>> return -EINVAL;
> >>>>>>>>
> >>>>>>>> WRITE_ONCE(sk->sk_tsflags[BPFPROG_TS_REQUESTOR], flags);
> >>>>>>>> + static_branch_enable(&bpf_tstamp_control);
> >>>>>>>
> >>>>>>> Not sure when is a good time to do static_branch_disable().
> >>>>>>
> >>>>>> Thanks for the review.
> >>>>>>
> >>>>>> To be honest, I considered how to disable the static key. Like you
> >>>>>> said, I failed to find a good chance that I can accurately disable it.
> >>>>>>
> >>>>>>>
> >>>>>>> The bpf prog may be detached also. (IF) it ends up staying with the
> >>>>>>> cgroup/sockops interface, it should depend on the existing static key in
> >>>>>>> cgroup_bpf_enabled(CGROUP_SOCK_OPS) instead of adding another one.
> >>>>>>
> >>>>>> Are you suggesting that we need to remove the current static key? In
> >>>>>> the previous thread, the reason why Willem came up with this idea is,
> >>>>>> I think, to avoid affect the non-bpf timestamping feature.
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> return 0;
> >>>>>>>> }
> >>>>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >>>>>>>> index f36eb9daa31a..d0f912f1ff7b 100644
> >>>>>>>> --- a/net/core/skbuff.c
> >>>>>>>> +++ b/net/core/skbuff.c
> >>>>>>>> @@ -5540,6 +5540,29 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> >>>>>>>> }
> >>>>>>>> EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
> >>>>>>>>
> >>>>>>>> +static bool sk_tstamp_tx_flags(struct sock *sk, u32 tsflags, int tstype)
> >>>>>>>
> >>>>>>> sk is unused.
> >>>>>>
> >>>>>> Thanks for the careful check.
> >>>>>>
> >>>>>>>
> >>>>>>>> +{
> >>>>>>>> + u32 testflag;
> >>>>>>>> +
> >>>>>>>> + switch (tstype) {
> >>>>>>>> + case SCM_TSTAMP_SCHED:
> >>>>>>>
> >>>>>>> Instead of doing this translation,
> >>>>>>> is it easier to directly store the bpf prog desired ts"type" (i.e. the
> >>>>>>> SCM_TSTAMP_*) in the sk->sk_tsflags_bpf?
> >>>>>>> or there is a specific need to keep the SOF_TIMESTAMPING_* value in
> >>>>>>> sk->sk_tsflags_bpf?
> >>>>>>
> >>>>>> We have to reuse SOF_TIMESTAMPING_* because there are more flags, say,
> >>>>>> SOF_TIMESTAMPING_OPT_ID, that we need to support.
> >>>>>>
> >>>>>>>
> >>>>>>>> + testflag = SOF_TIMESTAMPING_TX_SCHED;
> >>>>>>>> + break;
> >>>>>>>> + case SCM_TSTAMP_SND:
> >>>>>>>> + testflag = SOF_TIMESTAMPING_TX_SOFTWARE;
> >>>>>>>> + break;
> >>>>>>>> + case SCM_TSTAMP_ACK:
> >>>>>>>> + testflag = SOF_TIMESTAMPING_TX_ACK;
> >>>>>>>> + break;
> >>>>>>>> + default:
> >>>>>>>> + return false;
> >>>>>>>> + }
> >>>>>>>> + if (tsflags & testflag)
> >>>>>>>> + return true;
> >>>>>>>> +
> >>>>>>>> + return false;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> >>>>>>>> const struct sk_buff *ack_skb,
> >>>>>>>> struct skb_shared_hwtstamps *hwtstamps,
> >>>>>>>> @@ -5558,6 +5581,9 @@ static void skb_tstamp_tx_output(struct sk_buff *orig_skb,
> >>>>>>>> if (!skb_may_tx_timestamp(sk, tsonly))
> >>>>>>>> return;
> >>>>>>>>
> >>>>>>>> + if (!sk_tstamp_tx_flags(sk, tsflags, tstype))
> >>>>>>>
> >>>>>>> This is a new test. tsflags is the sk->sk_tsflags here if I read it correctly.
> >>>>>>
> >>>>>> This test will be used in bpf and non-bpf cases. Because of this, we
> >>>>>> can support BPF extension. In this function, if skb has tsflags but we
> >>>>>> don't know which approach the user expects, sk_tstamp_tx_flags() can
> >>>>>> help us.
> >>>>>>
> >>>>>>>
> >>>>>>> My understanding is the sendmsg can provide SOF_TIMESTAMPING_* for individual
> >>>>>>> skb. Would it break?
> >>>>>>
> >>>>>> Oh, you're right. I didn't support cmsg mode...
> >>>>>
> >>>>> I think I only need to test if it's in the bpf mode, or else let the
> >>>>> original way print the timestamp, which can solve the issue.
> >>>>
> >>>> From looking at the existing "__skb_tstamp_tx(skb, NULL, NULL, skb->sk,
> >>>> SCM_TSTAMP_SCHED);":
> >>>>
> >>>> int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
> >>>> {
> >>>> /* ... */
> >>>>
> >>>> if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_SCHED_TSTAMP))
> >>>> __skb_tstamp_tx(skb, NULL, NULL, skb->sk, SCM_TSTAMP_SCHED);
> >>>>
> >>>> /* ... */
> >>>> }
> >>>>
> >>>> I am still puzzling how __skb_tstamp_tx() will be called if only bpf has enabled
> >>>> the timestamping. I may have missed somewhere in the patch set that the skb's
> >>>> tx_flags is changed by sk->sk_tsflags_bpf alone?
> >>>
> >>> If sk_tsflags_bpf is set, tcp_sendmsg() -> tcp_tx_timestamp() will be
> >>> helpful, which initializes every last skb, please see patch [10/12].
> >>
> >> Ah. ok. It is the thing I missed. Thanks for the pointer.
> >>
> >>>>
> >>>> I think a skb tskey is still desired (?), so eventually we want some spaces in
> >>>
> >>> tskey function is optional I think. It depends whether users want to
> >>> use it or not. It can controlled by SOF_TIMESTAMPING_OPT_ID flag.
> >>>
> >>>> the skb for bpf. Jakub Sitnicki (cc-ed) has presented in LPC about extending
> >>>> skb->data_meta usage outside of xdp and tc. I think here we want to have it
> >>>> available at the tx side to store the tx_flags and tskey but probably want them
> >>>> at a specific place/offset at the data_meta.
> >>>
> >>> If we have the plan to store extra information in data_meta, I can
> >>> give it a try:)
> >>>
> >>>>
> >>>> For now, is there thing we can explore to share in the skb_shared_info?
> >>>
> >>> My initial thought is just to reuse these fields in skb. It can work
> >>> without interfering one another.
> >>
> >> After reading closer to patch 10, I am likely still missing something. How can
> >> it tell if the tx_flags is set by the bpf or by the user space cmsg?
> >
> > If the skb carries the timestamp, there are three cases:
> > 1) non-bpf case and users uses setsockopt()
> > 2) cmsg case
> > 3) bpf case
> >
> > #1 and #2 are already handled well before this patch. I only need to
> > test if sk_tsflags_bpf has those flags. If so, it means we hit #3, or
> > else it could be #1 or #2, then we will let the old way print
> > timestamps in __skb_tstamp_tx().
>
> hmm... I am still not sure I fully understand...but I think I may start getting it.
Sorry, my bad. I gave the wrong answer...
It should be:
Testing if if sk_tsflags has SOF_TIMESTAMPING_SOFTWARE flag should
work fine. If it has the flag, we could use skb_tstamp_tx_output() to
print based on patch [4/12]; if not, we will use
bpf_skb_tstamp_tx_output() to print.
If users use traditional ways of deploying SO_TIMESTAMPING, sk_tsflags
always has SOF_TIMESTAMPING_SOFTWARE which is a software report flag
(please see Documentation/networking/timestamping.rst). We can see a
good example on how to use in
tools/testing/selftests/net/txtimestamp.c:
do_test()
{
sock_opt = SOF_TIMESTAMPING_SOFTWARE |
...
if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
(char *) &sock_opt, sizeof(sock_opt)))
}
>
> Is it the reason that the bpf_setsockopt() cannot clear the sk_tsflags_bpf once
> it is set in patch 2? It is not a usable api tbh. It will be a surprise to many.
> It has to be able to set and clear.
I cannot find a good time to clear all the sockets which are set
through the BPF program. If we detach the BPF program, it will not
print of course. Does it really matter if we don't clear the
sk_tsflags_bpf?
>
> Does it also mean either the bpf or the user space can enable the timetstamping
> but not both? I don't think we can assume this also. It will be hard to deploy
> the bpf prog in production to collect continuous data. The user space may have
> some timestamping enabled but the bpf may want to do its parallel investigation
> also. The user space may rollout timestamping in the future and suddenly break
> the bpf prog.
Well, IIUC, it's also the basic idea from the current series which
allows both happening at the same time. Let us put it in a simple way,
I hope that if the app uses the SO_TIMESTAMPING feature already, then
one admin deploys the BPF program, that app should be traced both in
bpf and non-bpf ways.
But Willem doesn't agree about this approach[1] because of hard to debug.
[1]: https://lore.kernel.org/all/670dda9437147_2e6c4029461@willemb.c.googlers.com.notmuch/
Regarding to this link, I have a few more words to say: the socket
could be set through bpf_setsockopt() in different phases not like
setsockopt(), so in some cases we cannot make setsockopt hard failed.
After rethinking this point more, I still reckon that letting BPF
program trace timestamping parallelly without caring whether the
socket is set to the SO_TIMESTAMPING feature through setsockopt()
method. It means I would like to keep this part in patch [04/12]:
@@ -5601,6 +5636,9 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
if (!sk)
return;
+ if (static_branch_unlikely(&bpf_tstamp_control))
+ bpf_skb_tstamp_tx_output(sk, tstype);
+
skb_tstamp_tx_output(orig_skb, ack_skb, hwtstamps, sk,
tstype);
}
EXPORT_SYMBOL_GPL(__skb_tstamp_tx);
>
> [ getting late here. will continue later. ]
Thanks for your effort :)
Thanks,
Jason
Powered by blists - more mailing lists