[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoBgbA1Q_7UaC0vp-mGHqDHxQ+eMybep0kw=E-T0oJAHfw@mail.gmail.com>
Date: Tue, 29 Oct 2024 10:41:10 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, dsahern@...nel.org, willemb@...gle.com, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org, martin.lau@...ux.dev,
eddyz87@...il.com, song@...nel.org, yonghong.song@...ux.dev,
john.fastabend@...il.com, kpsingh@...nel.org, sdf@...ichev.me,
haoluo@...gle.com, jolsa@...nel.org, shuah@...nel.org, ykolal@...com,
bpf@...r.kernel.org, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v3 10/14] net-timestamp: add basic support with
tskey offset
On Tue, Oct 29, 2024 at 9:24 AM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> Jason Xing wrote:
> > From: Jason Xing <kernelxing@...cent.com>
> >
> > Use the offset to record the delta value between current socket key
> > and bpf socket key.
> >
> > 1. If there is only bpf feature running, the socket key is bpf socket
> > key and the offset is zero;
> > 2. If there is only traditional feature running, and then bpf feature
> > is turned on, the socket key is still used by the former while the offset
> > is the delta between them;
> > 3. if there is only bpf feature running, and then application uses it,
> > the socket key would be re-init for application and the offset is the
> > delta.
>
> We need to also figure out the rare conflict when one user sets
> OPT_ID | OPT_ID_TCP while the other only uses OPT_ID.
I think the current patch handles the case because:
1. sock_calculate_tskey_offset() gets the final key first whether the
OPT_ID_TCP is set or not.
2. we will use that tskey to calculate the delta.
>
> It is so obscure, that perhaps we can punt and say that the BPF
> program just has to follow the application preference and be aware of
> the subtle difference.
Right.
>
> > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > ---
> > include/net/sock.h | 1 +
> > net/core/skbuff.c | 15 ++++++++---
> > net/core/sock.c | 66 ++++++++++++++++++++++++++++++++++++++--------
> > 3 files changed, 68 insertions(+), 14 deletions(-)
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 91398b20a4a3..41c6c6f78e55 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -469,6 +469,7 @@ struct sock {
> > unsigned long sk_pacing_rate; /* bytes per second */
> > atomic_t sk_zckey;
> > atomic_t sk_tskey;
> > + u32 sk_tskey_bpf_offset;
> > __cacheline_group_end(sock_write_tx);
> >
> > __cacheline_group_begin(sock_read_tx);
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 0b571306f7ea..d1739317b97d 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -5641,9 +5641,10 @@ void timestamp_call_bpf(struct sock *sk, int op, u32 nargs, u32 *args)
> > }
> >
> > static void skb_tstamp_tx_output_bpf(struct sock *sk, int tstype,
> > + struct sk_buff *skb,
> > struct skb_shared_hwtstamps *hwtstamps)
> > {
> > - u32 args[2] = {0, 0};
> > + u32 args[3] = {0, 0, 0};
> > u32 tsflags, cb_flag;
> >
> > tsflags = READ_ONCE(sk->sk_tsflags_bpf);
> > @@ -5672,7 +5673,15 @@ static void skb_tstamp_tx_output_bpf(struct sock *sk, int tstype,
> > args[1] = ts.tv_nsec;
> > }
> >
> > - timestamp_call_bpf(sk, cb_flag, 2, args);
> > + if (tsflags & SOF_TIMESTAMPING_OPT_ID) {
> > + args[2] = skb_shinfo(skb)->tskey;
> > + if (sk_is_tcp(sk))
> > + args[2] -= atomic_read(&sk->sk_tskey);
> > + if (sk->sk_tskey_bpf_offset)
> > + args[2] += sk->sk_tskey_bpf_offset;
> > + }
> > +
> > + timestamp_call_bpf(sk, cb_flag, 3, args);
>
>
> So the BPF interface is effectively OPT_TSONLY: the packet data is
> never shared.
>
> Then OPT_ID should be mandatory, because it without it the data is
> not actionable: which byte in the bytestream or packet in the case
> of datagram sockets does a callback refer to.
It does make sense, I think I will implement it when bpf_setsockopt() is called.
>
> > +/* Used to track the tskey for bpf extension
> > + *
> > + * @sk_tskey: bpf extension can use it only when no application uses.
> > + * Application can use it directly regardless of bpf extension.
> > + *
> > + * There are three strategies:
> > + * 1) If we've already set through setsockopt() and here we're going to set
> > + * OPT_ID for bpf use, we will not re-initialize the @sk_tskey and will
> > + * keep the record of delta between the current "key" and previous key.
> > + * 2) If we've already set through bpf_setsockopt() and here we're going to
> > + * set for application use, we will record the delta first and then
> > + * override/initialize the @sk_tskey.
> > + * 3) other cases, which means only either of them takes effect, so initialize
> > + * everything simplely.
> > + */
>
> Please explain in the commit message that these gymnastics are needed
> because there can only be one tskey in skb_shared_info.
No problem.
>
> > +static long int sock_calculate_tskey_offset(struct sock *sk, int val, int bpf_type)
> > +{
> > + u32 tskey;
> > +
> > + if (sk_is_tcp(sk)) {
> > + if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))
> > + return -EINVAL;
> > +
> > + if (val & SOF_TIMESTAMPING_OPT_ID_TCP)
> > + tskey = tcp_sk(sk)->write_seq;
> > + else
> > + tskey = tcp_sk(sk)->snd_una;
> > + } else {
> > + tskey = 0;
> > + }
> > +
> > + if (bpf_type && (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> > + sk->sk_tskey_bpf_offset = tskey - atomic_read(&sk->sk_tskey);
> > + return 0;
> > + } else if (!bpf_type && (sk->sk_tsflags_bpf & SOF_TIMESTAMPING_OPT_ID)) {
> > + sk->sk_tskey_bpf_offset = atomic_read(&sk->sk_tskey) - tskey;
> > + } else {
> > + sk->sk_tskey_bpf_offset = 0;
> > + }
> > +
> > + return tskey;
> > +}
> > +
> > int sock_set_tskey(struct sock *sk, int val, int bpf_type)
> > {
> > u32 tsflags = bpf_type ? sk->sk_tsflags_bpf : sk->sk_tsflags;
> > @@ -901,17 +944,13 @@ int sock_set_tskey(struct sock *sk, int val, int bpf_type)
> >
> > if (val & SOF_TIMESTAMPING_OPT_ID &&
> > !(tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> > - if (sk_is_tcp(sk)) {
> > - if ((1 << sk->sk_state) &
> > - (TCPF_CLOSE | TCPF_LISTEN))
> > - return -EINVAL;
> > - if (val & SOF_TIMESTAMPING_OPT_ID_TCP)
> > - atomic_set(&sk->sk_tskey, tcp_sk(sk)->write_seq);
> > - else
> > - atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una);
> > - } else {
> > - atomic_set(&sk->sk_tskey, 0);
> > - }
> > + long int ret;
> > +
> > + ret = sock_calculate_tskey_offset(sk, val, bpf_type);
> > + if (ret <= 0)
> > + return ret;
> > +
> > + atomic_set(&sk->sk_tskey, ret);
> > }
> >
> > return 0;
> > @@ -956,10 +995,15 @@ static int sock_set_timestamping_bpf(struct sock *sk,
> > struct so_timestamping timestamping)
> > {
> > u32 flags = timestamping.flags;
> > + int ret;
> >
> > if (flags & ~SOF_TIMESTAMPING_BPF_SUPPPORTED_MASK)
> > return -EINVAL;
> >
> > + ret = sock_set_tskey(sk, flags, 1);
> > + if (ret)
> > + return ret;
> > +
> > WRITE_ONCE(sk->sk_tsflags_bpf, flags);
> >
> > return 0;
>
> I'm a bit hazy on when this can be called. We can assume that this new
> BPF operation cannot race with the existing setsockopt nor with the
> datapath that might touch the atomic fields, right?
It surely can race with the existing setsockopt.
1)
if (only existing setsockopt works) {
then sk->sk_tskey is set through setsockopt, sk_tskey_bpf_offset is 0.
}
2)
if (only bpf setsockopt works) {
then sk->sk_tskey is set through bpf_setsockopt,
sk_tskey_bpf_offset is 0.
}
3)
if (existing setsockopt already started, here we enable the bpf feature) {
then sk->sk_tskey will not change, but the sk_tskey_bpf_offset
will be calculated.
}
4)
if (bpf setsockopt already started, here we enable the application feature) {
then sk->sk_tskey will re-initialized/overridden by
setsockopt, and the sk_tskey_bpf_offset will be calculated.
}
Then the skb tskey will use the sk->sk_tskey like before.
At last, when we are about to print in bpf extension if we're allowed
(by testing the sk_tsflags_bpf), we only need to check if
sk_tskey_bpf_offset is zero or not. If the value is zero, it means
only the bpf program runs; if not, it means the sk->sk_tskey servers
for application feature, we need to compute the real bpf tskey. Please
see skb_tstamp_tx_output_bpf().
Above makes sure that two features can work parallelly. It's honestly
a little bit complicated. Before writing this part, I drew a few
pictures to help me understand how it works.
Thanks,
Jason
Powered by blists - more mailing lists