[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoAPYSkPW27nxr9_Xbc3oGToMZ=D=jyTr=x=XEWa8mKq3A@mail.gmail.com>
Date: Wed, 6 Nov 2024 10:37:52 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Martin KaFai Lau <martin.lau@...ux.dev>, willemb@...gle.com, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, dsahern@...nel.org,
ast@...nel.org, daniel@...earbox.net, andrii@...nel.org, eddyz87@...il.com,
song@...nel.org, yonghong.song@...ux.dev, john.fastabend@...il.com,
kpsingh@...nel.org, sdf@...ichev.me, haoluo@...gle.com, jolsa@...nel.org,
shuah@...nel.org, ykolal@...com, bpf@...r.kernel.org, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v3 02/14] net-timestamp: allow two features to
work parallelly
On Wed, Nov 6, 2024 at 9:11 AM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> Jason Xing wrote:
> > On Wed, Nov 6, 2024 at 3:22 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> > >
> > > On 11/4/24 10:22 PM, Jason Xing wrote:
> > > > On Tue, Nov 5, 2024 at 10:09 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> > > >>
> > > >> On 11/1/24 6:32 AM, Willem de Bruijn wrote:
> > > >>>> In udp/raw/..., I don't know how likely is the user space having "cork->tx_flags
> > > >>>> & SKBTX_ANY_TSTAMP" set but has neither "READ_ONCE(sk->sk_tsflags) &
> > > >>>> SOF_TIMESTAMPING_OPT_ID" nor "cork->flags & IPCORK_TS_OPT_ID" set.
> > > >>> This is not something to rely on. OPT_ID was added relatively recently.
> > > >>> Older applications, or any that just use the most straightforward API,
> > > >>> will not set this.
> > > >>
> > > >> Good point that the OPT_ID per cmsg is very new.
> > > >>
> > > >> The datagram support on SOF_TIMESTAMPING_OPT_ID in sk->sk_tsflags had
> > > >> been there for quite some time now. Is it a safe assumption that
> > > >> most applications doing udp tx timestamping should have
> > > >> the SOF_TIMESTAMPING_OPT_ID set to be useful?
> > > >>
> > > >>>
> > > >>>> If it is
> > > >>>> unlikely, may be we can just disallow bpf prog from directly setting
> > > >>>> skb_shinfo(skb)->tskey for this particular skb.
> > > >>>>
> > > >>>> For all other cases, in __ip[6]_append_data, directly call a bpf prog and also
> > > >>>> pass the kernel decided tskey to the bpf prog.
> > > >>>>
> > > >>>> The kernel passed tskey could be 0 (meaning the user space has not used it). The
> > > >>>> bpf prog can give one for the kernel to use. The bpf prog can store the
> > > >>>> sk_tskey_bpf in the bpf_sk_storage now. Meaning no need to add one to the struct
> > > >>>> sock. The bpf prog does not have to start from 0 (e.g. start from U32_MAX
> > > >>>> instead) if it helps.
> > > >>>>
> > > >>>> If the kernel passed tskey is not 0, the bpf prog can just use that one
> > > >>>> (assuming the user space is doing something sane, like the value in
> > > >>>> SCM_TS_OPT_ID won't be jumping back and front between 0 to U32_MAX). I hope this
> > > >>>> is very unlikely also (?) but the bpf prog can probably detect this and choose
> > > >>>> to ignore this sk.
> > > >>> If an applications uses OPT_ID, it is unlikely that they will toggle
> > > >>> the feature on and off on a per-packet basis. So in the common case
> > > >>> the program could use the user-set counter or use its own if userspace
> > > >>> does not enable the feature. In the rare case that an application does
> > > >>> intermittently set an OPT_ID, the numbering would be erratic. This
> > > >>> does mean that an actively malicious application could mess with admin
> > > >>> measurements.
> > > >>
> > > >> All make sense. Given it is reasonable to assume the user space should either
> > > >> has SOF_TIMESTAMPING_OPT_ID always on or always off. When it is off, the bpf
> > > >> prog can directly provide its own tskey to be used in shinfo->tskey. The bpf
> > > >> prog can generate the id itself without using the sk->sk_tskey, e.g. store an
> > > >> atomic int in the bpf_sk_storage.
> > > >
> > > > I wonder, how can we correlate the key with each skb in the bpf
> > > > program for non-TCP type without implementing a bpf extension for
> > > > SCM_TS_OPT_ID? Every time the timestamp is reported, we cannot know
> > > > which sendmsg() the skb belongs to for non-TCP cases.
> > >
> > > SCM_TS_OPT_ID is eventually setting the shinfo->tskey.
> > > If the shinfo->tskey is not set by the user space, the bpf prog can directly set
> > > the shinfo->tskey. There is no need to use the sk->sk_tskey as the ID generator
> > > also. The bpf prog can have its own id generator.
> > >
> > > If the user space has already set the shinfo->tskey (either by sk->sk_tskey or
> > > SCM_TS_OPT_ID), the bpf prog can just use the user space one.
> > >
> > > If there is a weird application that flips flops between OPT_ID on/off, the bpf
> > > prog will get confused which is fine. The bpf prog can detect this and choose to
> > > ignore measuring this sk/skb.
>
> That will skew measurement and is under control of the process.
>
> I don't immediately foresee this being used to measure untrusted
> processes that would have an incentive to game this.
>
> But the caveat should be stated explicitly.
>
> > > The bpf prog can also choose to be on the very
> > > safe side and ignore all skb with SKBTX_ANY_TSTAMP set in txflags but with no
> > > OPT_ID. The bpf prog can look into the details of the sk and skb to decide what
> > > makes the most sense for its deployment.
> > >
> > > I don't know whether it makes more sense to call the bpf prog to decide the
> > > shinfo->{tx_flags,tskey} just before the "while (length > 0)" in
> > > __ip[6]_append_data or it is better to call the bpf prog in ip[6]_setup_cork.
> > > I admittedly less familiar with this code path than the tcp one.
>
> Probably the current spot, mainly because no skb exists yet in
> ip_setup_cork.
>
> > Now I feel it could be complicated for a software engineer to consider
> > how they will handle the key if they don't read the kernel code very
> > carefully. They are facing different situations. Being user-friendly
> > lets this feature have more chances to get widely used. As I insisted
> > before, I still would like to know if it is possible that we can try
> > to introduce sk_tskey_bpf_offset (like patch 10-12) to calculate a bpf
> > exclusive tskey for bpf use? Only exporting one key. It will be really
> > simple and easy-to-use :)
>
> That has complications of its own. It also has to deal with the user
> enabling/disabling/resetting its key, and with OPT_ID passed by cmsg.
> Multiple skbs may be in flight, derived from each of these sources.
> A single sk flag can only offset against one of them.
>
> I think Martin's approach is more workable. Use the tskey that is set,
> if any. Else, set one.
Got it. Thanks!
Powered by blists - more mailing lists