lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAL+tcoBzces5_awOzZsyqpTWjk0moxkjj7kHjCtPcsU3kNJ4tg@mail.gmail.com>
Date: Wed, 6 Nov 2024 10:51:52 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>, willemb@...gle.com, 
	davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com, 
	dsahern@...nel.org, ast@...nel.org, daniel@...earbox.net, andrii@...nel.org, 
	eddyz87@...il.com, song@...nel.org, yonghong.song@...ux.dev, 
	john.fastabend@...il.com, kpsingh@...nel.org, sdf@...ichev.me, 
	haoluo@...gle.com, jolsa@...nel.org, shuah@...nel.org, ykolal@...com, 
	bpf@...r.kernel.org, netdev@...r.kernel.org, 
	Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v3 02/14] net-timestamp: allow two features to
 work parallelly

On Wed, Nov 6, 2024 at 9:09 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>
> On 11/5/24 4:17 PM, Jason Xing wrote:
> > On Wed, Nov 6, 2024 at 3:22 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> >>
> >> On 11/4/24 10:22 PM, Jason Xing wrote:
> >>> On Tue, Nov 5, 2024 at 10:09 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> >>>>
> >>>> On 11/1/24 6:32 AM, Willem de Bruijn wrote:
> >>>>>> In udp/raw/..., I don't know how likely is the user space having "cork->tx_flags
> >>>>>> & SKBTX_ANY_TSTAMP" set but has neither "READ_ONCE(sk->sk_tsflags) &
> >>>>>> SOF_TIMESTAMPING_OPT_ID" nor "cork->flags & IPCORK_TS_OPT_ID" set.
> >>>>> This is not something to rely on. OPT_ID was added relatively recently.
> >>>>> Older applications, or any that just use the most straightforward API,
> >>>>> will not set this.
> >>>>
> >>>> Good point that the OPT_ID per cmsg is very new.
> >>>>
> >>>> The datagram support on SOF_TIMESTAMPING_OPT_ID in sk->sk_tsflags had
> >>>> been there for quite some time now. Is it a safe assumption that
> >>>> most applications doing udp tx timestamping should have
> >>>> the SOF_TIMESTAMPING_OPT_ID set to be useful?
> >>>>
> >>>>>
> >>>>>> If it is
> >>>>>> unlikely, may be we can just disallow bpf prog from directly setting
> >>>>>> skb_shinfo(skb)->tskey for this particular skb.
> >>>>>>
> >>>>>> For all other cases, in __ip[6]_append_data, directly call a bpf prog and also
> >>>>>> pass the kernel decided tskey to the bpf prog.
> >>>>>>
> >>>>>> The kernel passed tskey could be 0 (meaning the user space has not used it). The
> >>>>>> bpf prog can give one for the kernel to use. The bpf prog can store the
> >>>>>> sk_tskey_bpf in the bpf_sk_storage now. Meaning no need to add one to the struct
> >>>>>> sock. The bpf prog does not have to start from 0 (e.g. start from U32_MAX
> >>>>>> instead) if it helps.
> >>>>>>
> >>>>>> If the kernel passed tskey is not 0, the bpf prog can just use that one
> >>>>>> (assuming the user space is doing something sane, like the value in
> >>>>>> SCM_TS_OPT_ID won't be jumping back and front between 0 to U32_MAX). I hope this
> >>>>>> is very unlikely also (?) but the bpf prog can probably detect this and choose
> >>>>>> to ignore this sk.
> >>>>> If an applications uses OPT_ID, it is unlikely that they will toggle
> >>>>> the feature on and off on a per-packet basis. So in the common case
> >>>>> the program could use the user-set counter or use its own if userspace
> >>>>> does not enable the feature. In the rare case that an application does
> >>>>> intermittently set an OPT_ID, the numbering would be erratic. This
> >>>>> does mean that an actively malicious application could mess with admin
> >>>>> measurements.
> >>>>
> >>>> All make sense. Given it is reasonable to assume the user space should either
> >>>> has SOF_TIMESTAMPING_OPT_ID always on or always off. When it is off, the bpf
> >>>> prog can directly provide its own tskey to be used in shinfo->tskey. The bpf
> >>>> prog can generate the id itself without using the sk->sk_tskey, e.g. store an
> >>>> atomic int in the bpf_sk_storage.
> >>>
> >>> I wonder, how can we correlate the key with each skb in the bpf
> >>> program for non-TCP type without implementing a bpf extension for
> >>> SCM_TS_OPT_ID? Every time the timestamp is reported, we cannot know
> >>> which sendmsg() the skb belongs to for non-TCP cases.
> >>
> >> SCM_TS_OPT_ID is eventually setting the shinfo->tskey.
> >> If the shinfo->tskey is not set by the user space, the bpf prog can directly set
> >> the shinfo->tskey. There is no need to use the sk->sk_tskey as the ID generator
> >> also. The bpf prog can have its own id generator.
> >>
> >> If the user space has already set the shinfo->tskey (either by sk->sk_tskey or
> >> SCM_TS_OPT_ID), the bpf prog can just use the user space one.
> >>
> >> If there is a weird application that flips flops between OPT_ID on/off, the bpf
> >> prog will get confused which is fine. The bpf prog can detect this and choose to
> >> ignore measuring this sk/skb. The bpf prog can also choose to be on the very
> >> safe side and ignore all skb with SKBTX_ANY_TSTAMP set in txflags but with no
> >> OPT_ID. The bpf prog can look into the details of the sk and skb to decide what
> >> makes the most sense for its deployment.
> >>
> >> I don't know whether it makes more sense to call the bpf prog to decide the
> >> shinfo->{tx_flags,tskey} just before the "while (length > 0)" in
> >> __ip[6]_append_data or it is better to call the bpf prog in ip[6]_setup_cork.
> >> I admittedly less familiar with this code path than the tcp one.
> >
> > Now I feel it could be complicated for a software engineer to consider
> > how they will handle the key if they don't read the kernel code very
> > carefully. They are facing different situations. Being user-friendly
> > lets this feature have more chances to get widely used. As I insisted
> > before, I still would like to know if it is possible that we can try
> > to introduce sk_tskey_bpf_offset (like patch 10-12) to calculate a bpf
> > exclusive tskey for bpf use? Only exporting one key. It will be really
> > simple and easy-to-use :)
>
> imo, there is no need for adding sk_tskey_bpf_offset to sk. just allow the bpf
> prog to decide what is the tskey.
>
> There is no usability issue in bpf prog. It is pretty normal for a bpf prog
> author to look at the sk details to make decision.
>
> Abstracting the sk/skb is not helping the bpf prog and not the right direction
> to go. Over time, there has been case over case that the bpf prog wants to know
> more instead of being abstracted away like running in the user space. e.g. The
> "struct bpf_sock" abstraction in the uapi/linux/bpf.h does not scale and we have
> stopped adding more abstraction this way. The btf (and PTR_TO_BTF_ID,
> CO-RE...etc) has been added to allow the bpf prog to learn other details in sk
> and skb.
>
> Instead, design a better bpf kfunc to help the bpf prog to set the bits/tskey in
> the skb. I think this is more important. tcp tskey is easy. just need some care
> on the udp tskey and need to check if the user space has already set one.
> A good designed bpf kfunc is all it needs.

Thanks!

Let me confirm again in case I'm missing something important.
1) For tcp, as you said before, bpf prog can extract the seq from the
exported skb, so I don't need to export any key in this case.
2) For udp, if the skb has skb_shinfo(skb)->tskey set, then export the
key, else, export zero to the bpf program.
3) extend SCM_TS_OPT_ID for the udp/bpf case.

I'm not sure if I should postpone implementing this part after the
basic framework of this series gets merged. Anyway, I will try this :)

Thanks,
Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ