lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACGkMEtscr17UOufUtyPp1OvALL8LcycpbRp6CyVMF=jYzAjAA@mail.gmail.com>
Date: Sun, 29 Sep 2024 10:07:11 +0800
From: Jason Wang <jasowang@...hat.com>
To: Akihiko Odaki <akihiko.odaki@...nix.com>
Cc: Jonathan Corbet <corbet@....net>, Willem de Bruijn <willemdebruijn.kernel@...il.com>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	"Michael S. Tsirkin" <mst@...hat.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, 
	Shuah Khan <shuah@...nel.org>, linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, 
	netdev@...r.kernel.org, kvm@...r.kernel.org, 
	virtualization@...ts.linux-foundation.org, linux-kselftest@...r.kernel.org, 
	Yuri Benditovich <yuri.benditovich@...nix.com>, Andrew Melnychenko <andrew@...nix.com>, 
	Stephen Hemminger <stephen@...workplumber.org>, gur.stavi@...wei.com
Subject: Re: [PATCH RFC v4 0/9] tun: Introduce virtio-net hashing feature

On Fri, Sep 27, 2024 at 3:51 PM Akihiko Odaki <akihiko.odaki@...nix.com> wrote:
>
> On 2024/09/27 13:31, Jason Wang wrote:
> > On Fri, Sep 27, 2024 at 10:11 AM Akihiko Odaki <akihiko.odaki@...nix.com> wrote:
> >>
> >> On 2024/09/25 12:30, Jason Wang wrote:
> >>> On Tue, Sep 24, 2024 at 5:01 PM Akihiko Odaki <akihiko.odaki@...nix.com> wrote:
> >>>>
> >>>> virtio-net have two usage of hashes: one is RSS and another is hash
> >>>> reporting. Conventionally the hash calculation was done by the VMM.
> >>>> However, computing the hash after the queue was chosen defeats the
> >>>> purpose of RSS.
> >>>>
> >>>> Another approach is to use eBPF steering program. This approach has
> >>>> another downside: it cannot report the calculated hash due to the
> >>>> restrictive nature of eBPF.
> >>>>
> >>>> Introduce the code to compute hashes to the kernel in order to overcome
> >>>> thse challenges.
> >>>>
> >>>> An alternative solution is to extend the eBPF steering program so that it
> >>>> will be able to report to the userspace, but it is based on context
> >>>> rewrites, which is in feature freeze. We can adopt kfuncs, but they will
> >>>> not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM
> >>>> and vhost_net).
> >>>>
> >>>
> >>> I wonder if we could clone the skb and reuse some to store the hash,
> >>> then the steering eBPF program can access these fields without
> >>> introducing full RSS in the kernel?
> >>
> >> I don't get how cloning the skb can solve the issue.
> >>
> >> We can certainly implement Toeplitz function in the kernel or even with
> >> tc-bpf to store a hash value that can be used for eBPF steering program
> >> and virtio hash reporting. However we don't have a means of storing a
> >> hash type, which is specific to virtio hash reporting and lacks a
> >> corresponding skb field.
> >
> > I may miss something but looking at sk_filter_is_valid_access(). It
> > looks to me we can make use of skb->cb[0..4]?
>
> I didn't opt to using cb. Below is the rationale:
>
> cb is for tail call so it means we reuse the field for a different
> purpose. The context rewrite allows adding a field without increasing
> the size of the underlying storage (the real sk_buff) so we should add a
> new field instead of reusing an existing field to avoid confusion.
>
> We are however no longer allowed to add a new field. In my
> understanding, this is because it is an UAPI, and eBPF maintainers found
> it is difficult to maintain its stability.
>
> Reusing cb for hash reporting is a workaround to avoid having a new
> field, but it does not solve the underlying problem (i.e., keeping eBPF
> as stable as UAPI is unreasonably hard). In my opinion, adding an ioctl
> is a reasonable option to keep the API as stable as other virtualization
> UAPIs while respecting the underlying intention of the context rewrite
> feature freeze.

Fair enough.

Btw, I remember DPDK implements tuntap RSS via eBPF as well (probably
via cls or other). It might worth to see if anything we miss here.

Thanks

>
> Regards,
> Akihiko Odaki
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ