Message-ID: <CAPhsuW5JYoM-Mkehdy=FQsG1nvjbYGzwRZx8BkpG1P7cHdD=eQ@mail.gmail.com>
Date: Sat, 18 Nov 2023 08:08:22 -0800
From: Song Liu <song@...nel.org>
To: Akihiko Odaki <akihiko.odaki@...nix.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>, Jason Wang <jasowang@...hat.com>,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, Jonathan Corbet <corbet@....net>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
"Michael S. Tsirkin" <mst@...hat.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
Mykola Lysenko <mykolal@...com>, Shuah Khan <shuah@...nel.org>, bpf <bpf@...r.kernel.org>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>, kvm@...r.kernel.org,
virtualization@...ts.linux-foundation.org,
"open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@...r.kernel.org>,
Yuri Benditovich <yuri.benditovich@...nix.com>, Andrew Melnychenko <andrew@...nix.com>
Subject: Re: [RFC PATCH v2 1/7] bpf: Introduce BPF_PROG_TYPE_VNET_HASH
Hi,
A few rookie questions below.
On Sat, Nov 18, 2023 at 2:39 AM Akihiko Odaki <akihiko.odaki@...nix.com> wrote:
>
> On 2023/10/18 4:19, Akihiko Odaki wrote:
> > On 2023/10/18 4:03, Alexei Starovoitov wrote:
[...]
> >
> > I would also appreciate it if you could share some documentation or a
> > link to relevant discussions on the mailing list. That will avoid
> > repeating a discussion you may already have had in the past.
>
> Hi,
>
> The discussion has been stuck for a month, but I'd still like to
> continue figuring out the best way for the whole kernel to implement
> this feature. Let me summarize the current situation and the questions
> that need to be answered before pushing this forward:
>
> The goal of this RFC is to allow reporting hash values calculated by an
> eBPF steering program. It essentially just reports 4 bytes from the
> kernel to userspace.
AFAICT, the proposed design is to have BPF generate some data
(namely a hash, but it could be anything) and consume it from
user space. Instead of updating __sk_buff, can we have user space
fetch the data/hash from a BPF map? If this is an option,
I guess we can implement the same feature with BPF tracing
programs?
>
> Unfortunately, however, it is not acceptable for the BPF subsystem
> because the "stable" BPF is completely fixed these days. The
> "unstable/kfunc" BPF is an alternative, but the eBPF program will be
> shipped with a portable userspace program (QEMU)[1] so the lack of
> interface stability is not tolerable.
BPF kfuncs are as stable as exported symbols. Is exported-symbol-like
stability enough for this use case? (I would assume yes.)
>
> Another option is to hardcode in the kernel the algorithm that was
> conventionally implemented with an eBPF steering program[2]. It is
> possible because the algorithm strictly follows the virtio-net
> specification[3]. However, there are proposals to add different
> algorithms to the specification[4], and hardcoding the algorithm in the
> kernel will require adding more UAPIs and code each time such a
> specification change happens, which is not good for tuntap.
The requirement looks similar to hid-bpf. Could you explain why that
model is not enough? HID also requires some stability AFAICT.
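
For readers following along, a rough illustration of the kind of
algorithm the quoted paragraph is talking about. As far as I understand,
the hash in the virtio-net RSS design is a Toeplitz hash, so the sketch
below is a plain userspace C version of that, not the eBPF steering
program QEMU ships and not code from this patch series. The names
toeplitz_hash(), sample_key and toeplitz_self_check() are made up for
the example, and sample_key is the sample key commonly used in RSS
verification examples, assumed here only so the sketch has a key to
work with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample key (40 bytes), assumed for illustration only. */
static const uint8_t sample_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/*
 * Toeplitz hash: walk the input bits MSB first; for every set bit, XOR
 * in the 32-bit window of the key that starts at that bit position.
 * Key bits past key_len are treated as zero.
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t data_len)
{
    uint32_t hash = 0;
    uint32_t window;
    size_t next = 4; /* index of the next key byte to shift in */

    if (key_len < 4)
        return 0;

    window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
             (uint32_t)key[2] << 8 | (uint32_t)key[3];

    for (size_t i = 0; i < data_len; i++) {
        uint8_t key_byte = next < key_len ? key[next] : 0;

        next++;
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the key window left by one bit. */
            window = (window << 1) | ((key_byte >> bit) & 1u);
        }
    }
    return hash;
}

/* Sanity checks that follow directly from the definition above. */
static void toeplitz_self_check(void)
{
    const uint8_t a[] = { 0x80, 0x00 };
    const uint8_t b[] = { 0x00, 0x01 };
    const uint8_t ab[] = { 0x80, 0x01 };
    uint32_t ha = toeplitz_hash(sample_key, sizeof(sample_key), a, 2);
    uint32_t hb = toeplitz_hash(sample_key, sizeof(sample_key), b, 2);
    uint32_t hab = toeplitz_hash(sample_key, sizeof(sample_key), ab, 2);

    /* Only the first input bit is set: hash is the first key window. */
    assert(ha == 0x6d5a56daU);
    /* The hash is linear over XOR of the inputs. */
    assert(hab == (ha ^ hb));
}
```

In a real RSS setup the input would be the packet's address/port tuple
and the key would come from the device configuration; both are left out
of this sketch. The point is only that the algorithm itself is small,
while the set of algorithms and hash types is what may grow over time.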
Thanks,
Song
>
> In short, the proposed feature requires making one of the following
> compromises:
>
> 1. Compromise on the BPF side: Relax the "stable" BPF feature freeze
> once and allow an eBPF steering program to report 4 more bytes to the
> kernel.
>
> 2. Compromise on the tuntap side: Implement the algorithm in the kernel,
> and abandon the capability to update the algorithm without changing the
> kernel.
>
> IMHO, it's better to make a compromise on the BPF side (option 1). We
> should minimize the total UAPI changes in the whole kernel, and option 1
> is much superior in that sense.
>
> Yet I have to note that such a compromise on the BPF side risks making
> the "stable" BPF feature freeze fragile and letting other people
> complain like "you allowed changing stable BPF for this, so why do you
> reject [some other request to change stable BPF]?" That is bad for BPF
> maintainers. (I can imagine that introducing and maintaining widely
> different BPF interfaces is too much of a burden.) And, of course, this
> requires approval from the BPF maintainers.
>
> So I'd like to ask which of these compromises you think is worse.
> Please also tell me if you have another idea.
>
> Regards,
> Akihiko Odaki
>
> [1] https://qemu.readthedocs.io/en/v8.1.0/devel/ebpf_rss.html
> [2] https://lore.kernel.org/all/20231008052101.144422-1-akihiko.odaki@daynix.com/
> [3] https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-2400003
> [4] https://lore.kernel.org/all/CACGkMEuBbGKssxNv5AfpaPpWQfk2BHR83rM5AHXN-YVMf2NvpQ@mail.gmail.com/