[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <447dca19-58c5-4c01-b60e-cfe5e601961a@daynix.com>
Date: Sun, 29 Sep 2024 16:10:47 +0900
From: Akihiko Odaki <akihiko.odaki@...nix.com>
To: Jason Wang <jasowang@...hat.com>
Cc: Jonathan Corbet <corbet@....net>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
"Michael S. Tsirkin" <mst@...hat.com>, Xuan Zhuo
<xuanzhuo@...ux.alibaba.com>, Shuah Khan <shuah@...nel.org>,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, kvm@...r.kernel.org,
virtualization@...ts.linux-foundation.org, linux-kselftest@...r.kernel.org,
Yuri Benditovich <yuri.benditovich@...nix.com>,
Andrew Melnychenko <andrew@...nix.com>,
Stephen Hemminger <stephen@...workplumber.org>, gur.stavi@...wei.com
Subject: Re: [PATCH RFC v4 0/9] tun: Introduce virtio-net hashing feature
On 2024/09/29 11:07, Jason Wang wrote:
> On Fri, Sep 27, 2024 at 3:51 PM Akihiko Odaki <akihiko.odaki@...nix.com> wrote:
>>
>> On 2024/09/27 13:31, Jason Wang wrote:
>>> On Fri, Sep 27, 2024 at 10:11 AM Akihiko Odaki <akihiko.odaki@...nix.com> wrote:
>>>>
>>>> On 2024/09/25 12:30, Jason Wang wrote:
>>>>> On Tue, Sep 24, 2024 at 5:01 PM Akihiko Odaki <akihiko.odaki@...nix.com> wrote:
>>>>>>
>>>>>> virtio-net have two usage of hashes: one is RSS and another is hash
>>>>>> reporting. Conventionally the hash calculation was done by the VMM.
>>>>>> However, computing the hash after the queue was chosen defeats the
>>>>>> purpose of RSS.
>>>>>>
>>>>>> Another approach is to use eBPF steering program. This approach has
>>>>>> another downside: it cannot report the calculated hash due to the
>>>>>> restrictive nature of eBPF.
>>>>>>
>>>>>> Introduce the code to compute hashes to the kernel in order to overcome
>>>>>> thse challenges.
>>>>>>
>>>>>> An alternative solution is to extend the eBPF steering program so that it
>>>>>> will be able to report to the userspace, but it is based on context
>>>>>> rewrites, which is in feature freeze. We can adopt kfuncs, but they will
>>>>>> not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM
>>>>>> and vhost_net).
>>>>>>
>>>>>
>>>>> I wonder if we could clone the skb and reuse some to store the hash,
>>>>> then the steering eBPF program can access these fields without
>>>>> introducing full RSS in the kernel?
>>>>
>>>> I don't get how cloning the skb can solve the issue.
>>>>
>>>> We can certainly implement Toeplitz function in the kernel or even with
>>>> tc-bpf to store a hash value that can be used for eBPF steering program
>>>> and virtio hash reporting. However we don't have a means of storing a
>>>> hash type, which is specific to virtio hash reporting and lacks a
>>>> corresponding skb field.
>>>
>>> I may miss something but looking at sk_filter_is_valid_access(). It
>>> looks to me we can make use of skb->cb[0..4]?
>>
>> I didn't opt to using cb. Below is the rationale:
>>
>> cb is for tail call so it means we reuse the field for a different
>> purpose. The context rewrite allows adding a field without increasing
>> the size of the underlying storage (the real sk_buff) so we should add a
>> new field instead of reusing an existing field to avoid confusion.
>>
>> We are however no longer allowed to add a new field. In my
>> understanding, this is because it is an UAPI, and eBPF maintainers found
>> it is difficult to maintain its stability.
>>
>> Reusing cb for hash reporting is a workaround to avoid having a new
>> field, but it does not solve the underlying problem (i.e., keeping eBPF
>> as stable as UAPI is unreasonably hard). In my opinion, adding an ioctl
>> is a reasonable option to keep the API as stable as other virtualization
>> UAPIs while respecting the underlying intention of the context rewrite
>> feature freeze.
>
> Fair enough.
>
> Btw, I remember DPDK implements tuntap RSS via eBPF as well (probably
> via cls or other). It might worth to see if anything we miss here.
Thanks for the information. I wonder why they used cls instead of
steering program. Perhaps it may be due to compatibility with macvtap
and ipvtap, which don't steering program.
Their RSS implementation looks cleaner so I will improve my RSS
implementation accordingly.
Regards,
Akihiko Odaki
Powered by blists - more mailing lists