lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ff7916cf-8a9c-4c27-baaf-ca408817c063@daynix.com>
Date: Wed, 12 Mar 2025 14:03:34 +0900
From: Akihiko Odaki <akihiko.odaki@...nix.com>
To: Jason Wang <jasowang@...hat.com>
Cc: Jonathan Corbet <corbet@....net>,
 Willem de Bruijn <willemdebruijn.kernel@...il.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 "Michael S. Tsirkin" <mst@...hat.com>, Xuan Zhuo
 <xuanzhuo@...ux.alibaba.com>, Shuah Khan <shuah@...nel.org>,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 netdev@...r.kernel.org, kvm@...r.kernel.org,
 virtualization@...ts.linux-foundation.org, linux-kselftest@...r.kernel.org,
 Yuri Benditovich <yuri.benditovich@...nix.com>,
 Andrew Melnychenko <andrew@...nix.com>,
 Stephen Hemminger <stephen@...workplumber.org>, gur.stavi@...wei.com,
 Lei Yang <leiyang@...hat.com>, Simon Horman <horms@...nel.org>
Subject: Re: [PATCH net-next v9 3/6] tun: Introduce virtio-net hash feature

On 2025/03/12 11:35, Jason Wang wrote:
> On Tue, Mar 11, 2025 at 2:11 PM Akihiko Odaki <akihiko.odaki@...nix.com> wrote:
>>
>> On 2025/03/11 9:38, Jason Wang wrote:
>>> On Mon, Mar 10, 2025 at 3:45 PM Akihiko Odaki <akihiko.odaki@...nix.com> wrote:
>>>>
>>>> On 2025/03/10 12:55, Jason Wang wrote:
>>>>> On Fri, Mar 7, 2025 at 7:01 PM Akihiko Odaki <akihiko.odaki@...nix.com> wrote:
>>>>>>
>>>>>> Hash reporting
>>>>>> ==============
>>>>>>
>>>>>> Allow the guest to reuse the hash value to make receive steering
>>>>>> consistent between the host and guest, and to save hash computation.
>>>>>>
>>>>>> RSS
>>>>>> ===
>>>>>>
>>>>>> RSS is a receive steering algorithm that can be negotiated to use with
>>>>>> virtio_net. Conventionally the hash calculation was done by the VMM.
>>>>>> However, computing the hash after the queue was chosen defeats the
>>>>>> purpose of RSS.
>>>>>>
>>>>>> Another approach is to use eBPF steering program. This approach has
>>>>>> another downside: it cannot report the calculated hash due to the
>>>>>> restrictive nature of eBPF steering program.
>>>>>>
>>>>>> Introduce the code to perform RSS to the kernel in order to overcome
>>>>>> thse challenges. An alternative solution is to extend the eBPF steering
>>>>>> program so that it will be able to report to the userspace, but I didn't
>>>>>> opt for it because extending the current mechanism of eBPF steering
>>>>>> program as is because it relies on legacy context rewriting, and
>>>>>> introducing kfunc-based eBPF will result in non-UAPI dependency while
>>>>>> the other relevant virtualization APIs such as KVM and vhost_net are
>>>>>> UAPIs.
>>>>>>
>>>>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@...nix.com>
>>>>>> Tested-by: Lei Yang <leiyang@...hat.com>
>>>>>> ---
>>>>>>     Documentation/networking/tuntap.rst |   7 ++
>>>>>>     drivers/net/Kconfig                 |   1 +
>>>>>>     drivers/net/tap.c                   |  68 ++++++++++++++-
>>>>>>     drivers/net/tun.c                   |  98 +++++++++++++++++-----
>>>>>>     drivers/net/tun_vnet.h              | 159 ++++++++++++++++++++++++++++++++++--
>>>>>>     include/linux/if_tap.h              |   2 +
>>>>>>     include/linux/skbuff.h              |   3 +
>>>>>>     include/uapi/linux/if_tun.h         |  75 +++++++++++++++++
>>>>>>     net/core/skbuff.c                   |   4 +
>>>>>>     9 files changed, 386 insertions(+), 31 deletions(-)
>>>>>>
>>>>>> diff --git a/Documentation/networking/tuntap.rst b/Documentation/networking/tuntap.rst
>>>>>> index 4d7087f727be5e37dfbf5066a9e9c872cc98898d..86b4ae8caa8ad062c1e558920be42ce0d4217465 100644
>>>>>> --- a/Documentation/networking/tuntap.rst
>>>>>> +++ b/Documentation/networking/tuntap.rst
>>>>>> @@ -206,6 +206,13 @@ enable is true we enable it, otherwise we disable it::
>>>>>>           return ioctl(fd, TUNSETQUEUE, (void *)&ifr);
>>>>>>       }
>>>>>>
> 
> [...]
> 
>>>>>> +static inline long tun_vnet_ioctl_sethash(struct tun_vnet_hash_container __rcu **hashp,
>>>>>> +                                         bool can_rss, void __user *argp)
>>>>>
>>>>> So again, can_rss seems to be tricky. Looking at its caller, it tires
>>>>> to make eBPF and RSS mutually exclusive. I still don't understand why
>>>>> we need this. Allow eBPF program to override some of the path seems to
>>>>> be common practice.
>>>>>
>>>>> What's more, we didn't try (or even can't) to make automq and eBPF to
>>>>> be mutually exclusive. So I still didn't see what we gain from this
>>>>> and it complicates the codes and may lead to ambiguous uAPI/behaviour.
>>>>
>>>> automq and eBPF are mutually exclusive; automq is disabled when an eBPF
>>>> steering program is set so I followed the example here.
>>>
>>> I meant from the view of uAPI, the kernel doesn't or can't reject eBPF
>>> while using automq.
>>   > >>
>>>> We don't even have an interface for eBPF to let it fall back to another
>>>> alogirhtm.
>>>
>>> It doesn't even need this, e.g XDP overrides the default receiving path.
>>>
>>>> I could make it fall back to RSS if the eBPF steeering
>>>> program is designed to fall back to automq when it returns e.g., -1. But
>>>> such an interface is currently not defined and defining one is out of
>>>> scope of this patch series.
>>>
>>> Just to make sure we are on the same page, I meant we just need to
>>> make the behaviour consistent: allow eBPF to override the behaviour of
>>> both automq and rss.
>>
>> That assumes eBPF takes precedence over RSS, which is not obvious to me.
> 
> Well, it's kind of obvious. Not speaking the eBPF selector, we have
> other eBPF stuffs like skbedit etc.
> 
>>
>> Let's add an interface for the eBPF steering program to fall back to
>> another steering algorithm. I said it is out of scope before, but it
>> makes clear that the eBPF steering program takes precedence over other
>> algorithms and allows us to delete the code for the configuration
>> validation in this patch.
> 
> Fallback is out of scope but it's not what I meant.
> 
> I meant in the current uAPI take eBPF precedence over automq. It's
> much more simpler to stick this precedence unless we see obvious
> advanatge.

We still have three different design options that preserve the current 
precedence:

1) Precedence order: eBPF -> RSS -> automq
2) Precedence order: RSS -> eBPF -> automq
3) Precedence order: eBPF OR RSS -> automq where eBPF and RSS are 
mutually exclusive

I think this is a unique situation for this steering program and I could 
not find another example in other eBPF stuffs.

The current version implements 3) because it is not obvious whether we 
should choose either 1) or 2). But 1) will be the most capable option if 
eBPF has a fall-back feature.

> 
>>
>>>
>>>>
>>>>>
> 
> [...]
> 
>>>>> Is there a chance that we can reach here without TUN_VNET_HASH_REPORT?
>>>>> If yes, it should be a bug.
>>>>
>>>> It is possible to use RSS without TUN_VNET_HASH_REPORT.
>>>
>>> Another call to separate the ioctls then.
>>
>> RSS and hash reporting are not completely independent though.
> 
> Spec said:
> 
> """
> VIRTIO_NET_F_RSSRequires VIRTIO_NET_F_CTRL_VQ.
> """

I meant the features can be enabled independently, but they will share 
the hash type set when they are enabled at the same time.

> 
>>
>> A plot twist is the "types" parameter; it is a parameter that is
>> "common" for RSS and hash reporting.
> 
> So we can share part of the structure through the uAPI.

Isn't that what this patch does?

> 
>> RSS and hash reporting must share
>> this parameter when both are enabled at the same time; otherwise RSS may
>> compute hash values that are not suited for hash reporting.
> 
> Is this mandated by the spec? If yes, we can add a check. If not,
> userspace risk themselves as a mis-configuration which we don't need
> to bother.

Yes, it is mandated. 5.1.6.4.3 Hash calculation for incoming packets says:
 > A device attempts to calculate a per-packet hash in the following
 > cases:
 >
 >   - The feature VIRTIO_NET_F_RSS was negotiated. The device uses the
 >     hash to determine the receive virtqueue to place incoming packets.
 >   - The feature VIRTIO_NET_F_HASH_REPORT was negotiated. The device
 >     reports the hash value and the hash type with the packet.
 >
 > If the feature VIRTIO_NET_F_RSS was negotiated:
 >
 >   - The device uses hash_types of the virtio_net_rss_config structure
 >     as ’Enabled hash types’ bitmask.
 >   - The device uses a key as defined in hash_key_data and
       hash_key_length of the virtio_net_rss_config structure (see
 >      5.1.6.5.7.1).
 >
 > If the feature VIRTIO_NET_F_RSS was not negotiated:
 >
 >   - The device uses hash_types of the virtio_net_hash_config structure
 >     as ’Enabled hash types’ bitmask.
 >   - The device uses a key as defined in hash_key_data and
 >     hash_key_length of the virtio_net_hash_config structure (see
 >      .1.6.5.6.4).

So when both VIRTIO_NET_F_RSS and VIRTIO_NET_F_HASH_REPORT are 
negotiated, virtio_net_rss_config not only controls RSS but also the 
reported hash values and types. They cannot be divergent.

> 
> Note that spec use different commands for hash_report and rss.

TUNSETVNETHASH is different from these commands in terms that it also 
negotiates VIRTIO_NET_F_HASH_REPORT and VIRTIO_NET_F_RSS.

In the virtio-net specification, it is not defined what would happen if 
these features are negotiated but the VIRTIO_NET_CTRL_MQ_RSS_CONFIG or 
VIRTIO_NET_CTRL_MQ_HASH_CONFIG commands are not sent. There is no such 
ambiguity with TUNSETVNETHASH.

Regards,
Akihiko Odaki

> 
>>
>> The paramter will be duplicated if we have separate ioctls for RSS and
>> hash reporting, and the kernel will have a chiken-egg problem when
>> ensuring they are synchronized; when the ioctl for RSS is issued, should
>> the kernel ensure the "types" parameter is identical with one specified
>> for hash reporting? It will not work if the userspace may decide to
>> configure hash reporting after RSS.
>>
> 
> See my reply above.
> 
> Thanks
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ