linux-kernel - Re: [PATCH net-next v9 1/6] virtio

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cf4bf799-3a6e-44dc-96ca-fa8d616e6ba7@daynix.com>
Date: Mon, 17 Mar 2025 15:08:24 +0900
From: Akihiko Odaki <akihiko.odaki@...nix.com>
To: Jason Wang <jasowang@...hat.com>
Cc: Jonathan Corbet <corbet@....net>,
 Willem de Bruijn <willemdebruijn.kernel@...il.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 "Michael S. Tsirkin" <mst@...hat.com>, Xuan Zhuo
 <xuanzhuo@...ux.alibaba.com>, Shuah Khan <shuah@...nel.org>,
 linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
 netdev@...r.kernel.org, kvm@...r.kernel.org,
 virtualization@...ts.linux-foundation.org, linux-kselftest@...r.kernel.org,
 Yuri Benditovich <yuri.benditovich@...nix.com>,
 Andrew Melnychenko <andrew@...nix.com>,
 Stephen Hemminger <stephen@...workplumber.org>, gur.stavi@...wei.com,
 Lei Yang <leiyang@...hat.com>, Simon Horman <horms@...nel.org>
Subject: Re: [PATCH net-next v9 1/6] virtio_net: Add functions for hashing

On 2025/03/17 10:24, Jason Wang wrote:
> On Tue, Mar 11, 2025 at 1:49 PM Akihiko Odaki <akihiko.odaki@...nix.com> wrote:
>>
>> On 2025/03/11 9:47, Jason Wang wrote:
>>> On Mon, Mar 10, 2025 at 2:53 PM Akihiko Odaki <akihiko.odaki@...nix.com> wrote:
>>>>
>>>> On 2025/03/10 12:55, Jason Wang wrote:
>>>>> On Fri, Mar 7, 2025 at 7:01 PM Akihiko Odaki <akihiko.odaki@...nix.com> wrote:
>>>>>>
>>>>>> They are useful to implement VIRTIO_NET_F_RSS and
>>>>>> VIRTIO_NET_F_HASH_REPORT.
>>>>>>
>>>>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@...nix.com>
>>>>>> Tested-by: Lei Yang <leiyang@...hat.com>
>>>>>> ---
>>>>>>     include/linux/virtio_net.h | 188 +++++++++++++++++++++++++++++++++++++++++++++
>>>>>>     1 file changed, 188 insertions(+)
>>>>>>
>>>>>> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
>>>>>> index 02a9f4dc594d02372a6c1850cd600eff9d000d8d..426f33b4b82440d61b2af9fdc4c0b0d4c571b2c5 100644
>>>>>> --- a/include/linux/virtio_net.h
>>>>>> +++ b/include/linux/virtio_net.h
>>>>>> @@ -9,6 +9,194 @@
>>>>>>     #include <uapi/linux/tcp.h>
>>>>>>     #include <uapi/linux/virtio_net.h>
>>>>>>
>>>>>> +struct virtio_net_hash {
>>>>>> +       u32 value;
>>>>>> +       u16 report;
>>>>>> +};
>>>>>> +
>>>>>> +struct virtio_net_toeplitz_state {
>>>>>> +       u32 hash;
>>>>>> +       const u32 *key;
>>>>>> +};
>>>>>> +
>>>>>> +#define VIRTIO_NET_SUPPORTED_HASH_TYPES (VIRTIO_NET_RSS_HASH_TYPE_IPv4 | \
>>>>>> +                                        VIRTIO_NET_RSS_HASH_TYPE_TCPv4 | \
>>>>>> +                                        VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | \
>>>>>> +                                        VIRTIO_NET_RSS_HASH_TYPE_IPv6 | \
>>>>>> +                                        VIRTIO_NET_RSS_HASH_TYPE_TCPv6 | \
>>>>>> +                                        VIRTIO_NET_RSS_HASH_TYPE_UDPv6)
>>>>>
>>>>> Let's explain why
>>>>>
>>>>> #define VIRTIO_NET_HASH_REPORT_IPv6_EX         7
>>>>> #define VIRTIO_NET_HASH_REPORT_TCPv6_EX        8
>>>>> #define VIRTIO_NET_HASH_REPORT_UDPv6_EX        9
>>>>>
>>>>> are missed here.
>>>>
>>>> Because they require parsing IPv6 options and I'm not sure how many we
>>>> need to parse. QEMU's eBPF program has a hard-coded limit of 30 options;
>>>> it has some explanation for this limit, but it does not seem definitive
>>>> either:
>>>> https://gitlab.com/qemu-project/qemu/-/commit/f3fa412de28ae3cb31d38811d30a77e4e20456cc#6ec48fc8af2f802e92f5127425e845c4c213ff60_0_165
>>>>
>>>
>>> How about the usersapce datapath RSS in Qemu? (We probably don't need
>>> to align with eBPF RSS as it's just a reference implementation)
>>
>> The userspace datapath RSS has no limit.
>>
>> The reference implementation is the userspace datapath. The eBPF program
>>    is intended to bring real performance benefit to Windows guests in
>> contrary.
>>
>> The userspace implementation does its best to provide defined RSS
>> capabilities but may not be performant. Parsing all IPv6 options have a
>> performance implication, but it is fine because it is not intended to be
>> performant in the first place.
>>
>> The performance problem is inherent to the userspace implementation,
>> which adds an extra overhead to the datapath. The eBPF program on the
>> other hand does not incur such overhead because it replaces the existing
>> steering algorithm (automq) instead of adding another layer. Hence the
>> eBPF program can be practical.
>>
>> That said, it is not that important to align with the userspace and eBPF
>> RSS in QEMU because they are still experimental anyway; the eBPF RSS has
>> potential to become a practical implementation but it is still in
>> development. The libvirt integration for the eBPF RSS is still not
>> complete, and we occasionally add fixes for RSS and hash reporting
>> without backporting to the stable branch.
>>
>> I'm adding interfaces to negotiate hash types rather for the future
>> extensibility. The specification may gain more hash types in the future
>> and other vhost backends may have a different set of hash types
>> supported. Figuring out how to deal with different sets of supported
>> hash typs is essential for both the kernel and QEMU.
>>
>>>
>>>> In this patch series, I add an ioctl to query capability instead; it
>>>> allows me leaving those hash types unimplemented and is crucial to
>>>> assure extensibility for future additions of hash types anyway. Anyone
>>>> who find these hash types useful can implement in the future.
>>>
>>> Yes, but we need to make sure no userspace visible behaviour changes
>>> after migration.
>>
>> Indeed, the goal is to make extensibility and migration compatible.
> 
> So I see this part:
> 
> + uint32_t supported_hash_types = n->rss_data.supported_hash_types;
> + uint32_t peer_hash_types = n->rss_data.peer_hash_types;
> + bool use_own_hash =
> + (supported_hash_types & VIRTIO_NET_RSS_SUPPORTED_HASHES) ==
> + supported_hash_types;
> + bool use_peer_hash =
> + n->rss_data.peer_hash_available &&
> + (supported_hash_types & peer_hash_types) == supported_hash_types;
> 
> It looks like it would be a challenge to support vhost-user in the
> future if vhost-user supports hash feature others than source?

The vhost-user backend will need to retrieve the supported hash types 
with VHOST_USER_GET_CONFIG as the vhost-vdpa backend does with 
VHOST_VDPA_GET_CONFIG.

> 
>>
>>>
>>>>
>>>>>
>>>>> And explain how we could maintain migration compatibility
>>>>>
>>>>> 1) Does those three work for userspace datapath in Qemu? If yes,
>>>>> migration will be broken.
>>>>
>>>> They work for userspace datapath so my RFC patch series for QEMU uses
>>>> TUNGETVNETHASHCAP to prevent breaking migration:
>>>> https://patchew.org/QEMU/20240915-hash-v3-0-79cb08d28647@daynix.com/
>>>>
>>>
>>> Ok, let's mention this in the cover letter. Another interesting thing
>>> is the migration from 10.0 to 9.0.
>>
>> The patch series is already mentioned in the cover letter. A description
>> of the intended use case of TUNGETVNETHASHCAP will be a good addition.
>> I'll add it to this patch so that it will be kept in tree after it gets
>> merged.
>>
>> Migration between two different QEMU versions should be handled with
>> versioned machine types.
>>
>> When a machine created in 9.0 is being migrated to 10.0, the machine
>> must set the hash type properties to match with the hash types supported
>> by the existing implementations, which means it sets the property for
>> VIRTIO_NET_HASH_REPORT_IPv6_EX to true, for example. Because this hash
>> type is currently not included in TUNGETVNETHASHCAP, the machine will
>> keep using the implementation used previously. The machine can be also
>> migrated back to 9.0 again.
>>
>> A machine type with version 10.0 cannot be migrated to 9.0 by design so
>> there is no new problem.
> 
> I meant migrate qemu 11.0 with machine type 10.0 to qemu 10.0 with
> machine 10.0 etc.

Let's assume QEMU 10.0 will support this new ioctl while QEMU 9.0 doesn't.

The description in my previous email was wrong. Checking the patch 
series again, I found I bumped the version number of 
vmstate_virtio_net_rss. So migrating QEMU 10.0 with machine type 9.0 to 
QEMU 9.0 will result in an error.

We can remove this error by introducing a compatibility property, but I 
don't think it's worth. As I noted in the previous email, the RSS 
feature is still in development and I don't think we need to support 
migrating to older QEMU versions. It gives an error instead of silently 
breaking a migrated VM at least.

Regards,
Akihiko Odaki

> 
>>
>>>
>>>> This patch series first adds configuration options for users to choose
>>>> hash types. QEMU then automatically picks one implementation from the
>>>> following (the earlier one is the more preferred):
>>>> 1) The hash capability of vhost hardware
>>>> 2) The hash capability I'm proposing here
>>>> 3) The eBPF program
>>>> 4) The pure userspace implementation
>>>>
>>>> This decision depends on the following:
>>>> - The required hash types; supported ones are queried for 1) and 2)
>>>> - Whether vhost is enabled or not and what vhost backend is used
>>>> - Whether hash reporting is enabled; 3) is incompatible with this
>>>>
>>>> The network device will not be realized if no implementation satisfies
>>>> the requirements.
>>>
>>> This makes sense, let's add this in the cover letter.
>>
>> I'll add it to the QEMU patch as it's more about details of QEMU.
>> The message of this patch will explain how TUNGETVNETHASHCAP and
>> TUNSETVNETHASH makes extensibility and migrattion compatible in general.
>>
>> Regards,
>> Akihiko Odaki
>>
>>>
>>>>
>>>>> 2) once we support those three in the future. For example, is the qemu
>>>>> expected to probe this via TUNGETVNETHASHCAP in the destination and
>>>>> fail the migration?
>>>>
>>>> QEMU is expected to use TUNGETVNETHASHCAP, but it can selectively enable
>>>> hash types with TUNSETVNETHASH to keep migration working.
>>>>
>>>> In summary, this patch series provides a sufficient facility for the
>>>> userspace to make extensibility and migration compatible;
>>>> TUNGETVNETHASHCAP exposes all of the kernel capabilities and
>>>> TUNSETVNETHASH allows the userspace to limit them.
>>>>
>>>> Regards,
>>>> Akihiko Odaki
>>>
>>> Fine.
>>>
>>> Thanks
>>>
> 
> Thanks
>