netdev - Re: [PATCH RFC v3 7/8] tun: enable gso over UDP tunnel support.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1f0933e0-ab58-41b8-832b-5336618be8b3@redhat.com>
Date: Mon, 16 Jun 2025 12:17:42 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Jason Wang <jasowang@...hat.com>, "Michael S. Tsirkin" <mst@...hat.com>
Cc: netdev@...r.kernel.org, Willem de Bruijn
 <willemdebruijn.kernel@...il.com>, Andrew Lunn <andrew+netdev@...n.ch>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
 Eugenio Pérez <eperezma@...hat.com>,
 Yuri Benditovich <yuri.benditovich@...nix.com>,
 Akihiko Odaki <akihiko.odaki@...nix.com>
Subject: Re: [PATCH RFC v3 7/8] tun: enable gso over UDP tunnel support.

On 6/16/25 6:53 AM, Jason Wang wrote:
> On Thu, Jun 12, 2025 at 7:03 PM Paolo Abeni <pabeni@...hat.com> wrote:
>> On 6/12/25 6:55 AM, Jason Wang wrote:
>>> On Fri, Jun 6, 2025 at 7:46 PM Paolo Abeni <pabeni@...hat.com> wrote:
>>>> @@ -1720,8 +1732,16 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>>>>
>>>>         if (tun->flags & IFF_VNET_HDR) {
>>>>                 int vnet_hdr_sz = READ_ONCE(tun->vnet_hdr_sz);
>>>> +               int parsed_size;
>>>>
>>>> -               hdr_len = tun_vnet_hdr_get(vnet_hdr_sz, tun->flags, from, &gso);
>>>> +               if (vnet_hdr_sz < TUN_VNET_TNL_SIZE) {
>>>
>>> I still don't understand why we need to duplicate netdev features in
>>> flags, and it seems to introduce unnecessary complexities. Can we
>>> simply check dev->features instead?
>>>
>>> I think I've asked before, for example, we don't duplicate gso and
>>> csum for non tunnel packets.
>>
>> My fear was that if
>> - the guest negotiated VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO
>> - tun stores the negotiated offload info netdev->features
>> - the tun netdev UDP tunnel feature is disabled via ethtool
>>
>> tun may end-up sending to the guest packets without filling the tnl hdr,
>> which should be safe, as the driver should not use such info as no GSO
>> over UDP packets will go through, but is technically against the
>> specification.
> 
> Probably not? For example this is the way tun works with non tunnel GSO as well.
> 
> (And it allows the flexibility of debugging etc).
> 
>>
>> The current implementation always zero the whole virtio net hdr space,
>> so there is no such an issue.
>>
>> Still the additional complexity is ~5 lines and makes all the needed
>> information available on a single int, which is quite nice performance
>> wise. Do you have strong feeling against it?
> 
> See above and at least we can disallow the changing of UDP tunnel GSO
> (but I don't see too much value).

I'm sorry, but I don't understand what is the suggestion/request here.
Could you please phrase it?

Also please allow me to re-state my main point. The current
implementation adds a very limited amount of code in the control path,
and makes the data path simpler and faster - requiring no new argument
to the tun_hdr_* helper instead of (at least) one as the other alternative.

It looks like tun_hdr_* argument list could grow with every new feature,
but I think we should try to avoid that.

>>>> @@ -2426,7 +2460,16 @@ static int tun_xdp_one(struct tun_struct *tun,
>>>>         if (metasize > 0)
>>>>                 skb_metadata_set(skb, metasize);
>>>>
>>>> -       if (tun_vnet_hdr_to_skb(tun->flags, skb, gso)) {
>>>> +       /* Assume tun offloads are enabled if the provided hdr is large
>>>> +        * enough.
>>>> +        */
>>>> +       if (READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE &&
>>>> +           xdp->data - xdp->data_hard_start >= TUN_VNET_TNL_SIZE)
>>>> +               flags = tun->flags | TUN_VNET_TNL_MASK;
>>>> +       else
>>>> +               flags = tun->flags & ~TUN_VNET_TNL_MASK;
>>>
>>> I'm not sure I get the point that we need dynamics of
>>> TUN_VNET_TNL_MASK here. We know if tunnel gso and its csum or enabled
>>> or not,
>>
>> How does tun know about VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or
>> VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM?
> 
> I think it can be done in a way that works for non-tunnel gso.
> 
> The most complicated case is probably the case HOST_UDP_TUNNEL_X is
> enabled but GUEST_UDP_TUNNEL_X is not. In this case tun can know this
> by:
> 
> 1) vnet_hdr_len is large enough
> 2) UDP tunnel GSO is not enabled in netdev->features
> 
> If HOST_UDP_TUNNEL_X is not enabled by GUEST_UDP_TUNNEL_X is enabled,
> it can behave like existing non-tunnel GSO: still accept the UDP GSO
> tunnel packet.

AFAICS the text above matches/describes quite accurately the
implementation proposed in this patch and quoted above. Which in turn
confuses me, because I don't see what you would prefer to see
implemented differently.

>> The user-space does not tell the tun device about any of the host
>> offload features. Plain/baremetal GSO information are always available
>> in the basic virtio net header, so there is no size check, but the
>> overall behavior is similar - tun assumes the features have been
>> negotiated if the relevant bits are present in the header.
> 
> I'm not sure I understand here, there's no bit in the virtio net
> header that tells us if the packet contains the tunnel gso field. And
> the check of:
> 
> READ_ONCE(tun->vnet_hdr_sz) >= TUN_VNET_TNL_SIZE
> 
> seems to be not buggy. As qemu already did:
> 
> static void virtio_net_set_mrg_rx_bufs(VirtIONet *n, int mergeable_rx_bufs,
>                                        int version_1, int hash_report)
> {
>     int i;
>     NetClientState *nc;
> 
>     n->mergeable_rx_bufs = mergeable_rx_bufs;
> 
>     if (version_1) {
>         n->guest_hdr_len = hash_report ?
>             sizeof(struct virtio_net_hdr_v1_hash) :
>             sizeof(struct virtio_net_hdr_mrg_rxbuf);
>         n->rss_data.populate_hash = !!hash_report;
> 
> ...

Note that the qemu code quoted above does not include tunnel handling.

TUN_VNET_TNL_SIZE (== sizeof(struct virtio_net_hdr_v1_tunnel)) will be
too small when VIRTIO_NET_F_HASH_REPORT is enabled, too.

I did not handle that case here, due to the even greater overlapping with:

https://lore.kernel.org/netdev/20250530-rss-v12-0-95d8b348de91@daynix.com/

What I intended to do is:
- set another bit in `flags` according to the negotiated
VIRTIO_NET_F_HASH_REPORT value
- use such info in tun_vnet_parse_size() to computed the expected vnet
hdr len correctly.
- replace TUN_VNET_TNL_SIZE usage in tun.c with tun_vnet_parse_size() calls

I'm unsure if the above answer your question/doubt.

Anyhow I now see that keeping the UDP GSO related fields offset constant
regardless of VIRTIO_NET_F_HASH_REPORT would remove some ambiguity from
the relevant control path.

I think/hope we are still on time to update the specification clarifying
that, but I'm hesitant to take that path due to the additional
(hopefully small) overhead for the data path - and process overhead TBH.

On the flip (positive) side such action will decouple more this series
from the HASH_REPORT support.

Please advice, thanks!

/P