[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CACGkMEt=qPdOiK+7o69cwNhBukHZBteaykMqbiFwHg+r8+q1bg@mail.gmail.com>
Date: Fri, 12 Apr 2024 10:59:00 +0800
From: Jason Wang <jasowang@...hat.com>
To: Yuri Benditovich <yuri.benditovich@...nix.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Shuah Khan <shuah@...nel.org>, yan@...nix.com,
andrew@...nix.com, netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH net] net: change maximum number of UDP segments to 128
On Thu, Apr 11, 2024 at 5:39 PM Yuri Benditovich
<yuri.benditovich@...nix.com> wrote:
>
> On Thu, Apr 11, 2024 at 8:53 AM Jason Wang <jasowang@...hat.com> wrote:
> >
> > On Thu, Apr 11, 2024 at 1:32 PM Yuri Benditovich
> > <yuri.benditovich@...nix.com> wrote:
> > >
> > > On Thu, Apr 11, 2024 at 7:04 AM Jason Wang <jasowang@...hat.com> wrote:
> > > >
> > > > On Wed, Apr 10, 2024 at 4:28 PM Yuri Benditovich
> > > > <yuri.benditovich@...nix.com> wrote:
> > > > >
> > > > > On Wed, Apr 10, 2024 at 9:07 AM Jason Wang <jasowang@...hat.com> wrote:
> > > > > >
> > > > > > On Tue, Apr 9, 2024 at 1:48 PM Yuri Benditovich
> > > > > > <yuri.benditovich@...nix.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Apr 9, 2024 at 6:53 AM Jason Wang <jasowang@...hat.com> wrote:
> > > > > > >>
> > > > > > >> On Mon, Apr 8, 2024 at 3:24 PM Yuri Benditovich
> > > > > > >> <yuri.benditovich@...nix.com> wrote:
> > > > > > >> >
> > > > > > >> > On Mon, Apr 8, 2024 at 9:27 AM Jason Wang <jasowang@...hat.com> wrote:
> > > > > > >> > >
> > > > > > >> > > On Sun, Apr 7, 2024 at 2:50 AM Yuri Benditovich
> > > > > > >> > > <yuri.benditovich@...nix.com> wrote:
> > > > > > >> > > >
> > > > > > >> > > > Fixes: fc8b2a619469378 ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation")
> > > > > > >> > > >
> > > > > > >> > > > The mentioned above commit adds check of potential number
> > > > > > >> > > > of UDP segments vs UDP_MAX_SEGMENTS in linux/virtio_net.h.
> > > > > > >> > > > After this change certification test of USO guest-to-guest
> > > > > > >> > > > transmit on Windows driver for virtio-net device fails,
> > > > > > >> > > > for example with packet size of ~64K and mss of 536 bytes.
> > > > > > >> > > > In general the USO should not be more restrictive than TSO.
> > > > > > >> > > > Indeed, in case of unreasonably small mss a lot of segments
> > > > > > >> > > > can cause queue overflow and packet loss on the destination.
> > > > > > >> > > > Limit of 128 segments is good for any practical purpose,
> > > > > > >> > > > with minimal meaningful mss of 536 the maximal UDP packet will
> > > > > > >> > > > be divided to ~120 segments.
> > > > > > >> > >
> > > > > > >> > > Assuming different OS guests could run on top of KVM. I wonder if a
> > > > > > >> > > better fix is to relax the following check:
> > > > > > >> > >
> > > > > > >> > > => if (skb->len - p_off > gso_size * UDP_MAX_SEGMENTS)
> > > > > > >> > > return -EINVAL;
> > > > > > >> > There are also checks vs UDP_MAX_SEGMENTS in udp.c (in ipv4 and ipv6),
> > > > > > >> > they do not prevent guest-to-guest from passing but cause packet dropping in
> > > > > > >> > other cases..
> > > > > > >> > >
> > > > > > >> > > Changing UDP_MAX_SEGMENTS may have side effects. For example, a new
> > > > > > >> > > Linux guest run on top of a old Linux and other.
> > > > > > >> > IMO, in the worst case _in specific setups_ the communication will behave like
> > > > > > >> > it does now in all the setups.
> > > > > > >>
> > > > > > >> I meant if the guest limit is 128 but host limit is 64.
> > > > > > >
> > > > > > >
> > > > > > > If the guest limit is (128 or 64) and host limit is 64:
> > > > > > > If we send a UDP packet with USO (length 64K, mss 600) - it is dropped.
> > > > > > > setsockopt does not limit us to use <=64 packets, neither in windows nor in Linux,
> > > > > >
> > > > > > Just to make sure we are on the same page:
> > > > > >
> > > > > > Before fc8b2a619469378,
> > > > > >
> > > > > > 1) Windows guest works on Linux Host since we don't check against
> > > > > > UDP_MAX_SEGMENTS on the host
> > > > >
> > > > >
> > > > > Windows guest-to-guest works since there is no check for
> > > > > UDP_MAX_SEGMENTS in virtio_net.h
> > > > > Windows guest to Linux host suffers with size=64K mss=536 (as an example)
> > > > >
> > > > > > 2) Linux guest works on Linux Host, since it will always send packet
> > > > > > with less than UDP_MAX_SEGMENTS
> > > > >
> > > > >
> > > > > Not exactly.
> > > > > If you use "selftest" (udpgso*) _as is_, it will work as it has
> > > > > _internal_ define of 64 and never tries to do more than that.
> > > > > But (sorry for repeating that) there is no setsockopt() limitation and
> > > > > you're free to modify the test code
> > > > > to use more segments (as an example) and this will not work.
> > > >
> > > > I meant this part in udp_send_skb()
> > > >
> > > > => if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
> > > > kfree_skb(skb);
> > > > return -EINVAL;
> > > > }
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > This is the behaviour that we need to stick to, otherwise we break the
> > > > > > guests as you've noticed.
> > > > >
> > > > > We can stick to the behavior I've described above if we change only define
> > > > > UDP_MAX_SEGMENTS 64->128 in udp.h but leave untouched
> > > > > UDP_MAX_SEGMENTS=64 in udpgso (test code)
> > > > > (IMO does not make too much sense but possible)
> > > > >
> > > > > Actually, all this discussion happens because the initial code of USO
> > > > > in the kernel was delivered with define of 64 without any special
> > > > > reason, just because this looked enough then.
> > > >
> > > > Right, but it's too late to "fix", the only thing we can do now is to
> > > > make sure to not break the application that worked in the past.
> > > >
> > > > >
> > > > > >
> > > > > > If we fix it by increasing the UDP_MAX_SEGMENTS, it fixes for 1) but
> > > > > > not necessarily for 2). If we don't check against UDP_MAX_SEGMENTS, it
> > > > > > works for 2) as well.
> > > > >
> > > >
> > > > So what I want to say is that having a check for UDP_MAX_SEGMENTS
> > > > seems to be problematic. And increasing it to 128 complicates the
> > > > problem furtherly.
> > > >
> > > > > There are following cases:
> > > > > - guest-to-guest (win-to-win, lin-to-lin,win-to-lin, lin-to-win)
> > > >
> > > > If we don't have the check for UDP_MAX_SEGMENTS, everything should be fine.
> > > >
> > > > If the sender can produces 128 and 64 is checked in the middle of the
> > > > datapath in either tun or virtio-net, the packet will be dropped.
> > > >
> > > > > -guest-to-host (guest=lin, guest=win)
> > > >
> > > > Same as above.
> > > >
> > > > > -host-to-host
> > > >
> > > > I don't see any issue in this part.
> > >
> > > The issue is that host-to-host UDP packets with more than 64 segments
> > > are dropped due to check vs UDP_MAX_SEGMENTS is udp.c (v4 and v6)
> >
> > Yes, but if they fail since the introduction of UDP gso if I was not wrong.
>
> In order to prevent failure we can limit the test to work with 64 segments
> and it will be happy. Nothing will be broken.
I'm kind of lost here, if you still limit it to 64, why do you want to
increase the kernel to 128?
>
> >
> > >
> > >
> > > >
> > > > > -host-to-guest (guest=lin, guest=win)
> > > >
> > > > Same as guest-to-guest.
> > > >
> > > > >
> > > > > Resulting table of results is much more complicated than you described.
> > > > > We can't use the udpgso code as the only indicator of go/nogo.
> > > >
> > > > We should not break userspace. If there's a setup that udpgso can work
> > > > in the past but not after a patch is applied, we should avoid such
> > > > cases.
> > > >
> > > > That's why I think avoiding checking UDP_MAX_SEGMENTS and doing other
> > > > hardening might be better (for example as Eric suggested).
> > >
> > > I've checked all this thread and I did not find any Eric's suggestion regarding
> > > other hardening. Which suggestion have you mentioned?
> >
> > Quoted from Eric email (btw, for some reason the discussion is not
> > reached to the list, adding list here)
> >
> > """
> >
> > > Assuming different OS guests could run on top of KVM. I wonder if a
> > > better fix is to relax the following check:
> > >
> > > => if (skb->len - p_off > gso_size * UDP_MAX_SEGMENTS)
> > > return -EINVAL;
> > >
> > > Changing UDP_MAX_SEGMENTS may have side effects. For example, a new
> > > Linux guest run on top of a old Linux and other.
> >
> > Typical qdisc packet limit is 1000.
> >
> > I think a limit prevents abuses, like gso_size == 1, and skb->len = 65000
>
> I understand. This was your suggestion to remove the check here.
> Then with > 64 segments:
> - windows guest-to-guest will work
> - host to windows guest will not work
> - linux guest to windows guest will not work
> - other pairs - I do not know yet
Why can this break? Did any guest produce packets with either gso_size
= 1 or skb->len >= 65000?
>
> This is what you prefer??
Thanks
>
>
> >
> > """
> >
> > The actual value needs more thought, for example 65K may block the
> > implementation of jumbogram in the future.
> >
> > Thanks
> >
> > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > >>
> > > > > > >>
> > > > > > >> > Note that this limit (UDP_MAX_SEGMENTS) is set
> > > > > > >> > in internal kernel define, it is not based on any RFC, not visible via
> > > > > > >> > ethtool etc.
> > > > > > >>
> > > > > > >> In the future, it needs to be advertised to the guest via virtio spec.
> > > > > > >> Otherwise it might have compatibility issues.
> > > > > > >
> > > > > > >
> > > > > > > The spec targets 2 areas - hardware solutions and software solutions.
> > > > > > > For hardware solutions everything is clear - there are no declared limits and if the hardware will limit the number of segments to anything - it is a bug.
> > > > > > > So the advertising makes sense only for hypervisors and for example, qemu will need to obtain this magic number from the kernel.
> > > > > >
> > > > > > Right.
> > > > > >
> > > > > > > If/when we'll start adding this to the spec we'll face a reasonable question - why do we need to add something that is not mentioned in any RFC, why not add such a thing for TCP also?
> > > > > >
> > > > > > So the kernel has gso_max_segs for netdevice. We probably need it for
> > > > > > virtio-net as well, and it might be useful for jumbograms as well.
> > > > > >
> > > > > virtio-net is "limited" by common "#define GSO_MAX_SEGS 65535u"
> > > >
> > > > This works since the host will do software segmentation as a fallback
> > > > which will be very slow.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Next, let's say we advertise this number over virtio capabilities - does this solve something?
> > > > > >
> > > > > > For example, doing software segmentation by kernel?
> > > > > >
> > > > > > > The driver can't communicate this capability up. The setsockopt will not limit this parameter.
> > > > > > >
> > > > > > >>
> > > > > > >> For example, migrate from 64 to 128.
> > > > > > >>
> > > > > > > Yes, but the migration from higher kernel to lower one looks like an immortal problem.
> > > > > >
> > > > > > This is not rare. For example, the patch for 128 is applied on the
> > > > > > source but not destination.
> > > > >
> > > > > I agree that such migration can happen and _theoretically_ may cause
> > > > > some misunderstanding
> > > > > I'm almost sure that practically this can't cause any.misunderstanding
> > > > > and we can discuss this specific problem in depth.
> > > > > (I just need some time for some tests)
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > >>
> > > > > > >> > The same on Windows - the adapter does not have such a capability as maximal
> > > > > > >> > number of UDP segments, so the upper layers are unaware of any limitation.
> > > > > > >>
> > > > > > >> Right, that might be problematic.
> > > > > > >>
> > > > > > >> The checking of UDP_MAX_SEGMENTS implies an agreement of guest and
> > > > > > >> host. But such implications are not true .
> > > > > > >>
> > > > > > >> Thanks
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> > >
> > > > > > >> > > Thanks
> > > > > > >> > >
> > > > > > >> > > >
> > > > > > >> > > > Signed-off-by: Yuri Benditovich <yuri.benditovich@...nix.com>
> > > > > > >> > > > ---
> > > > > > >> > > > include/linux/udp.h | 2 +-
> > > > > > >> > > > tools/testing/selftests/net/udpgso.c | 2 +-
> > > > > > >> > > > 2 files changed, 2 insertions(+), 2 deletions(-)
> > > > > > >> > > >
> > > > > > >> > > > diff --git a/include/linux/udp.h b/include/linux/udp.h
> > > > > > >> > > > index 3748e82b627b..7e75ccdf25fe 100644
> > > > > > >> > > > --- a/include/linux/udp.h
> > > > > > >> > > > +++ b/include/linux/udp.h
> > > > > > >> > > > @@ -108,7 +108,7 @@ struct udp_sock {
> > > > > > >> > > > #define udp_assign_bit(nr, sk, val) \
> > > > > > >> > > > assign_bit(UDP_FLAGS_##nr, &udp_sk(sk)->udp_flags, val)
> > > > > > >> > > >
> > > > > > >> > > > -#define UDP_MAX_SEGMENTS (1 << 6UL)
> > > > > > >> > > > +#define UDP_MAX_SEGMENTS (1 << 7UL)
> > > > > > >> > > >
> > > > > > >> > > > #define udp_sk(ptr) container_of_const(ptr, struct udp_sock, inet.sk)
> > > > > > >> > > >
> > > > > > >> > > > diff --git a/tools/testing/selftests/net/udpgso.c b/tools/testing/selftests/net/udpgso.c
> > > > > > >> > > > index 1d975bf52af3..85b3baa3f7f3 100644
> > > > > > >> > > > --- a/tools/testing/selftests/net/udpgso.c
> > > > > > >> > > > +++ b/tools/testing/selftests/net/udpgso.c
> > > > > > >> > > > @@ -34,7 +34,7 @@
> > > > > > >> > > > #endif
> > > > > > >> > > >
> > > > > > >> > > > #ifndef UDP_MAX_SEGMENTS
> > > > > > >> > > > -#define UDP_MAX_SEGMENTS (1 << 6UL)
> > > > > > >> > > > +#define UDP_MAX_SEGMENTS (1 << 7UL)
> > > > > > >> > > > #endif
> > > > > > >> > > >
> > > > > > >> > > > #define CONST_MTU_TEST 1500
> > > > > > >> > > > --
> > > > > > >> > > > 2.34.3
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
Powered by blists - more mailing lists