netdev - Re: [PATCH] udp: gso: fix MTU check for small packets

Message-ID: <6798ed91e94a9_987d9294c2@willemb.c.googlers.com.notmuch>
Date: Tue, 28 Jan 2025 09:45:37 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Yan Zhai <yan@...udflare.com>, 
 Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: netdev@...r.kernel.org, 
 "David S. Miller" <davem@...emloft.net>, 
 David Ahern <dsahern@...nel.org>, 
 Eric Dumazet <edumazet@...gle.com>, 
 Jakub Kicinski <kuba@...nel.org>, 
 Paolo Abeni <pabeni@...hat.com>, 
 Simon Horman <horms@...nel.org>, 
 Shuah Khan <shuah@...nel.org>, 
 Josh Hunt <johunt@...mai.com>, 
 Alexander Duyck <alexander.h.duyck@...ux.intel.com>, 
 linux-kernel@...r.kernel.org, 
 linux-kselftest@...r.kernel.org
Subject: Re: [PATCH] udp: gso: fix MTU check for small packets

Yan Zhai wrote:
> Hi Willem,
> 
> Thanks for getting back to me.
> 
> On Mon, Jan 27, 2025 at 8:33 AM Willem de Bruijn
> <willemdebruijn.kernel@...il.com> wrote:
> >
> > Yan Zhai wrote:
> > > Commit 4094871db1d6 ("udp: only do GSO if # of segs > 1") avoided GSO
> > > for small packets. But the kernel currently dismisses GSO requests only
> > > after checking MTU on gso_size. This means any packets, regardless of
> > > their payload sizes, would be dropped when MTU is smaller than requested
> > > gso_size.
> >
> > Is this a realistic concern? How did you encounter this in practice?
> >
> > It *is* a misconfiguration to configure a gso_size larger than MTU.
> >
> > > Meanwhile, EINVAL would be returned in this case, making it
> > > very misleading to debug.
> >
> > Misleading is subjective. I'm not sure what is misleading here. From
> > my above comment, I believe this is correctly EINVAL.
> >
> > That said, if this impacts a real workload we could reconsider
> > relaxing the check. I.e., allowing through packets even when an
> > application has clearly misconfigured UDP_SEGMENT.
> >
> We did encounter a painful reliability issue in production last month.
> 
> To simplify the scenario, we had these symptoms when the issue occurred:
> 1. QUIC connections to host A started to fail, and new ones could not be established
> 2. User space Wireguard to the exact same host worked 100% fine
> 
> This happened rarely, like once or twice a day, lasting for a few
> minutes usually, but it was quite visible since it is an office
> network.
> 
> Initially this suggested something was wrong at the protocol layer. But
> after multiple rounds of digging, we finally figured the root cause
> was:
> 3. Something sometimes pings host B, which shares the same IP with
> host A but different ports (thanks to limited IPv4 space), and its
> PMTU was reduced to 1280 occasionally. This unexpectedly affected all
> traffic to that IP including traffic toward host A. Our QUIC client
> set gso_size to 1350, and that's why it got hit.
> 
> I agree that configurations do matter a lot here. Given how broken the
> PMTU was for the Internet, we might just turn off pmtudisc option on
> our end to avoid this failure path. But for those who haven't yet, this
> could still be confusing if it ever happens, because nothing seems to
> point to PMTU in the first place:
> * small packets also get dropped
> * error code was EINVAL from sendmsg
> 
> That said, I probably should have used PMTU in my commit message to be
> more clear about our problem. But meanwhile I am also concerned that
> newly added tunnels could trigger the same issue, even with a static
> device MTU. My proposal should make the error reason more clear:
> EMSGSIZE itself is a direct signal pointing to MTU/PMTU. Larger
> packets getting dropped would have a similar effect.

Thanks for that context. Makes sense that this is a real issue.

One issue is that with segmentation, the initial mtu checks are
skipped, so they have to be enforced later. In __ip_append_data:

    mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;

Also, might this actually make debugging harder, since the error
condition would then be triggered only intermittently?
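
To make the discussion concrete, here is a minimal userspace model of the
two check orderings. It is a sketch, not kernel code: the function names,
MODEL_HLEN (IPv4 + UDP headers without options) and the min()-based
variant are assumptions for illustration only. The first function mirrors
the behavior described in this thread (EINVAL whenever header + gso_size
exceeds the path MTU, regardless of payload). The second shows one
possible way to keep the deferred MTU enforcement while letting small
packets through and reporting EMSGSIZE.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical constant: IPv4 header (20) + UDP header (8), no options. */
#define MODEL_HLEN 28

/*
 * Behavior as described in the thread: the MTU check is done on gso_size
 * before deciding whether GSO is needed at all, so every send on a socket
 * with UDP_SEGMENT set fails with EINVAL once the path MTU drops below
 * header + gso_size, whatever the payload length.
 */
int model_current(int datalen, int gso_size, int fragsize)
{
	(void)datalen;		/* payload length is not consulted here */
	assert(gso_size > 0);

	if (MODEL_HLEN + gso_size > fragsize)
		return -EINVAL;
	return 0;
}

/*
 * Assumed alternative (not necessarily the patch under review): check the
 * size of the largest packet that would actually hit the wire, i.e.
 * min(datalen, gso_size), and report EMSGSIZE. Because the early MTU
 * checks are skipped when gso_size is set, this check also has to cover
 * single packets that are not segmented.
 */
int model_relaxed(int datalen, int gso_size, int fragsize)
{
	int seg;

	assert(gso_size > 0);
	seg = datalen < gso_size ? datalen : gso_size;

	if (MODEL_HLEN + seg > fragsize)
		return -EMSGSIZE;
	return 0;
}
```

With the numbers from the report (gso_size 1350, PMTU 1280), the first
model rejects a 100 byte payload with EINVAL, while the second accepts it
and returns EMSGSIZE only for payloads whose segments would not fit.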