Message-ID: <aHYiwvElalXstQVa@debian>
Date: Tue, 15 Jul 2025 11:43:30 +0200
From: Guillaume Nault <gnault@...hat.com>
To: Salvatore Bonaccorso <carnil@...ian.org>
Cc: Stefano Brivio <sbrivio@...hat.com>, Aaron Conole <aconole@...hat.com>,
Jakub Kicinski <kuba@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
David Ahern <dsahern@...nel.org>,
Eric Dumazet <edumazet@...gle.com>, Simon Horman <horms@...nel.org>,
netdev@...r.kernel.org, Paolo Abeni <pabeni@...hat.com>,
Charles Bordet <rough.rock3059@...achamp.fr>,
linux-kernel@...r.kernel.org, regressions@...ts.linux.dev,
stable@...r.kernel.org, 1108860@...s.debian.org
Subject: Re: [regression] Wireguard fragmentation fails with VXLAN since
8930424777e4 ("tunnels: Accept PACKET_HOST in skb_tunnel_check_pmtu().")
causing network timeouts
On Mon, Jul 14, 2025 at 09:57:52PM +0200, Salvatore Bonaccorso wrote:
> Hi,
>
> Charles Bordet reported the following issue (full context in
> https://bugs.debian.org/1108860)
>
> > Dear Maintainer,
> >
> > What led up to the situation?
> > We run a production environment using Debian 12 VMs, with a network
> > topology involving VXLAN tunnels encapsulated inside Wireguard
> > interfaces. This setup has worked reliably for over a year, with MTU set
> > to 1500 on all interfaces except the Wireguard interface (set to 1420).
> > Wireguard kernel fragmentation allowed this configuration to function
> > without issues, even though the effective path MTU is lower than 1500.
> >
> > What exactly did you do (or not do) that was effective (or ineffective)?
> > We performed a routine system upgrade, updating all packages including the
> > kernel. After the upgrade, we observed severe network issues (timeouts,
> > very slow HTTP/HTTPS, and apt update failures) on all VMs behind the
> > router. SSH and small-packet traffic continued to work.
> >
> > To diagnose, we:
> >
> > * Restored a backup (with the previous kernel): the problem disappeared.
> > * Repeated the upgrade, confirming the issue reappeared.
> > * Systematically tested each kernel version from 6.1.124-1 up to
> > 6.1.140-1. The problem first appears with kernel 6.1.135-1; all earlier
> > versions work as expected.
> > * The kernel from backports (6.12.32-1) did not resolve the problem
> > either.
> >
> > What was the outcome of this action?
> >
> > * With kernel 6.1.135-1 or later, network timeouts occur for
> > large-packet protocols (HTTP, apt, etc.), while SSH and small-packet
> > protocols work.
> > * With kernel 6.1.133-1 or earlier, everything works as expected.
> >
> > What outcome did you expect instead?
> > We expected the network to function as before, with Wireguard handling
> > fragmentation transparently and no application-level timeouts,
> > regardless of the kernel version.
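For reference, a rough size calculation for the setup described above. This is only a sketch: it assumes IPv4 outer headers for the VXLAN encapsulation, no VLAN tags and no IP options, and it shows only why a 1500-byte VXLAN MTU over a 1420-byte Wireguard MTU depends on fragmentation. It says nothing about where the regression itself lies.

```python
# Header sizes in bytes.
# Assumption: IPv4 outer headers, no VLAN tags, no IP options.
ETH_HDR = 14    # inner Ethernet header carried inside VXLAN
VXLAN_HDR = 8   # VXLAN header
UDP_HDR = 8     # outer UDP header
IPV4_HDR = 20   # outer IPv4 header

VXLAN_MTU = 1500  # MTU of the VXLAN interface, as in the report
WG_MTU = 1420     # MTU of the Wireguard interface, as in the report


def vxlan_outer_size(inner_ip_len: int) -> int:
    """Size of the outer IPv4 packet VXLAN emits for one inner IP packet."""
    return inner_ip_len + ETH_HDR + VXLAN_HDR + UDP_HDR + IPV4_HDR


# A full-sized inner packet yields a 1550-byte outer packet, which is
# larger than the Wireguard MTU, so it can only pass if it is fragmented.
full = vxlan_outer_size(VXLAN_MTU)
print(full, full > WG_MTU)

# Largest inner packet that would fit through Wireguard unfragmented.
print(WG_MTU - vxlan_outer_size(0))
```

With these assumptions it prints `1550 True` and then `1370`. Lowering the VXLAN MTU to 1370 or less would avoid relying on fragmentation altogether.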
>
> While triaging the issue we found that commit 8930424777e4
> ("tunnels: Accept PACKET_HOST in skb_tunnel_check_pmtu().") introduces
> the issue, and Charles confirmed that the issue is also present in
> 6.12.35 and 6.15.4 (other versions could potentially be affected as
> well; we mainly wanted to check that it is not a 6.1.y-specific
> regression).
>
> Reverting the commit fixes Charles' issue.
>
> Does that ring a bell?
It doesn't ring a bell. Do you have more details on the setup that has
the problem? Or, ideally, a self-contained reproducer?
> Regards,
> Salvatore
>