Message-ID: <f7tjz485mpk.fsf@redhat.com>
Date: Wed, 16 Jul 2025 08:44:55 -0400
From: Aaron Conole <aconole@...hat.com>
To: Guillaume Nault <gnault@...hat.com>
Cc: Salvatore Bonaccorso <carnil@...ian.org>, Stefano Brivio
<sbrivio@...hat.com>, Jakub Kicinski <kuba@...nel.org>, "David S.
Miller" <davem@...emloft.net>, David Ahern <dsahern@...nel.org>, Eric
Dumazet <edumazet@...gle.com>, Simon Horman <horms@...nel.org>,
netdev@...r.kernel.org, Paolo Abeni <pabeni@...hat.com>, Charles Bordet
<rough.rock3059@...achamp.fr>, linux-kernel@...r.kernel.org,
regressions@...ts.linux.dev, stable@...r.kernel.org,
1108860@...s.debian.org
Subject: Re: [regression] Wireguard fragmentation fails with VXLAN since
8930424777e4 ("tunnels: Accept PACKET_HOST skb_tunnel_check_pmtu().")
causing network timeouts
Guillaume Nault <gnault@...hat.com> writes:
> On Mon, Jul 14, 2025 at 09:57:52PM +0200, Salvatore Bonaccorso wrote:
>> Hi,
>>
>> Charles Bordet reported the following issue (full context in
>> https://bugs.debian.org/1108860)
>>
>> > Dear Maintainer,
>> >
>> > What led up to the situation?
>> > We run a production environment using Debian 12 VMs, with a network
>> > topology involving VXLAN tunnels encapsulated inside Wireguard
>> > interfaces. This setup has worked reliably for over a year, with MTU set
>> > to 1500 on all interfaces except the Wireguard interface (set to 1420).
>> > Wireguard kernel fragmentation allowed this configuration to function
>> > without issues, even though the effective path MTU is lower than 1500.
>> >
>> > What exactly did you do (or not do) that was effective (or ineffective)?
>> > We performed a routine system upgrade, updating all packages including the
>> > kernel. After the upgrade, we observed severe network issues (timeouts,
>> > very slow HTTP/HTTPS, and apt update failures) on all VMs behind the
>> > router. SSH and small-packet traffic continued to work.
>> >
>> > To diagnose, we:
>> >
>> > * Restored a backup (with the previous kernel): the problem disappeared.
>> > * Repeated the upgrade, confirming the issue reappeared.
>> > * Systematically tested each kernel version from 6.1.124-1 up to
>> > 6.1.140-1. The problem first appears with kernel 6.1.135-1; all earlier
>> > versions work as expected.
>> > * The kernel version from backports (6.12.32-1) did not resolve the
>> > problem.
>> >
>> > What was the outcome of this action?
>> >
>> > * With kernel 6.1.135-1 or later, network timeouts occur for
>> > large-packet protocols (HTTP, apt, etc.), while SSH and small-packet
>> > protocols work.
>> > * With kernel 6.1.133-1 or earlier, everything works as expected.
>> >
>> > What outcome did you expect instead?
>> > We expected the network to function as before, with Wireguard handling
>> > fragmentation transparently and no application-level timeouts,
>> > regardless of the kernel version.
>>
>> While triaging the issue we found that the commit 8930424777e4
>> ("tunnels: Accept PACKET_HOST in skb_tunnel_check_pmtu().") introduces
>> the issue, and Charles confirmed that the issue is present as well in
>> 6.12.35 and 6.15.4 (other versions could potentially be affected as
>> well, but we wanted to check that it is not a 6.1.y-specific
>> regression).
>>
>> Reverting the commit fixes Charles' issue.
>>
>> Does that ring a bell?
>
> It doesn't ring a bell. Do you have more details on the setup that has
> the problem? Or, ideally, a self-contained reproducer?

+1 - I tested this patch with an OVS setup using VXLAN and Geneve
tunnels. A reproducer or more details would help.
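
For reference while building a reproducer, the size arithmetic implied
by the reported setup can be sketched as below. This is only a sketch
under assumptions that the report does not state: the VXLAN underlay is
IPv4 without IP options, and the inner frames carry no VLAN tag. The
function names are made up for illustration; only the 1500 and 1420 MTU
values come from the report.

```python
# Header sizes in bytes. ASSUMPTION: IPv4 underlay, no IP options,
# no VLAN tag on the inner Ethernet frame.
INNER_ETH = 14
VXLAN_HDR = 8
UDP_HDR = 8
IPV4_HDR = 20
VXLAN_OVERHEAD = INNER_ETH + VXLAN_HDR + UDP_HDR + IPV4_HDR  # 50 bytes

# MTU values taken from the report.
VXLAN_MTU = 1500  # "MTU set to 1500 on all interfaces"
WG_MTU = 1420     # "except the Wireguard interface (set to 1420)"


def vxlan_outer_len(inner_ip_len):
    """Size of the outer IPv4 packet that VXLAN hands to the underlay."""
    return inner_ip_len + VXLAN_OVERHEAD


def needs_fragmentation(inner_ip_len, underlay_mtu):
    """True if the encapsulated packet exceeds the underlay MTU."""
    return vxlan_outer_len(inner_ip_len) > underlay_mtu


def max_inner_mtu(underlay_mtu):
    """Largest inner IP packet that fits without fragmentation."""
    return underlay_mtu - VXLAN_OVERHEAD


if __name__ == "__main__":
    print(vxlan_outer_len(VXLAN_MTU))           # 1550
    print(needs_fragmentation(1500, WG_MTU))    # True: full-size packets
    print(needs_fragmentation(1000, WG_MTU))    # False: small packets fit
    print(max_inner_mtu(WG_MTU))                # 1370
```

Under these assumptions a full-size 1500-byte inner packet becomes a
1550-byte packet on the 1420-byte WireGuard interface, so it depends on
fragmentation, while small packets fit as-is. That would be consistent
with the reported symptom (SSH works, HTTP and apt time out), but it
does not by itself confirm where the packets are dropped.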
>> Regards,
>> Salvatore
>>