Message-ID: <20180608210403.2moomjshtwszvsso@unicorn.suse.cz>
Date: Fri, 8 Jun 2018 23:04:03 +0200
From: Michal Kubecek <mkubecek@...e.cz>
To: Stephen Hemminger <stephen@...workplumber.org>
Cc: Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org
Subject: Re: Fw: [Bug 199995] New: Ramdomly sent TCP Reset from Kernel with
bonding mode "brodcast"

On Fri, Jun 08, 2018 at 09:59:54AM -0700, Stephen Hemminger wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=199995
>
> Bug ID: 199995
> Summary: Ramdomly sent TCP Reset from Kernel with bonding mode
> "brodcast"
>
> after a dist upgrade from Ubuntu 17.10 (Kernel 4.13.x) to Ubuntu 18.04 (Kernel
> 4.15.0) I suffer from randomly generated TCP RST packets sent (presumably) by
> the kernel on a bonding device that uses bonding mode "broadcast" with 2
> physical NICs.
>
> With tcpdump/Wireshark I can see that the kernel randomly sends TCP RST
> packets after the SYN/ACK/ACK packet is received (see attached PCAP).
> This only happens if the kernel receives the initial SYN packet on both
> physical NICs (and therefore seeing it twice), before the connection is
> established by sending SYN/ACK.
> It does not happen in 100% of cases, and only if the system can use two or
> more CPU cores/threads. With only one CPU available to the system, this
> behaviour is not reproducible.

I have seen a similar report earlier from one of our customers running
SLE12 SP2 (kernel 4.4). The problem is that if a duplicated SYN packet is
received on both slaves, the two copies can be processed by the lockless
listener simultaneously on different CPUs, and each can reply with a
SYNACK carrying a different sequence number, which results in a reset.
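
To illustrate the race, here is a toy model in Python. It is not kernel
code: toy_isn(), handle_syn(), ack_is_valid(), the secret and the
addresses are all hypothetical, and the only assumption carried over
from real TCP stacks is that the initial sequence number depends on
both the 4-tuple and a clock. It shows that two handlers seeing the
same SYN at slightly different times pick different sequence numbers,
so the client's final ACK can match only one of them.

```python
import hashlib

MOD = 2**32  # TCP sequence numbers are 32 bits wide


def toy_isn(four_tuple, clock_ticks, secret=b"hypothetical-secret"):
    """Toy ISN: keyed hash of the 4-tuple plus a clock component."""
    digest = hashlib.sha256(secret + repr(four_tuple).encode()).digest()
    return (int.from_bytes(digest[:4], "big") + clock_ticks) % MOD


def handle_syn(four_tuple, clock_ticks):
    """One CPU handling one copy of the SYN; returns the SYNACK's seq."""
    return toy_isn(four_tuple, clock_ticks)


def ack_is_valid(stored_isn, ack_number):
    """The final handshake ACK must acknowledge stored_isn + 1."""
    return ack_number == (stored_isn + 1) % MOD


# Hypothetical flow: (client IP, client port, server IP, server port).
flow = ("192.0.2.10", 40000, "192.0.2.1", 80)

# The same SYN arrives on both slaves and is handled on two CPUs
# at slightly different times (assumed clock values).
isn_cpu0 = handle_syn(flow, clock_ticks=1000)
isn_cpu1 = handle_syn(flow, clock_ticks=1001)

# Assume the client answers the SYNACK sent from CPU 0, while the
# listener state that survives is the one created on CPU 1.
client_ack = (isn_cpu0 + 1) % MOD

print("ISNs differ:", isn_cpu0 != isn_cpu1)                    # True
print("ACK accepted:", ack_is_valid(isn_cpu1, client_ack))     # False
```

In this model the rejected ACK stands in for the point where the real
stack would answer with a reset.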

I tried to think of a way to prevent this race without losing the
performance gain of the lockless listener but couldn't come up with
anything. Eventually, I managed to persuade the customer that this setup
(where each packet is received twice under normal circumstances) is not
what broadcast mode was designed for (based on the description in
Documentation/networking/bonding.txt).

However, the lockless listener was introduced in 4.4, so it's not clear
why the reporter started encountering this only after an upgrade from
4.13 to 4.15.

Michal Kubecek