Date: Fri, 16 Oct 2020 09:35:42 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: Apollon Oikonomopoulos <apoikos@...sg.gr>, Yuchung Cheng <ycheng@...gle.com>, Netdev <netdev@...r.kernel.org>, Soheil Hassas Yeganeh <soheil@...gle.com>
Subject: Re: TCP sender stuck in persist despite peer advertising non-zero window

On Fri, Oct 16, 2020 at 12:37 AM Neal Cardwell <ncardwell@...gle.com> wrote:
>
> On Thu, Oct 15, 2020 at 6:12 PM Apollon Oikonomopoulos <apoikos@...sg.gr> wrote:
> >
> > Yuchung Cheng <ycheng@...gle.com> writes:
> >
> > > On Thu, Oct 15, 2020 at 1:22 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > >>
> > >> On Thu, Oct 15, 2020 at 2:31 PM Apollon Oikonomopoulos <apoikos@...sg.gr> wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> > I'm trying to debug a (possible) TCP issue we have been encountering
> > >> > sporadically during the past couple of years. Currently we're running
> > >> > 4.9.144, but we've been observing this since at least 3.16.
> > >> >
> > >> > Tl;DR: I believe we are seeing a case where snd_wl1 fails to be properly
> > >> > updated, leading to inability to recover from a TCP persist state and
> > >> > would appreciate some help debugging this.
> > >>
> > >> Thanks for the detailed report and diagnosis. I think we may need a
> > >> fix something like the following patch below.
> >
> > That was fast, thank you!
> >
> > >>
> > >> Eric/Yuchung/Soheil, what do you think?
> > > wow hard to believe how old this bug can be. The patch looks good but
> > > can Apollon verify this patch fix the issue?
> >
> > Sure, I can give it a try and let the systems do their thing for a couple of
> > days, which should be enough to see if it's fixed.
>
> Great, thanks!
>
> > Neal, would it be possible to re-send the patch as an attachment? The
> > inlined version does not apply cleanly due to linewrapping and
> > whitespace changes and, although I can re-type it, I would prefer to test
> > the exact same thing that would be merged.
>
> Sure, I have attached the "git format-patch" format of the commit. It
> does seem to apply cleanly to the v4.9.144 kernel you mentioned you
> are using.
>
> Thanks for testing this!
>
> best,
> neal

Ouch, this is an interesting bug.

Would "netperf -t TCP_RR -- -r 2GB,2GB" be a possible test ?

(I am afraid packetdrill won't be able to test this in a reasonable
amount of time)

Neal, can you include in your changelog the link to Apollon's awesome
email, I think it was a very nice investigation and Apollon deserves
more credit than a mere "Reported-by:" tag ;)

Maybe this one :
Link: https://www.spinics.net/lists/netdev/msg692430.html

Thanks !
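
As a rough illustration of why a test moving about 2 GB in each direction might exercise this: TCP sequence numbers are 32 bits wide and are compared with wrapping arithmetic, so a snd_wl1 that is left stale while the peer's sequence number advances by more than 2^31 bytes would start to look "newer" than the incoming segment. The Python sketch below models that comparison. It is an assumption about the mechanism, not something spelled out in this thread; the helper names mirror my understanding of the kernel's before()/after() helpers and the window-update check, and the numbers are made up.

```python
MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def s32(x):
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - MOD if x & 0x80000000 else x


def before(seq1, seq2):
    """True if seq1 is earlier than seq2 in wrapping sequence space."""
    return s32(seq1 - seq2) < 0


def after(seq1, seq2):
    """True if seq1 is later than seq2 in wrapping sequence space."""
    return before(seq2, seq1)


def may_update_window(ack, ack_seq, nwin, snd_una, snd_wl1, snd_wnd):
    """Simplified model (assumption) of the sender's window-update test:
    accept the advertised window if the segment acks new data, or carries
    a newer sequence number than the last window update (snd_wl1), or has
    the same sequence number with a larger window."""
    return (after(ack, snd_una)
            or after(ack_seq, snd_wl1)
            or (ack_seq == snd_wl1 and nwin > snd_wnd))


# Hypothetical state: sender in persist (snd_wnd == 0), snd_wl1 stale at 1000,
# and the incoming ACK does not acknowledge new data (ack == snd_una).
stale_wl1 = 1000
peer_seq_1gb = (stale_wl1 + (1 << 30)) % MOD      # peer sent 1 GiB since
peer_seq_3gb = (stale_wl1 + 3 * (1 << 30)) % MOD  # peer sent 3 GiB since

print(may_update_window(500, peer_seq_1gb, 65535, 500, stale_wl1, 0))
print(may_update_window(500, peer_seq_3gb, 65535, 500, stale_wl1, 0))
```

In this model the first call accepts the non-zero window, while the second rejects it because the stale snd_wl1 now compares as later than the segment's sequence number, which would leave the sender in persist even though the peer advertises an open window.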