netdev - RE: Debugging stuck tcp connection across localhost

Message-ID: <e8e6693695c04bd6a679ddd43733703b@AcuMS.aculab.com>
Date:   Mon, 10 Jan 2022 22:16:50 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Ben Greear' <greearb@...delatech.com>,
        Neal Cardwell <ncardwell@...gle.com>
CC:     netdev <netdev@...r.kernel.org>
Subject: RE: Debugging stuck tcp connection across localhost

From: Ben Greear <greearb@...delatech.com>
> Sent: 10 January 2022 18:10
...
>  From my own looking at things, it seems that the sniffer fails to get frames near when the problem
> starts happening.  I am baffled as to how that can happen, especially since it seems to stop getting
> packets from multiple different TCP connections (the sniffer filter would pick up some other loop-back
> related connections to the same IP port).
> 
> And, if I interpret the ss output properly, after the problem happens, the sockets still think they
> are sending data.  I didn't check closely enough to see if the peer side thought it received it.
> 
> We are going to try to reproduce w/out wifi, but not sure we'll have any luck with that.
> We did test w/out VRF (using lots of ip rules instead), and similar problem was seen according to my
> test team (I did not debug it in detail).
> 
> Do you have any suggestions for how to debug this further?  I am happy to hack stuff into the
> kernel if you have some suggested places to add debugging...

Sounds like all transmit traffic on the loopback interface is being discarded
before the point where the frames get fed to tcpdump.

Possibly you could use ftrace to trace function entry+exit of a few
functions on the transmit path, and then isolate the point where the
discard is happening.
You can't afford to trace everything; it slows things down too much.
But tracing a few functions on each send path should be OK.
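A minimal Python sketch of that approach, assuming root access, a kernel built with the function_graph tracer, and tracefs at /sys/kernel/tracing (or /sys/kernel/debug/tracing on older setups). The function list is a hypothetical guess at the IPv4 TCP to loopback transmit path, plus dev_queue_xmit_nit, which is where packet taps such as tcpdump receive their copy of outgoing frames. Names differ between kernel versions and some may be inlined, so the sketch keeps only the names listed in available_filter_functions. A VRF setup would also pass through the VRF device's own transmit code, which is not covered here.

```python
from collections.abc import Iterable
from pathlib import Path

# Hypothetical watch list: guesses at the IPv4 TCP -> loopback transmit path.
# Names change between kernel versions and some functions may be inlined,
# so only the ones ftrace actually offers are used.
TX_FUNCS = (
    "__tcp_transmit_skb",   # TCP hands a segment down to IP
    "ip_finish_output2",    # last IPv4 step before the device layer
    "__dev_queue_xmit",     # generic device transmit entry point
    "dev_queue_xmit_nit",   # copies outgoing frames to packet taps (tcpdump)
    "loopback_xmit",        # loopback driver transmit routine
)


def usable_functions(tracefs: Path, wanted: Iterable[str]) -> list[str]:
    """Return the names in `wanted` that appear in available_filter_functions."""
    available = set()
    with open(tracefs / "available_filter_functions") as f:
        for line in f:
            # Entries are "name" or "name [module]"; keep just the name.
            fields = line.split()
            if fields:
                available.add(fields[0])
    return [name for name in wanted if name in available]


def write(tracefs: Path, control: str, value: str) -> None:
    """Write one value to a tracefs control file (opened with truncation)."""
    (tracefs / control).write_text(value + "\n")


def start_tx_trace(tracefs: Path, wanted: Iterable[str] = TX_FUNCS) -> list[str]:
    """Trace entry and exit of the selected functions with function_graph."""
    funcs = usable_functions(tracefs, wanted)
    if not funcs:
        raise RuntimeError("none of the requested functions can be traced")
    write(tracefs, "tracing_on", "0")         # pause while reconfiguring
    write(tracefs, "current_tracer", "nop")   # drop any previous tracer
    # Limit tracing to just these functions; names are whitespace-separated.
    write(tracefs, "set_ftrace_filter", " ".join(funcs))
    # function_graph records each call's entry and exit, with its duration.
    write(tracefs, "current_tracer", "function_graph")
    write(tracefs, "trace", "")               # writing to 'trace' clears the buffer
    write(tracefs, "tracing_on", "1")
    return funcs
```

Usage, as root: call start_tx_trace(Path("/sys/kernel/tracing")), reproduce the stall, then read the trace file and compare which of the watched functions are still being hit once the connections stop making progress. Filtering to a handful of functions, rather than tracing everything, keeps the overhead low enough not to mask the problem.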

	David
