Date:   Thu, 12 Dec 2019 13:58:09 -0800
From:   Ben Greear <greearb@...delatech.com>
To:     Johannes Berg <johannes@...solutions.net>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Neal Cardwell <ncardwell@...gle.com>
Cc:     Toke Høiland-Jørgensen <toke@...hat.com>,
        linux-wireless@...r.kernel.org, Netdev <netdev@...r.kernel.org>
Subject: Re: debugging TCP stalls on high-speed wifi

On 12/12/19 1:46 PM, Johannes Berg wrote:
> On Thu, 2019-12-12 at 13:29 -0800, Ben Greear wrote:
>>
>>> (*) Hmm. Now I have another idea. Maybe we have some kind of problem
>>> with the medium access configuration, and we transmit all this data
>>> without the AP having a chance to send back all the ACKs? Too bad I
>>> can't put an air sniffer into the setup - it's a conductive setup.
>>
>> splitter/combiner?
> 
> I guess. I haven't looked at it, it's halfway around the world or
> something :)
> 
>> If it is just delayed acks coming back, which would slow down a stream, then
>> multiple streams would tend to work around that problem?
> 
> Only a bit, because it allows somewhat more outstanding data. But each
> stream estimates the throughput lower in its congestion control
> algorithm, so it would have a smaller window size?
> 
> What I was thinking is that if we have some kind of skew in the system
> and always/frequently/sometimes make our transmissions have priority
> over the AP transmissions, then we'd not get ACKs back, and that might
> cause what I see - the queue drains entirely and *then* we get an ACK
> back...
> 
> That's not a _bad_ theory and I'll have to find a good way to test it,
> but I'm not entirely convinced that's the problem.
> 
> Oh, actually, I guess I know it's *not* the problem because otherwise
> the ss output would show we're blocked on congestion window far more
> than it looks like now? I think?

If you get the rough packet rates and sizes from the counters, you could set
up UDP flows that mimic the download and upload packet behaviour and run them
concurrently.  If you can still push a good bit more UDP up even with small
UDP packets emulating TCP ACKs coming down, then I think you can be confident
that the problem is not ACKs clogging up the RF or the AP being starved for
airtime.
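Something like this quick python sketch could approximate that (the
addresses, ports and rates below are made up, and a real test would use
iperf or LANforge with the measured packet sizes):

#!/usr/bin/env python3
# Crude UDP blaster: run one copy pushing big packets upstream and another
# pushing small (~ACK-sized) packets downstream at the same time.
import socket
import sys
import time

def blast(dest, port, size, pps, seconds):
    """Send roughly `pps` packets of `size` bytes to dest:port for `seconds`."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\x00" * size
    tick = 0.01                      # pace in 10 ms bursts; much cruder than
    burst = max(1, int(pps * tick))  # iperf/LANforge pacing, but ok for a sanity test
    sent = 0
    end = time.time() + seconds
    while time.time() < end:
        t0 = time.time()
        for _ in range(burst):
            s.sendto(payload, (dest, port))
            sent += 1
        left = tick - (time.time() - t0)
        if left > 0:
            time.sleep(left)
    print("sent %d packets (%d bytes) to %s:%d" % (sent, sent * size, dest, port))

if __name__ == "__main__":
    # upload:           ./udp_blast.py 192.168.1.2 5001 1400 20000 10
    # fake ACK stream:  ./udp_blast.py 192.168.1.3 5002 66 5000 10
    d, p, sz, r, secs = sys.argv[1:6]
    blast(d, int(p), int(sz), int(r), int(secs))

Point it at a port that has a listener (or a discard service) on the far
side so you do not also generate a stream of ICMP port-unreachables.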

Since the Windows driver works better, it probably does not have much to do
with ACKs or downstream traffic anyway.

>> 		TCP_TSQ=200
> 
> Setting it to 200 is way excessive. In particular since you already get
> the *8 from the default mac80211 behaviour, so now you effectively have
> *1600, which means instead of 1ms you can have 1.6s worth of TCP data on
> the queues ... way too much :)

Yes, this was hacked in back when the pacing did not work well with ath10k.
I'll do some tests to see how much this matters on modern kernels when I get
a chance.

This will allow huge congestion control windows....
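For reference, my rough understanding of the arithmetic (assuming TCP_TSQ
just multiplies the per-socket TSQ limit, that mac80211's *8 corresponds to
an sk_pacing_shift of 7, and ignoring the 2-skb floor and the
tcp_limit_output_bytes cap):

# Back-of-the-envelope TSQ queue budget; numbers are illustrative only.
def tsq_budget_bytes(pacing_rate_bps, sk_pacing_shift=10, extra_multiplier=1):
    """~bytes TCP will leave queued below the stack for one socket."""
    bytes_per_sec = pacing_rate_bps / 8
    return (bytes_per_sec / (1 << sk_pacing_shift)) * extra_multiplier

rate = 1e9  # ~1 Gbit/s wifi flow
print(tsq_budget_bytes(rate))                     # ~122 KB, ~1 ms of data
print(tsq_budget_bytes(rate, sk_pacing_shift=7))  # ~977 KB, ~8 ms (mac80211's *8)
print(tsq_budget_bytes(rate, 7, 200))             # ~195 MB, ~1.6 s with TCP_TSQ=200

Which lines up with the 1ms vs 1.6s numbers you quote above.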

Thanks,
Ben


-- 
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc  http://www.candelatech.com
