Date:   Thu, 12 Dec 2019 13:58:09 -0800
From:   Ben Greear <greearb@...delatech.com>
To:     Johannes Berg <johannes@...solutions.net>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Neal Cardwell <ncardwell@...gle.com>
Cc:     Toke Høiland-Jørgensen <toke@...hat.com>,
        linux-wireless@...r.kernel.org, Netdev <netdev@...r.kernel.org>
Subject: Re: debugging TCP stalls on high-speed wifi

On 12/12/19 1:46 PM, Johannes Berg wrote:
> On Thu, 2019-12-12 at 13:29 -0800, Ben Greear wrote:
>>
>>> (*) Hmm. Now I have another idea. Maybe we have some kind of problem
>>> with the medium access configuration, and we transmit all this data
>>> without the AP having a chance to send back all the ACKs? Too bad I
>>> can't put an air sniffer into the setup - it's a conductive setup.
>>
>> splitter/combiner?
> 
> I guess. I haven't looked at it, it's halfway around the world or
> something :)
> 
>> If it is just delayed acks coming back, which would slow down a stream, then
>> multiple streams would tend to work around that problem?
> 
> Only a bit, because it allows somewhat more outstanding data. But each
> stream estimates the throughput lower in its congestion control
> algorithm, so it would have a smaller window size?
> 
> What I was thinking is that if we have some kind of skew in the system
> and always/frequently/sometimes make our transmissions have priority
> over the AP transmissions, then we'd not get ACKs back, and that might
> cause what I see - the queue drains entirely and *then* we get an ACK
> back...
> 
> That's not a _bad_ theory and I'll have to find a good way to test it,
> but I'm not entirely convinced that's the problem.
> 
> Oh, actually, I guess I know it's *not* the problem because otherwise
> the ss output would show we're blocked on congestion window far more
> than it looks like now? I think?

If you get the rough packet rates and sizes from the counters, you could set
up UDP flows that mimic the download and upload packet behaviour and run them
concurrently.  If you can still push a good bit more UDP up even with small
UDP packets emulating TCP ACKs coming down, then I think you can be confident
that the problem is not ACKs clogging up the RF or the AP being starved for
airtime.
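Something like this quick python sketch could approximate that (the
addresses, ports and rates below are made up, and a real test would use
iperf or LANforge with the measured packet sizes):

#!/usr/bin/env python3
# Crude UDP blaster: run one copy pushing big packets upstream and another
# pushing small (~ACK-sized) packets downstream at the same time.
import socket
import sys
import time

def blast(dest, port, size, pps, seconds):
    """Send roughly `pps` packets of `size` bytes to dest:port for `seconds`."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\x00" * size
    tick = 0.01                      # pace in 10 ms bursts; much cruder than
    burst = max(1, int(pps * tick))  # iperf/LANforge pacing, but ok for a sanity test
    sent = 0
    end = time.time() + seconds
    while time.time() < end:
        t0 = time.time()
        for _ in range(burst):
            s.sendto(payload, (dest, port))
            sent += 1
        left = tick - (time.time() - t0)
        if left > 0:
            time.sleep(left)
    print("sent %d packets (%d bytes) to %s:%d" % (sent, sent * size, dest, port))

if __name__ == "__main__":
    # upload:           ./udp_blast.py 192.168.1.2 5001 1400 20000 10
    # fake ACK stream:  ./udp_blast.py 192.168.1.3 5002 66 5000 10
    d, p, sz, r, secs = sys.argv[1:6]
    blast(d, int(p), int(sz), int(r), int(secs))

Point it at a port that has a listener (or a discard service) on the far
side so you do not also generate a stream of ICMP port-unreachables.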

Since the Windows driver works better, it probably does not have much to do
with ACKs or downstream traffic anyway.

>> 		TCP_TSQ=200
> 
> Setting it to 200 is way excessive. In particular since you already get
> the *8 from the default mac80211 behaviour, so now you effectively have
> *1600, which means instead of 1ms you can have 1.6s worth of TCP data on
> the queues ... way too much :)

Yes, this was hacked in back when the pacing did not work well with ath10k.
I'll do some tests to see how much this matters on modern kernels when I get
a chance.

This will allow huge congestion control windows....
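For reference, my rough understanding of the arithmetic (assuming TCP_TSQ
just multiplies the per-socket TSQ limit, that mac80211's *8 corresponds to
an sk_pacing_shift of 7, and ignoring the 2-skb floor and the
tcp_limit_output_bytes cap):

# Back-of-the-envelope TSQ queue budget; numbers are illustrative only.
def tsq_budget_bytes(pacing_rate_bps, sk_pacing_shift=10, extra_multiplier=1):
    """~bytes TCP will leave queued below the stack for one socket."""
    bytes_per_sec = pacing_rate_bps / 8
    return (bytes_per_sec / (1 << sk_pacing_shift)) * extra_multiplier

rate = 1e9  # ~1 Gbit/s wifi flow
print(tsq_budget_bytes(rate))                     # ~122 KB, ~1 ms of data
print(tsq_budget_bytes(rate, sk_pacing_shift=7))  # ~977 KB, ~8 ms (mac80211's *8)
print(tsq_budget_bytes(rate, 7, 200))             # ~195 MB, ~1.6 s with TCP_TSQ=200

Which lines up with the 1ms vs 1.6s numbers you quote above.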

Thanks,
Ben


-- 
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc  http://www.candelatech.com
