lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 2 Jan 2020 23:21:56 -0800
From:   Justin Capella <justincapella@...il.com>
To:     Stephen Oberholtzer <stevie@...ff.net>
Cc:     Johannes Berg <johannes@...solutions.net>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        "David S. Miller" <davem@...emloft.net>,
        linux-wireless@...r.kernel.org, netdev@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: PROBLEM: Wireless networking goes down on Acer C720P Chromebook (bisected)

The rather large negative deficit stands out to me. See this patch,
https://patchwork.kernel.org/patch/11246363/ specifically the comments
by Kan Yan

On Thu, Jan 2, 2020, 3:14 PM Stephen Oberholtzer <stevie@...ff.net> wrote:
>
>
> /sys/kernel/debug/ieee80211/phy0
>
> airtime_flags = 7
>
> stations/<my AP's MAC>/airtime =
>
> RX: 6583578 us
> TX: 32719 us
> Weight: 256
> Deficit: VO: -1128 us VI: 11 us BE: -5098636 us BK: 256 us
> Q depth: VO: 3868 us VI: 3636 us BE: 12284 us BK: 0 us
> Q limit[low/high]: VO: 5000/12000 VI: 5000/12000 BE: 5000/12000 BK: 5000/1200

On Thu, Jan 2, 2020 at 3:14 PM Stephen Oberholtzer <stevie@...ff.net> wrote:
>
> On Thu, Jan 2, 2020 at 8:28 AM Johannes Berg <johannes@...solutions.net> wrote:
> >
> > On Tue, 2019-12-31 at 19:49 -0500, Stephen Oberholtzer wrote:
> > > Wireless networking goes down on Acer C720P Chromebook (bisected)
> > >
> > > Culprit: 7a89233a ("mac80211: Use Airtime-based Queue Limits (AQL) on
> > > packet dequeue")
> > >
>
> <snip>
>
> > I think I found at least one hole in this, but IIRC (it was before my
> > vacation, sorry) it was pretty unlikely to actually happen. Perhaps
> > there are more though.
> >
> > https://lore.kernel.org/r/b14519e81b6d2335bd0cb7dcf074f0d1a4eec707.camel@sipsolutions.net
>
> <snippety-snip>
>
> > Do you get any output at all? Like a WARN_ON() for an underflow, or
> > something?
> >
> > johannes
> >
>
> Johannes,
>
> To answer your immediate question, no, I don't get any dmesg output at
> all. Nothing about underruns.
> However, while pursuing other avenues -- specifically, enabling
> mac80211 debugfs and log messages -- I realized that my 'master' was
> out-of-date from linux-stable and did a git pull.  Imagine my surprise
> when the resulting kernel did not exhibit the problem!
>
> Apparently, I had been a bit too pessimistic; since the problem
> existed in 5.5-rc1 release, I'd assumed that the problem wouldn't get
> rectified before 5.5.
>
> However, I decided to bisect the fix, and ended up with: 911bde0f
> ("mac80211: Turn AQL into an NL80211_EXT_FEATURE"), which appears to
> have "solved" the problem by just disabling the feature (this is
> ath9k, by the way.)
>
> This AQL stuff sounds pretty nifty, and I'd love to try my hand at
> making it work for ath9k (also, since I put so much effort into an
> automated build-and-test framework, it'd be a shame to just abandon
> it.)  However, the ath9k code is rather lacking for comments, so I
> don't even know where I should start, except for (I suspect) a call to
> `wiphy_ext_feature_set(whatever, NL80211_EXT_FEATURE_AQL);` from
> inside ath9k_set_hw_capab()?
>
> In the meantime, I went back to e548f749b096 -- the commit prior to
> the one making AQL support opt-in -- and cranked up the debugging.
>
> I'm not sure how to interpret any of this, but  here's what I got:
>
> dmesg output:
>
> Last relevant mention is "moving STA <my AP's MAC> to state 4" which
> happened during startup, before everything shut down.
>
> /sys/kernel/debug/ieee80211/phy0
>
> airtime_flags = 7
>
> stations/<my AP's MAC>/airtime =
>
> RX: 6583578 us
> TX: 32719 us
> Weight: 256
> Deficit: VO: -1128 us VI: 11 us BE: -5098636 us BK: 256 us
> Q depth: VO: 3868 us VI: 3636 us BE: 12284 us BK: 0 us
> Q limit[low/high]: VO: 5000/12000 VI: 5000/12000 BE: 5000/12000 BK: 5000/12000
>
> (I have no idea how to interpret this, but that '32719 us' seems odd,
> I thought the airtime usage was in 4us units?)
>
>
> Doing an 'echo 3 | tee airtime_flags' to clear the (old) AQL-enabled
> bit seemed to *immediately* restore network connectivity.
>
> I ran a ping, and saw this:
>
> - pings coming back in <5ms
> - re-enable AQL (echo 7 | tee airtime_flags)
> - pings stop coming back immediately
> - some seconds later, disable AQL again (echo 3 | tee airtime_flags)
> - immediate *flood* of ping replies registered, with times 16000ms,
> 15000ms, 14000ms, .. down to 1000ms, 15ms, then stabilizing sub-5ms
> - According to the icmp_seq values, all 28 requests were replied to,
> and their replies were delivered in-order
>
> This certainly looks like a missing TX queue restart to me?
>
>
> --
> -- Stevie-O
> Real programmers use COPY CON PROGRAM.EXE

Powered by blists - more mailing lists