Message-ID: <20230308112650.3c72d6f2@polar.lan>
Date:   Wed, 8 Mar 2023 11:26:50 +0100
From:   Thomas Mann <rauchwolke@....net>
To:     Alexander Wetzel <alexander@...zel-home.de>
Cc:     Linux regressions mailing list <regressions@...ts.linux.dev>,
        "linux-wireless@...r.kernel.org" <linux-wireless@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Stanislaw Gruszka <stf_xl@...pl>,
        Helmut Schaa <helmut.schaa@...glemail.com>,
        Johannes Berg <johannes.berg@...el.com>
Subject: Re: [Regression] rt2800usb - Wifi performance issues and connection
 drops

On Wed, 8 Mar 2023 08:13:32 +0100
Alexander Wetzel <alexander@...zel-home.de> wrote:

> On 07.03.23 23:31, Thomas Mann wrote:
> > Hi Alexander,
>
> Since I suspect we'll exchange quite some mails here:
> Top posting is frowned upon on the mailing lists on copy.
> Details here: https://www.infradead.org/~dwmw2/email.html
>
> I've moved your post to the correct position and replied there.
>
> >
> >>>>
> >>>
> >>> I just uploaded a test patch to bugzilla.
> >>> Please have a look if that fixes the issue.
> >>>
> >>> If not I would be interested in the output of your iTXQ status.
> >>> Enable CONFIG_MAC80211_DEBUGFS and run this command when the
> >>> connection is bad and send/share/upload to bugzilla the resulting
> >>> debug.out:
> >>>
> >>> k=1; while [ $k -lt 10 ]; do \
> >>> cat /sys/kernel/debug/ieee80211/phy?/netdev:*/stations/*/aqm; \
> >>> k=$(($k+1)); done >> debug.out
> >>
> >> Thomas and I continued with some debugging in
> >> https://bugzilla.kernel.org/show_bug.cgi?id=217119
> >>
> >> But the results so far are unexpected and we decided to continue
> >> the debugging with the round here. Hoping someone sees something I
> >> miss.
> >>
> >> A very short summary of where we are:
> >> I can't reproduce the bug with a very similar card and kernel
> >> config so far. Thomas' card stops the iTXQs for intervals >30s.
> >> Mine operates normally.
> >>
> >> A more useful but longer summary:
> >>
> >> Thomas updated to a 6.2 kernel and reported "connection drops and
> >> bandwidth problems" with his rt2800usb wlan card. (6.1 is ok.)
> >> Asked for some more details he reported:
> >> "...slow bandwidth stuff works better, but the main problem/test
> >> case is to start a 8-16 mbit video stream, which sometimes runs
> >> for a few seconds and then stops or it doesn't start at all"
> >>
> >> He bisected the issue and identified my commit 4444bc2116ae ("wifi:
> >> mac80211: Proper mark iTXQs for resumption") as culprit.
> >>
> >> Checking the internal iTXQ status when the issue is ongoing shows,
> >> that TID zero is flagged as dirty and thus is not transmitting
> >> queued packets. Interesting line from
> >> /sys/kernel/debug/ieee80211/phy?/netdev:*/stations/*/aqm:
> >> tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
> >> 0 2 619736 404 1681 0 0 0 1 4513965 3019 0xe(RUN AMPDU NO-AMSDU DIRTY)
> >> --> The "normal" iTXQ handling IEEE80211_AC_BE has queued packets
> >> and is flagged as DIRTY. There even is a potential race setting
> >> the DIRTY flag, but the fix for that is not helping.
> >>
> >> Thus Thomas applied two debug patches, to better understand why the
> >> DIRTY flag is not cleared.
> >>
> >> And looking at the output from those we see that the driver stops
> >> Tx by calling ieee80211_stop_queue(). When ieee80211_wake_queue()
> >> is called, mac80211 correctly resumes TX but is stopped by the
> >> driver again after a single packet. (The start of the relevant log
> >> is missing, so initially it may have been more.)
> >> I assume TX is still ok at that stage. But after some single Tx
> >> operations the driver stops the queues again. Here is the relevant
> >> part of the log:
> >> [  179.584997] XXXX __ieee80211_wake_txqs: waking TID 0
> >> [  179.585022] XXXX drv_tx: TX
> >> [  179.585027] XXXX ieee80211_stop_queue: called
> >> [  179.585028] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
> >> [  179.585030] XXXX __ieee80211_wake_txqs: TID 3 NOT dirty
> >> [  179.585031] XXXX __ieee80211_wake_txqs: TID 8 NOT dirty
> >> [  179.585033] XXXX __ieee80211_wake_txqs: TID 11 NOT dirty
> >> [  179.585034] XXXX __ieee80211_wake_txqs: EXIT
> >> [  179.585035] XXXX __ieee80211_wake_txqs: ENTRY
> >> [  179.585036] XXXX __ieee80211_wake_txqs: TID 1 NOT dirty
> >> [  179.585037] XXXX __ieee80211_wake_txqs: TID 2 NOT dirty
> >> [  179.585038] XXXX __ieee80211_wake_txqs: TID 9 NOT dirty
> >> [  179.585040] XXXX __ieee80211_wake_txqs: TID 10 NOT dirty
> >> [  179.585041] XXXX __ieee80211_wake_txqs: EXIT
> >> [  179.585047] XXXX drv_tx: TX
> >> [  179.585056] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
> >> [  179.585271] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
> >> [  179.585868] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
> >> [  179.586120] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
> >> [  179.586544] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
> >> [  179.586792] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
> >> [  179.587317] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
> >> [  179.587591] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
> >> [  179.588569] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
> >> ....
> >> [  214.307617] XXXX ieee80211_wake_queue: called
> >>
> >>
> >> --> So the driver blocked TX for more than 30s. Which is a good
> >> explanation of what Thomas observes.
> >>
> >> But there is nothing mac80211 can do differently here. Whatever is
> >> the real reason for the issue, it's nothing obvious I see.
> >>
> >> Luckily I found a card using the same driver and nearly the same
> >> card:
> >> Thomas' system:
> >> Linux version 6.2.2-gentoo (root@foo) (gcc (Gentoo Hardened
> >> 12.2.1_p20230121-r1 p10) 12.2.1 20230121, GNU ld (Gentoo 2.39 p5)
> >> 2.39.0) #2 SMP Fri Mar  3 16:59:02 CET 2023
> >> ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 3070, rev 0201 detected
> >> ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 0005 detected
> >> ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
> >>
> >> My system, using the kernel config from Thomas with only minor
> >> modifications (different filesystems and initramfs settings and
> >> enabled mac80211 debug and developer options):
> >> Linux version 6.2.2-gentoo (root@...ry.mordor) (gcc (Gentoo
> >> 12.2.1_p20230121-r1 p10) 12.2.1 20230121, GNU ld (Gentoo 2.40 p2)
> >> 2.40.0) #2 SMP Tue Mar  7 18:18:47 CET 2023
> >> ieee80211 phy0: rt2x00_set_rt: Info - RT chipset 3070, rev 0200 detected
> >> ieee80211 phy0: rt2x00_set_rf: Info - RF chipset 0005 detected
> >> ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
> >> ieee80211 phy0: rt2x00lib_request_firmware: Info - Loading firmware file 'rt2870.bin'
> >> ieee80211 phy0: rt2x00lib_request_firmware: Info - Firmware detected - version: 0.36
> >>
> >> But there is one big difference on my system: I can't reproduce the
> >> bug so far. It's working as it should... (I did not apply the debug
> >> patches myself so far)
> >>
> >> I'm now planning to look a bit more into the rt2800usb driver and
> >> provide another debug patch for interesting looking code pieces in
> >> it.
> >>
> >> @Thomas:
> >> I've also uploaded the binary kernel I'm running at the moment
> >> here: https://www.awhome.eu/s/5FjqMS73rtCtSBM
> >>
> >> That kernel should also be able to boot and operate your system.
> >> Can you try that and tell me, if that makes any difference?
>
>  >
>  > i can't boot the binary kernel here, as the initramfs is included
>  > in my kernel, if you send me a patch, i can apply it and test it.
>
> That was an unpatched kernel. Idea was to verify that it's not a
> compiler issue. (You seem to be using a hardened Gentoo profile.)
>
> Can you share your initrd, so I can include it? (Mail it to me
> directly, upload it to the bug in bugzilla or send a link to some
> cloud storage.)
>
I can't share this config, as it's a production system, and I'm not
allowed to run arbitrary binary code on the system. As 6.1.x works
without a problem, I don't think it's a compiler problem. I will try to
get a non-hardened compiler and recompile the kernel.

>
>
> >>
> >> I'm also planning to provide some more debug patches, to figure
> >> out which part of commit 4444bc2116ae ("wifi: mac80211: Proper
> >> mark iTXQs for resumption") fixes the issue for you. Assuming my
> >> understanding above is correct the patch should not really
> >> fix/break anything for you... With the findings above I would have
> >> expected your git bisect to identify commit a790cc3a4fad ("wifi:
> >> mac80211: add wake_tx_queue callback to drivers") as the first
> >> broken commit...
> >>
> >> Alexander
> >
>

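The two checks described in the quoted summary (an iTXQ that has a
backlog but is flagged DIRTY, and the time between
ieee80211_stop_queue() and ieee80211_wake_queue()) can be scripted
offline. The following is only a sketch under assumptions: it expects
the aqm table layout and the "XXXX" debug-print format exactly as
quoted in this thread, and the helper names dirty_tids and
stall_intervals are made up for illustration. They are not part of
mac80211 or of the debug patches.

```python
import re

# Assumed log format, taken from the debug output quoted in this thread:
# "[  179.585027] XXXX ieee80211_stop_queue: called"
LOG_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+XXXX\s+(\w+):")

# Assumed aqm row format: tid ac backlog-bytes backlog-packets ... flags(...)
AQM_ROW_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s.*\(([^)]*)\)\s*$")


def stall_intervals(log_text):
    """Return (stop_time, wake_time, seconds) for each stop/wake pair."""
    intervals = []
    stopped_at = None
    for line in log_text.splitlines():
        m = LOG_RE.search(line)
        if not m:
            continue
        ts, func = float(m.group(1)), m.group(2)
        if func == "ieee80211_stop_queue" and stopped_at is None:
            stopped_at = ts
        elif func == "ieee80211_wake_queue" and stopped_at is not None:
            intervals.append((stopped_at, ts, ts - stopped_at))
            stopped_at = None
    return intervals


def dirty_tids(aqm_text):
    """Return (tid, backlog_packets) for rows that are DIRTY with a backlog."""
    result = []
    for line in aqm_text.splitlines():
        m = AQM_ROW_RE.match(line)
        if not m:
            continue  # header or unrelated line
        tid, backlog_packets = int(m.group(1)), int(m.group(4))
        flags = m.group(5).split()
        if "DIRTY" in flags and backlog_packets > 0:
            result.append((tid, backlog_packets))
    return result


# Sample data copied from the output quoted in this thread.
SAMPLE_LOG = """\
[  179.585022] XXXX drv_tx: TX
[  179.585027] XXXX ieee80211_stop_queue: called
[  179.585028] XXXX ieee80211_tx_dequeue: mark TID 0 dirty. Reason: 1
[  214.307617] XXXX ieee80211_wake_queue: called
"""

SAMPLE_AQM = """\
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit collisions tx-bytes tx-packets flags
0 2 619736 404 1681 0 0 0 1 4513965 3019 0xe(RUN AMPDU NO-AMSDU DIRTY)
"""

if __name__ == "__main__":
    for stop, wake, secs in stall_intervals(SAMPLE_LOG):
        print(f"queues stopped at {stop:.6f}, woken at {wake:.6f}: {secs:.1f}s")
    for tid, pkts in dirty_tids(SAMPLE_AQM):
        print(f"TID {tid} is DIRTY with {pkts} packets queued")
```

To use it on real data, pass the contents of debug.out to dirty_tids()
and the relevant dmesg excerpt to stall_intervals(). Only the first
stop before each wake is counted, so repeated stop calls while the
queues are already stopped do not shorten the measured interval.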