Message-ID: <8c2f274b-cf05-4fad-b9d6-fa9de1363d42@gmail.com>
Date: Tue, 26 Nov 2024 04:59:35 -0800
From: James Prestwood <prestwoj@...il.com>
To: Remi Pommarel <repk@...plefau.lt>, ath10k@...ts.infradead.org,
 linux-wireless@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Kalle Valo <kvalo@...nel.org>, Jeff Johnson <jjohnson@...nel.org>,
 Cedric Veilleux <veilleux.cedric@...il.com>,
 Vasanthakumar Thiagarajan <quic_vthiagar@...cinc.com>
Subject: Re: [RESEND PATCH v3 0/2] Improve ath10k flush queue mechanism


On 11/26/24 4:57 AM, James Prestwood wrote:
> Hi Remi,
>
> On 11/22/24 8:48 AM, Remi Pommarel wrote:
>> It has been reported [0] that 3-4 seconds (actually up to 5 seconds) of
>> radio silence could be observed, followed by the error below, on ath10k
>> devices:
>>
>>   ath10k_pci 0000:04:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
>>
>> This is due to how the TX queues are flushed in ath10k. When a STA is
>> removed, mac80211 needs to flush queues [1], but because ath10k does not
>> have a lightweight .flush_sta operation, ieee80211_flush_queues() is
>> called instead, effectively blocking the whole queue during the drain
>> and causing this radio silence. Also, because ath10k_flush() waits for
>> all queues to be emptied, not only the flushed ones, it could more easily
>> take up to 5 seconds to finish, making the whole situation worse.
>>
>> The first patch of this series adds a .flush_sta operation to flush only
>> a specific STA's traffic, avoiding the need to stop whole queues, and
>> should be enough in itself to fix the reported issue.
>>
>> The second patch of this series is a proposal to improve ath10k_flush() so
>> that it will be less likely to time out waiting for unrelated queues to
>> drain.
>>
>> The above kernel warning could still be observed (e.g. when flushing a
>> dead STA) but should now be harmless.
>>
>> [0]: https://lore.kernel.org/all/CA+Xfe4FjUmzM5mvPxGbpJsF3SvSdE5_wgxvgFJ0bsdrKODVXCQ@mail.gmail.com/
>> [1]: commit 0b75a1b1e42e ("wifi: mac80211: flush queues on STA removal")
>
> I saw that the original report indicated it was only for AP mode,
> but after seeing this and checking some of our clients I saw that this
> is also happening in station mode. I only have clients on 6.2 and
> 6.8. I can confirm it's not occurring on 6.2, but it is on 6.8. I also
> tried your set of patches but did not notice any behavior difference
> with or without them. When it happens, it's always just after a roam
> scan: ~4 seconds go by and we get the failure followed by a
> "Connection to AP <mac> lost". Oddly, the MAC address is all zeros.
>
> Nov 25 09:09:50 iwd[16256]: src/station.c:station_start_roam() Using cached neighbor report for roam
> Nov 25 09:09:54 kernel: ath10k_pci 0000:02:00.0: failed to flush transmit queue (skip 0 ar-state 1): 0
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME notification Del Station(20)
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_link_notify() event 16 on ifindex 7
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME notification Deauthenticate(39)
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_deauthenticate_event()
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_mlme_notify() MLME notification Disconnect(48)
> Nov 25 09:09:54 iwd[16256]: src/netdev.c:netdev_disconnect_event()
> Nov 25 09:09:54 iwd[16256]: Received Deauthentication event, reason: 4, from_ap: false
> Nov 25 09:09:54 kernel: wlan0: Connection to AP 00:00:00:00:00:00 lost
>
> Other times, the above logs are preceded by this:
>
> Nov 26 00:25:25 kernel: ath10k_pci 0000:02:00.0: failed to flush sta txq (sta ca:55:b8:7a:91:4b skip 0 ar-state 1): 0
>
> Note, the above logs are with your patches applied. Maybe this is a
> separate issue? Or do you think it's related?

Forgot to mention, this is on the QCA6174 hw 3.2

firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp crc32 bf907c7c

>
> Thanks,
>
> James
>
>>
>> V3:
>>    - Initialize empty to true to fix smatch error
>>
>> V2:
>>    - Add Closes tag
>>    - Use atomic instead of spinlock for per sta pending frame counter
>>    - Call ath10k_htt_tx_sta_dec_pending within rcu
>>    - Rename pending_per_queue[] to num_pending_per_queue[]
>>
>> Remi Pommarel (2):
>>    wifi: ath10k: Implement ieee80211 flush_sta callback
>>    wifi: ath10k: Flush only requested txq in ath10k_flush()
>>
>>   drivers/net/wireless/ath/ath10k/core.h   |  2 +
>>   drivers/net/wireless/ath/ath10k/htt.h    | 11 +++-
>>   drivers/net/wireless/ath/ath10k/htt_tx.c | 49 +++++++++++++++-
>>   drivers/net/wireless/ath/ath10k/mac.c    | 75 ++++++++++++++++++++----
>>   drivers/net/wireless/ath/ath10k/txrx.c   | 11 ++--
>>   5 files changed, 127 insertions(+), 21 deletions(-)
>>
