Date: Tue, 28 May 2024 09:50:25 +0100
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Kalle Valo <kvalo@...nel.org>
Cc: Johannes Berg <johannes.berg@...el.com>,
	Michael Nemanov <michael.nemanov@...com>,
	linux-kernel@...r.kernel.org, linux-wireless@...r.kernel.org
Subject: Re: [PATCH wireless] wifi: wlcore: fix wlcore AP mode

On Tue, May 28, 2024 at 09:36:43AM +0100, Russell King wrote:
> From: Johannes Berg <johannes.berg@...el.com>
> 
> Using wl183x devices in AP mode with various firmwares is not stable.
> 
> The driver currently adds a station to firmware with basic rates when it
> is first known to the stack using the CMD_ADD_PEER command. Once the
> station has finished authorising, another CMD_ADD_PEER command is issued
> to update the firmware with the rates the station can use.
> 
> However, after a random amount of time, the firmware ignores the power
> management nullfunc frames from the station, and tries to send packets
> while the station is asleep, resulting in lots of retries dropping down
> in rate due to no response. This restricts the available bandwidth.
> 
> With this happening with several stations, the user visible effect is
> the latency of interactive connections increases significantly, packets
> get dropped, and in general the WiFi connections become unreliable and
> unstable.
> 
> Eventually, the firmware transmit queue appears to get stuck - with
> packets and blocks allocated that never clear.
> 
> TI have a couple of patches that address this, but they touch the
> mac80211 core to disable NL80211_FEATURE_FULL_AP_CLIENT_STATE for *all*
> wireless drivers, which has the effect of not adding the station to the
> stack until later when the rates are known. This is a sledgehammer
> approach to solving the problem.
> 
> The solution implemented here has the same effect, but without
> impacting all drivers.
> 
> We delay adding the station to firmware until it has been authorised
> in the driver, and correspondingly remove the station when unwinding
> from authorised state. Adding the station to firmware allocates a hlid,
> which will now happen later than the driver expects. Therefore, we need
> to track when this happens so that we transmit using the correct hlid.
> 
> This patch is an equivalent fix to these two patches in TI's
> wilink8-wlan repository:
> 
> https://git.ti.com/cgit/wilink8-wlan/build-utilites/tree/patches/kernel_patches/4.19.38/0004-mac80211-patch.patch?h=r8.9&id=a2ee50aa5190ed3b334373d6cd09b1bff56ffcf7
> https://git.ti.com/cgit/wilink8-wlan/build-utilites/tree/patches/kernel_patches/4.19.38/0005-wlcore-patch.patch?h=r8.9&id=a2ee50aa5190ed3b334373d6cd09b1bff56ffcf7
> 
> Reported-by: Russell King (Oracle) <rmk+kernel@...linux.org.uk>
> Co-developed-by: Russell King (Oracle) <rmk+kernel@...linux.org.uk>
> Tested-by: Russell King (Oracle) <rmk+kernel@...linux.org.uk>
> Signed-off-by: Johannes Berg <johannes.berg@...el.com>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@...linux.org.uk>

Please note that this patch fixes just one of the issues with the
driver. There remain other firmware bugs that make AP mode
unreliable. For example:

When a station, e.g. a phone, moves out of range of the AP while the
station is in power saving mode, packets become stuck in the transmit
queue. With additional debugging added to the driver, it reports:

Unable to flush all frames for station xx:xx:xx:ee:11:fe for hlid 3
FW time: 1675524181
 Frame 0: expires 1394140264 MAC xx:xx:xx:ee:11:fe FC 17032
 Frame 1: expires 1394264633 MAC xx:xx:xx:ee:11:fe FC 17032
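
To help read the output above, here is a minimal sketch that decodes
the values. It assumes that FC is the 802.11 frame control field
printed as a host-order integer, and that "FW time" and "expires" are
unsigned 32-bit counters in the same unit; the unit is not stated
here. The names decode_fc and overdue are made up for illustration.

```python
# Assumption: FC is the 802.11 frame control field as a host-order integer.
# Assumption: timestamps are unsigned 32-bit counters sharing one unit.

FRAME_TYPES = {0: "management", 1: "control", 2: "data", 3: "extension"}


def decode_fc(fc):
    """Split an 802.11 frame control value into its main fields."""
    return {
        "type": FRAME_TYPES[(fc >> 2) & 0x3],
        "subtype": (fc >> 4) & 0xF,
        "to_ds": bool(fc & 0x0100),
        "from_ds": bool(fc & 0x0200),
        "retry": bool(fc & 0x0800),
        "pm": bool(fc & 0x1000),
        "protected": bool(fc & 0x4000),
    }


def overdue(fw_time, expires):
    """Distance from expires to fw_time, modulo 2**32.

    Only meaningful if the frame really has expired; a frame that is
    not yet due would show up as a very large value.
    """
    return (fw_time - expires) % (1 << 32)


if __name__ == "__main__":
    print(hex(17032), decode_fc(17032))
    print(overdue(1675524181, 1394140264))
    print(overdue(1675524181, 1394264633))
```

Under those assumptions, FC 17032 is 0x4288: a protected QoS data
frame (type data, subtype 8) with FromDS set, i.e. sent by the AP, and
both frames are roughly 2.8e8 ticks past their expiry time.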

These packets get removed by the firmware when the peer is removed.
However, if the broadcast hlid was in power saving at the time, then
it appears the broadcast hlid gets similarly stuck, leading to the
entire network eventually falling over due to the AP effectively
blocking broadcast ARP requests.

I can find no way around this - and I suspect there is some kind of
refcounting bug in the firmware when told to remove a peer which has
queued packets.

My best workaround for this at the moment is to monitor the state of
the driver via debugfs, and when this problem presents, to take the
AP down and bring it back up. This restarts the firmware, but has the
effect of kicking all connected devices off the network.
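
The monitoring side of that workaround could be sketched roughly as
below. This is not the script in use here, only an illustration: it
assumes the driver state is readable as text lines of the form
"name = value", and the counter name used in the example
(tx_allocated_blocks) is a placeholder to be replaced with whatever
the debugfs file actually reports. Restarting the AP is left to the
caller.

```python
def parse_state(text):
    """Parse 'name = value' lines into a dict, keeping integer values only."""
    state = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'name = value' line
        try:
            state[name.strip()] = int(value.strip())
        except ValueError:
            pass  # non-integer value, ignore it
    return state


def looks_stuck(samples, key, min_samples=5):
    """True if the last min_samples readings of key were all non-zero.

    Heuristic only: a busy but healthy link can also show non-zero
    readings, so sample slowly and choose min_samples generously.
    """
    recent = [s.get(key, 0) for s in samples[-min_samples:]]
    return len(recent) >= min_samples and all(v > 0 for v in recent)


if __name__ == "__main__":
    # Placeholder readings; in practice each would come from one poll
    # of the debugfs file.
    history = [parse_state("tx_allocated_blocks = 12\n") for _ in range(5)]
    if looks_stuck(history, "tx_allocated_blocks"):
        print("transmit queue looks stuck, restart the AP")
```

The check is deliberately conservative: it only fires once the
counter has been non-zero for every one of the last few polls, which
matches "allocated but never clears" better than a single reading.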

Another workaround is to turn wifi off on the phone before moving
it out of range!

I will attempt to get captures of the network at some point - both
of the packets at the AP network interface and of the radio side.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
