Message-ID: <8d1b7d91-5191-4292-b49b-3d96dc76232e@nbd.name>
Date: Tue, 27 Jan 2026 12:06:07 +0100
From: Felix Fietkau <nbd@....name>
To: Zac <zac@...bowling.com>, sean.wang@...nel.org
Cc: deren.wu@...iatek.com, kvalo@...nel.org, linux-kernel@...r.kernel.org,
 linux-mediatek@...ts.infradead.org, linux-wireless@...r.kernel.org,
 linux@...me.work, lorenzo@...nel.org, ryder.lee@...iatek.com,
 sean.wang@...iatek.com, zbowling@...il.com
Subject: Re: [PATCH 12/13] wifi: mt76: mt7925: fix ROC deadlocks and race
 conditions

On 20.01.26 21:10, Zac wrote:
> From: Zac Bowling <zac@...bowling.com>
> 
> Fix multiple interrelated issues in the remain-on-channel (ROC) handling
> that cause deadlocks, race conditions, and resource leaks.
> 
> Problems fixed:
> 
> 1. Deadlock in sta removal ROC abort path:
>     When a station is removed while a ROC operation is in progress, the
>     driver would call mt7925_roc_abort_sync() which waits for ROC completion.
>     However, the ROC work itself needs to acquire mt792x_mutex which is
>     already held during station removal, causing a deadlock.
> 
>     Fix: Use async ROC abort (mt76_connac_mcu_abort_roc) when called from
>     paths that already hold the mutex, and add MT76_STATE_ROC_ABORT flag
>     to coordinate between the abort and the ROC timer.
> 
> 2. ROC timer race during suspend:
>     The ROC timer could fire after the device started suspending but before
>     the ROC was properly aborted, causing undefined behavior.
> 
>     Fix: Delete ROC timer synchronously before suspend and check device
>     state before processing ROC timeout.
> 
> 3. ROC rate limiting for MLO auth failures:
>     Rapid ROC requests during MLO authentication can overwhelm the firmware,
>     causing authentication timeouts. The MT7925 firmware has limited ROC
>     handling capacity.
> 
>     Fix: Add rate limiting infrastructure with configurable minimum interval
>     between ROC requests. Track last ROC completion time and defer new
>     requests if they arrive too quickly.
> 
> 4. WCID leak in ROC cleanup:
>     When ROC operations are aborted, the associated WCID resources were
>     not being properly released, causing resource exhaustion over time.
> 
>     Fix: Ensure WCID cleanup happens in all ROC termination paths.
> 
> 5. Async ROC abort race condition:
>     The async ROC abort could race with normal ROC completion, causing
>     double-free or use-after-free of ROC resources.
> 
>     Fix: Use MT76_STATE_ROC_ABORT flag and proper synchronization to
>     prevent races between async abort and normal completion paths.
> 
> These fixes work together to provide robust ROC handling that doesn't
> deadlock, properly releases resources, and handles edge cases during
> suspend and MLO operations.
> 
> Fixes: c948b5da6bbe ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 device")
> Signed-off-by: Zac Bowling <zac@...bowling.com>

The rate limiting code seems a bit suspicious to me.
What does "limited ROC handling capacity" mean? A limit on outstanding ROC
requests? Does the firmware need time to settle after a completed ROC?
This needs to be clarified and likely replaced with a more targeted fix.

- Felix
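
For reference, the minimum-interval scheme described in item 3 of the quoted
commit message (track the last ROC completion time, defer requests that arrive
too soon) can be sketched roughly as below. This is only an illustration of the
"settle time" reading of the question above, not code from the patch or from
mt76: the struct, the function names, and the millisecond timestamps are all
hypothetical, and a monotonic clock is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rate-limit state; not taken from the mt76 sources. */
struct roc_ratelimit {
	uint64_t last_done_ms;  /* monotonic time the previous ROC completed */
	uint64_t min_gap_ms;    /* assumed minimum spacing between ROCs */
	bool has_completed;     /* false until the first ROC has finished */
};

/* Record that a ROC finished at time now_ms. */
static void roc_mark_done(struct roc_ratelimit *rl, uint64_t now_ms)
{
	assert(rl);
	rl->last_done_ms = now_ms;
	rl->has_completed = true;
}

/*
 * Return how many milliseconds a new ROC request should be deferred.
 * 0 means the request may be issued immediately.
 */
static uint64_t roc_defer_ms(const struct roc_ratelimit *rl, uint64_t now_ms)
{
	uint64_t elapsed;

	assert(rl);
	if (!rl->has_completed)
		return 0;

	elapsed = now_ms - rl->last_done_ms;
	if (elapsed >= rl->min_gap_ms)
		return 0;

	return rl->min_gap_ms - elapsed;
}
```

With a 100 ms gap and a ROC completing at t=1000, a request at t=1030 would be
deferred by 70 ms and one at t=1100 would go out immediately. Note that a
scheme like this only spaces requests out in time; it does not bound the number
of outstanding requests, which is the other possible reading of "limited ROC
handling capacity".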
