Message-ID: <8d1b7d91-5191-4292-b49b-3d96dc76232e@nbd.name>
Date: Tue, 27 Jan 2026 12:06:07 +0100
From: Felix Fietkau <nbd@....name>
To: Zac <zac@...bowling.com>, sean.wang@...nel.org
Cc: deren.wu@...iatek.com, kvalo@...nel.org, linux-kernel@...r.kernel.org,
linux-mediatek@...ts.infradead.org, linux-wireless@...r.kernel.org,
linux@...me.work, lorenzo@...nel.org, ryder.lee@...iatek.com,
sean.wang@...iatek.com, zbowling@...il.com
Subject: Re: [PATCH 12/13] wifi: mt76: mt7925: fix ROC deadlocks and race conditions
On 20.01.26 21:10, Zac wrote:
> From: Zac Bowling <zac@...bowling.com>
>
> Fix multiple interrelated issues in the remain-on-channel (ROC) handling
> that cause deadlocks, race conditions, and resource leaks.
>
> Problems fixed:
>
> 1. Deadlock in sta removal ROC abort path:
> When a station is removed while a ROC operation is in progress, the
> driver would call mt7925_roc_abort_sync() which waits for ROC completion.
> However, the ROC work itself needs to acquire mt792x_mutex which is
> already held during station removal, causing a deadlock.
>
> Fix: Use async ROC abort (mt76_connac_mcu_abort_roc) when called from
> paths that already hold the mutex, and add MT76_STATE_ROC_ABORT flag
> to coordinate between the abort and the ROC timer.
>
> 2. ROC timer race during suspend:
> The ROC timer could fire after the device started suspending but before
> the ROC was properly aborted, causing undefined behavior.
>
> Fix: Delete ROC timer synchronously before suspend and check device
> state before processing ROC timeout.
>
> 3. ROC rate limiting for MLO auth failures:
> Rapid ROC requests during MLO authentication can overwhelm the firmware,
> causing authentication timeouts. The MT7925 firmware has limited ROC
> handling capacity.
>
> Fix: Add rate limiting infrastructure with configurable minimum interval
> between ROC requests. Track last ROC completion time and defer new
> requests if they arrive too quickly.
>
> 4. WCID leak in ROC cleanup:
> When ROC operations are aborted, the associated WCID resources were
> not being properly released, causing resource exhaustion over time.
>
> Fix: Ensure WCID cleanup happens in all ROC termination paths.
>
> 5. Async ROC abort race condition:
> The async ROC abort could race with normal ROC completion, causing
> double-free or use-after-free of ROC resources.
>
> Fix: Use MT76_STATE_ROC_ABORT flag and proper synchronization to
> prevent races between async abort and normal completion paths.
>
> These fixes work together to provide robust ROC handling that doesn't
> deadlock, properly releases resources, and handles edge cases during
> suspend and MLO operations.
>
> Fixes: c948b5da6bbe ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 device")
> Signed-off-by: Zac Bowling <zac@...bowling.com>

The rate limiting code seems a bit suspicious to me.
What does "limited ROC handling capacity" mean? Outstanding ROC
requests? Does it need time to settle after a completed ROC?
This needs to be clarified and likely replaced with a more targeted fix.

- Felix
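
For reference, the minimum-interval deferral described in item 3 of the quoted commit message could look roughly like the userspace C sketch below. It is not the patch's code: `struct roc_limiter`, `roc_defer_ms()` and `roc_mark_done()` are hypothetical names, and timestamps are plain milliseconds rather than jiffies.

```c
#include <stdint.h>

/* Hypothetical limiter state; not taken from the mt7925 driver. */
struct roc_limiter {
	uint64_t last_done_ms;    /* when the previous ROC completed */
	uint64_t min_interval_ms; /* assumed minimum gap between ROCs */
	int has_completed;        /* zero until the first ROC finishes */
};

/* Record the completion time of a ROC. */
static void roc_mark_done(struct roc_limiter *l, uint64_t now_ms)
{
	l->last_done_ms = now_ms;
	l->has_completed = 1;
}

/*
 * Return how many milliseconds a new ROC request should be deferred.
 * 0 means the request may start immediately.
 */
static uint64_t roc_defer_ms(const struct roc_limiter *l, uint64_t now_ms)
{
	uint64_t elapsed;

	if (!l->has_completed)
		return 0;

	elapsed = now_ms - l->last_done_ms;
	if (elapsed >= l->min_interval_ms)
		return 0;

	return l->min_interval_ms - elapsed;
}
```

With an illustrative `min_interval_ms` of 100, a request arriving 30 ms after a completion would be deferred by 70 ms. Note that this only models the "time to settle after a completed ROC" reading of the question above; if the limit is really on outstanding requests, a counter of in-flight ROCs would be the relevant state instead of a timestamp.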