Message-ID: <CAGp9LzpuyXRDa=TxqY+Xd5ZhDVvNayWbpMGDD1T0g7apkn7P0A@mail.gmail.com>
Date: Thu, 15 Jan 2026 18:15:27 -0600
From: Sean Wang <sean.wang@...nel.org>
To: Zac Bowling <zbowling@...il.com>
Cc: deren.wu@...iatek.com, kvalo@...nel.org, linux-kernel@...r.kernel.org, 
	linux-mediatek@...ts.infradead.org, linux-wireless@...r.kernel.org, 
	lorenzo@...nel.org, nbd@....name, ryder.lee@...iatek.com, 
	sean.wang@...iatek.com
Subject: Re: [PATCH v3 00/17] wifi: mt76: mt7925/mt792x: comprehensive stability fixes

Hi Zac,

Thanks for sharing this series. Overall the patches look good to me,
and I’m continuing to test to make sure there are no regressions on
mt7925 and mt7921.
However, today I hit a kernel WARN in the disconnect path (mac80211 BA
session teardown) while testing v3 of the series:

[ 3373.120224] Hardware name: HP HP EliteBook 830 G6/854A, BIOS R70 Ver. 01.22.00 10/14/2022
[ 3373.120228] Workqueue: events_unbound cfg80211_wiphy_work [cfg80211]
[ 3373.120367] RIP: 0010:__ieee80211_stop_tx_ba_session+0x295/0x350 [mac80211]
[ 3373.120570] Code: 11 0f 83 a3 00 00 00 48 c7 80 90 03 00 00 00 00 00 00 48 8b 7d 98 e8 4a 26 f3 fa 4c 89 ee 4c 89 ef e8 6f 16 0b fa 31 c0 eb 93 <0f> 0b 31 c0 eb 8d b8 8e ff ff ff eb 86 48 8b 7d 98 e8 25 26 f3 fa
[ 3373.120576] RSP: 0018:ffffd00902ed7ba0 EFLAGS: 00010206
[ 3373.120583] RAX: 0000000000010003 RBX: 0000000000000003 RCX: 0000000000000000
[ 3373.120587] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 3373.120591] RBP: ffffd00902ed7c10 R08: 0000000000000000 R09: 0000000000000000
[ 3373.120596] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 3373.120599] R13: ffff8a8433717540 R14: ffff8a83e0b20960 R15: ffff8a834d42c000
[ 3373.120604] FS:  0000000000000000(0000) GS:ffff8a8477b03000(0000) knlGS:0000000000000000
[ 3373.120608] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3373.120626] CR2: 00007b9e0a8ba0d0 CR3: 000000009a440005 CR4: 00000000003726f0
[ 3373.120631] Call Trace:
[ 3373.120656]  <TASK>
[ 3373.120664]  ieee80211_sta_tear_down_BA_sessions+0x53/0xe0 [mac80211]
[ 3373.120836]  __sta_info_destroy_part1+0x48/0x550 [mac80211]
[ 3373.120994]  __sta_info_flush+0x10e/0x230 [mac80211]
[ 3373.121150]  ieee80211_set_disassoc+0x6b3/0x900 [mac80211]
[ 3373.121293]  ? _printk+0x5f/0x90
[ 3373.121330]  __ieee80211_disconnect+0xd6/0x1a0 [mac80211]
[ 3373.121446]  ieee80211_beacon_connection_loss_work+0x6d/0xc0 [mac80211]
[ 3373.121573]  cfg80211_wiphy_work+0xb4/0x190 [cfg80211]
[ 3373.121779]  process_one_work+0x191/0x3e0
[ 3373.121789]  worker_thread+0x2e3/0x420
[ 3373.121796]  ? __pfx_worker_thread+0x10/0x10
[ 3373.121802]  kthread+0x10d/0x230
[ 3373.121810]  ? __pfx_kthread+0x10/0x10
[ 3373.121818]  ret_from_fork+0x205/0x230
[ 3373.121826]  ? __pfx_kthread+0x10/0x10
[ 3373.121832]  ret_from_fork_asm+0x1a/0x30
[ 3373.121842]  </TASK>
[ 3373.121844] ---[ end trace 0000000000000000 ]---
[ 3373.128750] ------------[ cut here ]------------
[ 3373.128757] WARNING: CPU: 1 PID: 14854 at net/mac80211/agg-tx.c:398 __ieee80211_stop_tx_ba_session+0x295/0x350 [mac80211]

I’m currently bisecting the series to identify which patch triggers it
and will follow up once I have clearer results.
Thanks again for the work and the DKMS setup.

                 Sean

On Sun, Jan 4, 2026 at 6:27 PM Zac Bowling <zbowling@...il.com> wrote:
>
> From: Zac Bowling <zac@...bowling.com>
>
> This patch series addresses kernel panics, system deadlocks, and various
> stability issues in the MT7925 WiFi driver. The issues were discovered on
> kernel 6.17 (Ubuntu 25.10) and fixes were developed and tested on 6.18.2.
>
> These patches are based on the wireless tree (nbd168/wireless.git) as
> requested by Sean Wang.
>
> == Problem Description ==
>
> The MT7925 driver has several bugs that cause:
> - Kernel NULL pointer dereferences during BSSID roaming
> - System-wide deadlocks requiring hard reboot
> - Firmware reload failures after suspend/resume
> - Key removal errors during MLO roaming
>
> These issues manifest approximately every 5 minutes when the adapter
> tries to switch to a better BSSID, particularly in enterprise environments
> with multiple access points.
>
> == Root Causes ==
>
> 1. Missing mutex protection around ieee80211_iterate_active_interfaces()
>    when the callback invokes MCU functions (patches 2, 3, 16)
>
> 2. NULL pointer dereferences where mt792x_vif_to_bss_conf(),
>    mt792x_sta_to_link(), and similar functions return NULL during
>    MLO state transitions but results are not checked (patches 1, 4, 5,
>    9, 10, 14, 17)
>
> 3. Ignored MCU return values hiding firmware errors (patches 6, 7, 8)
>
> 4. WARN_ON_ONCE used where NULL is expected during normal MLO AP
>    setup (patch 13)
>
> 5. Firmware semaphore not released after failed load attempts (patch 15)
>
> 6. Key removal returning error when link is already torn down (patch 12)
>
> == Testing ==
>
> Stress tested by hammering the driver with a custom test script.
>
> Tested on:
> - Framework Desktop (AMD Ryzen AI Max 300 Series) with MT7925 (RZ717)
> - This whole patch series was tested on Kernel 6.18.2 and 6.17.12 (Ubuntu 25.10)
> - Enterprise WiFi environment with multiple WIFI 7 APs with MLO enabled
>
> Before patches: System hangs/panics every 5-15 minutes during BSSID roaming
> After patches: Stable for 24+ hours under continuous stress testing
>
> == Crash Traces Fixed ==
>
> Primary NULL pointer dereference:
>   BUG: kernel NULL pointer dereference, address: 0000000000000010
>   Workqueue: mt76 mt7925_mac_reset_work [mt7925_common]
>   RIP: 0010:mt76_connac_mcu_uni_add_dev+0x9c/0x780 [mt76_connac_lib]
>   Call Trace:
>    mt7925_vif_connect_iter+0xcb/0x240 [mt7925_common]
>    __iterate_interfaces+0x92/0x130 [mac80211]
>    ieee80211_iterate_interfaces+0x3d/0x60 [mac80211]
>    mt7925_mac_reset_work+0x105/0x190 [mt7925_common]
>
> Deadlock trace:
>   INFO: task kworker/u128:0:48737 blocked for more than 122 seconds.
>   Workqueue: mt76 mt7925_mac_reset_work [mt7925_common]
>   Call Trace:
>    __mutex_lock.constprop.0+0x3d0/0x6d0
>    mt7925_mac_reset_work+0x85/0x170 [mt7925_common]
>
> == Related Links ==
>
> Framework Community discussion:
> https://community.frame.work/t/kernel-panic-from-wifi-mediatek-mt7925-nullptr-dereference/79301
>
> OpenWrt GitHub issues:
> https://github.com/openwrt/mt76/issues/1014
> https://github.com/openwrt/mt76/issues/1036
>
> GitHub repository with additional analysis:
> https://github.com/zbowling/mt7925
>
> Zac Bowling (17):
>   wifi: mt76: mt7925: fix NULL pointer dereference in vif iteration
>   wifi: mt76: mt7925: fix missing mutex protection in reset and ROC abort
>   wifi: mt76: mt7925: fix missing mutex protection in runtime PM and MLO PM
>   wifi: mt76: mt7925: add NULL checks in MCU STA TLV functions
>   wifi: mt76: mt7925: add NULL checks for link_conf and mlink in main.c
>   wifi: mt76: mt7925: add error handling for AMPDU MCU commands
>   wifi: mt76: mt7925: add error handling for BSS info MCU command in sta_add
>   wifi: mt76: mt7925: add error handling for BSS info in key setup
>   wifi: mt76: mt7925: add NULL checks in MLO link and chanctx functions
>   wifi: mt76: mt792x: fix NULL pointer dereference in TX path
>   wifi: mt76: mt7925: add lockdep assertions for mutex verification
>   wifi: mt76: mt7925: fix key removal failure during MLO roaming
>   wifi: mt76: mt7925: fix kernel warning in MLO ROC setup
>   wifi: mt76: mt7925: add NULL checks for MLO link pointers in MCU functions
>   wifi: mt76: mt792x: fix firmware reload failure after previous load crash
>   wifi: mt76: mt7925: add mutex protection in resume path
>   wifi: mt76: mt7925: add NULL checks in link station and TX queue setup
>
>  drivers/net/wireless/mediatek/mt76/mt792x_core.c | 27 +++++++++++++++-
>  drivers/net/wireless/mediatek/mt76/mt7925/mac.c  |  8 +++++
>  drivers/net/wireless/mediatek/mt76/mt7925/main.c | 95 +++++++++++++++++++++---
>  drivers/net/wireless/mediatek/mt76/mt7925/mcu.c  | 52 ++++++++++++++---
>  drivers/net/wireless/mediatek/mt76/mt7925/pci.c  |  6 +++
>  5 files changed, 170 insertions(+), 18 deletions(-)
>
> --
> 2.51.0
>
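
For readers skimming the cover letter, root causes 2 and 3 (unchecked
NULL results from lookup helpers, and ignored MCU return values) boil
down to a pattern that can be sketched in plain userspace C. This is
only an illustration under stated assumptions: struct vif,
struct link_conf, vif_to_link_conf(), mcu_send_bss_info() and sta_add()
are hypothetical stand-ins, not the mt76 structures or functions, and
the error codes are chosen for the example rather than taken from the
patches.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real mt76 types. */
struct link_conf {
	int link_id;
};

struct vif {
	/* An entry may be NULL while a link is being set up or torn down. */
	struct link_conf *links[2];
};

/* A lookup that can legitimately return NULL (compare root cause 2). */
static struct link_conf *vif_to_link_conf(struct vif *vif, unsigned int link_id)
{
	if (!vif || link_id >= 2)
		return NULL;
	return vif->links[link_id];
}

/* Hypothetical firmware command; here it simply fails for odd link IDs. */
static int mcu_send_bss_info(const struct link_conf *conf)
{
	return (conf->link_id % 2) ? -EIO : 0;
}

/* Check the lookup result first, then propagate the command's error. */
static int sta_add(struct vif *vif, unsigned int link_id)
{
	struct link_conf *conf = vif_to_link_conf(vif, link_id);
	int err;

	if (!conf)
		return -EINVAL;	/* link not present: bail out, do not dereference */

	err = mcu_send_bss_info(conf);
	if (err)
		return err;	/* surface the failure instead of hiding it */

	return 0;
}
```

In this sketch a missing link yields -EINVAL and a failed command
yields its own error code, so the caller can tell the two cases apart.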
