Message-ID: <b8e11bbc-c718-4acf-acc0-6b31f25fae27@nbd.name>
Date: Tue, 17 Sep 2024 11:15:38 +0200
From: Felix Fietkau <nbd@....name>
To: Kalle Valo <kvalo@...nel.org>, Lorenzo Bianconi <lorenzo@...nel.org>
Cc: Alper Nebi Yasak <alpernebiyasak@...il.com>,
 linux-mediatek@...ts.infradead.org, linux-wireless@...r.kernel.org,
 Ryder Lee <ryder.lee@...iatek.com>, Shayne Chen <shayne.chen@...iatek.com>,
 Sean Wang <sean.wang@...iatek.com>, Matthias Brugger
 <matthias.bgg@...il.com>,
 AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
 Ming Yen Hsieh <mingyen.hsieh@...iatek.com>, Deren Wu
 <deren.wu@...iatek.com>, linux-kernel@...r.kernel.org,
 linux-arm-kernel@...ts.infradead.org, Ma Ke <make24@...as.ac.cn>,
 regressions@...ts.linux.dev
Subject: Re: BUG and WARNINGs from mt7921s on next-20240916

On 17.09.24 08:17, Kalle Valo wrote:
> Lorenzo Bianconi <lorenzo@...nel.org> writes:
> 
>>> Hi,
>>> 
>>> I ran into some bug messages while testing linux-next on a MT8186
>>> Magneton Chromebook (mt8186-corsola-magneton-sku393218). It boots 
>>> to the OS, but at least Wi-Fi and Bluetooth are unavailable.
>>> 
>>> As a start, I tried reverting commit abbd838c579e ("Merge tag 
>>> 'mt76-for-kvalo-2024-09-06' of https://github.com/nbd168/wireless")
>>> and it works fine after that. Didn't have time to do a full bisect, 
>>> but will try if nobody has any immediate opinions.
>>> 
>>> There are a few traces, here's some select lines to catch your attention,
>>> not sure how informational they are:
>>> 
>>> [   16.040525] kernel BUG at net/core/skbuff.c:2268!
>>> [   16.040531] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>>> [   16.040803] CPU: 3 UID: 0 PID: 526 Comm: mt76-sdio-txrx Not tainted 6.11.0-next-20240916-deb-00002-g7b544e01c649 #1
>>> [   16.040897] Call trace:
>>> [   16.040899]  pskb_expand_head+0x2b0/0x3c0
>>> [   16.040905]  mt76s_tx_run_queue+0x274/0x410 [mt76_sdio]
>>> [   16.040909]  mt76s_txrx_worker+0xe4/0xac8 [mt76_sdio]
>>> [   16.040914]  mt7921s_txrx_worker+0x98/0x1e0 [mt7921s]
>>> [   16.040924]  __mt76_worker_fn+0x80/0x128 [mt76]
>>> [   16.040934]  kthread+0xe8/0xf8
>>> [   16.040940]  ret_from_fork+0x10/0x20
>>
>> Hi,
>>
>> I guess this issue has been introduced by the following commit:
>>
>> commit 3688c18b65aeb2a1f2fde108400afbab129a8cc1
>> Author: Felix Fietkau <nbd@....name>
>> Date:   Tue Aug 27 11:30:01 2024 +0200
>>
>>     wifi: mt76: mt7915: retry mcu messages
>>
>>     In some cases MCU messages can get lost. Instead of failing completely,
>>     attempt to recover by re-sending them.
>>
>>     Link: https://patch.msgid.link/20240827093011.18621-14-nbd@nbd.name
>>     Signed-off-by: Felix Fietkau <nbd@....name>
>>
>>
>> In particular, skb_get() in mt76_mcu_skb_send_and_get_msg() is bumping skb users
>> refcount (making the skb shared) and pskb_expand_head() (run by __skb_grow() in
>> mt76s_tx_run_queue()) does not like shared skbs.
>>
>> @Felix: any input on it?

Sorry about that. Please try this patch, it should probably resolve this issue:

---
--- a/drivers/net/wireless/mediatek/mt76/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mcu.c
@@ -84,13 +84,15 @@ int mt76_mcu_skb_send_and_get_msg(struct mt76_dev *dev, struct sk_buff *skb,
  	mutex_lock(&dev->mcu.mutex);
  
  	if (dev->mcu_ops->mcu_skb_prepare_msg) {
+		orig_skb = skb;
  		ret = dev->mcu_ops->mcu_skb_prepare_msg(dev, skb, cmd, &seq);
  		if (ret < 0)
  			goto out;
  	}
  
  retry:
-	orig_skb = skb_get(skb);
+	if (orig_skb)
+		skb_get(orig_skb);
  	ret = dev->mcu_ops->mcu_skb_send_msg(dev, skb, cmd, &seq);
  	if (ret < 0)
  		goto out;
@@ -105,7 +107,7 @@ int mt76_mcu_skb_send_and_get_msg(struct mt76_dev *dev, struct sk_buff *skb,
  	do {
  		skb = mt76_mcu_get_response(dev, expires);
  		if (!skb && !test_bit(MT76_MCU_RESET, &dev->phy.state) &&
-		    retry++ < dev->mcu_ops->max_retry) {
+		    orig_skb && retry++ < dev->mcu_ops->max_retry) {
  			dev_err(dev->dev, "Retry message %08x (seq %d)\n",
  				cmd, seq);
  			skb = orig_skb;
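
For anyone following the diagnosis quoted above, the interaction between the extra reference and the headroom expansion can be modeled in user space. This is only a minimal sketch, not kernel code: struct buf, buf_get(), buf_shared(), buf_expand_head(), buf_put() and send_once() are hypothetical stand-ins for struct sk_buff, skb_get(), skb_shared(), pskb_expand_head(), the skb release path and the MCU send path. The kernel's pskb_expand_head() hits BUG_ON() on a shared skb; the model returns -1 instead of crashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: a user count and a headroom size. */
struct buf {
	int users;
	size_t headroom;
};

static struct buf *buf_alloc(void)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (b)
		b->users = 1;	/* a fresh buffer has exactly one user */
	return b;
}

/* Models skb_get(): take one more reference on the same buffer. */
static struct buf *buf_get(struct buf *b)
{
	b->users++;
	return b;
}

/* Models skb_shared(): true when more than one user holds the buffer. */
static int buf_shared(const struct buf *b)
{
	return b->users != 1;
}

/*
 * Models pskb_expand_head(): the kernel function does BUG_ON(skb_shared(skb));
 * this model reports the same condition with -1.
 */
static int buf_expand_head(struct buf *b, size_t extra)
{
	if (buf_shared(b))
		return -1;
	b->headroom += extra;
	return 0;
}

/* Drops one reference and frees the buffer when the last user is gone. */
static void buf_put(struct buf *b)
{
	assert(b->users > 0);
	if (--b->users == 0)
		free(b);
}

/*
 * Models one send. take_ref says whether the caller keeps an extra reference
 * for a possible retry before the bus layer tries to grow the buffer.
 * Returns the result of the expansion, or -2 if allocation failed.
 */
static int send_once(int take_ref)
{
	struct buf *b = buf_alloc();
	int ret;

	if (!b)
		return -2;
	if (take_ref)
		buf_get(b);		/* buffer is now shared */
	ret = buf_expand_head(b, 16);	/* what the bus layer would attempt */
	if (take_ref)
		buf_put(b);		/* drop the retry reference */
	buf_put(b);			/* drop the original reference */
	return ret;
}
```

In this model send_once(1) returns -1, which corresponds to the BUG in the trace, and send_once(0) returns 0. As the patch reads, the extra reference (and with it the retry) is taken only when orig_skb was set, that is, when an mcu_skb_prepare_msg hook exists.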

