lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c34f7d6c-0565-4ff6-aed6-902657938b39@quicinc.com>
Date: Fri, 14 Mar 2025 21:56:03 +0800
From: Miaoqing Pan <quic_miaoqing@...cinc.com>
To: Johan Hovold <johan@...nel.org>
CC: <quic_jjohnson@...cinc.com>, <ath11k@...ts.infradead.org>,
        <linux-wireless@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <johan+linaro@...nel.org>
Subject: Re: [PATCH v3 ath-next 2/2] wifi: ath11k: fix HTC rx insufficient
 length



On 3/14/2025 4:20 PM, Johan Hovold wrote:
> On Fri, Mar 14, 2025 at 02:13:53PM +0800, Miaoqing Pan wrote:
>> A relatively unusual race condition occurs between host software
>> and hardware, where the host sees the updated destination ring head
>> pointer before the hardware updates the corresponding descriptor.
>> When this situation occurs, the length of the descriptor returns 0.
>>
>> The current error handling method is to increment descriptor tail
>> pointer by 1, but 'sw_index' is not updated, causing descriptor and
>> skb to not correspond one-to-one, resulting in the following error:
>>
>> ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1488, expected 1492
>> ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484
>>
>> To address this problem and work around the broken hardware,
>> temporarily skip processing the current descriptor and handle it
>> again next time. However, to prevent this descriptor from
>> continuously returning 0, use the skb control block (cb) to set
>> a flag. If the length returns 0 again, this descriptor will be
>> discarded.
>>
>> Tested-on: QCA6698AQ hw2.1 PCI WLAN.HSP.1.1-04546-QCAHSPSWPL_V1_V2_SILICONZ_IOE-1
>>
>> Reported-by: Johan Hovold <johan+linaro@...nel.org>
>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218623
>> Signed-off-by: Miaoqing Pan <quic_miaoqing@...cinc.com>
> 
>> @@ -387,18 +387,36 @@ static int ath11k_ce_completed_recv_next(struct ath11k_ce_pipe *pipe,
>>   
>>   	ath11k_hal_srng_access_begin(ab, srng);
>>   
>> -	desc = ath11k_hal_srng_dst_get_next_entry(ab, srng);
>> +	desc = ath11k_hal_srng_dst_peek(ab, srng);
>>   	if (!desc) {
>>   		ret = -EIO;
>>   		goto err;
>>   	}
>>   
>>   	*nbytes = ath11k_hal_ce_dst_status_get_length(desc);
> 
> As I mentioned elsewhere, this function also sets the length field in
> the descriptor to zero. So if there's a racing update, you may never see
> the updated length.
> 

Will add below check.

--- a/drivers/net/wireless/ath/ath11k/hal.c
+++ b/drivers/net/wireless/ath/ath11k/hal.c
@@ -602,7 +602,11 @@ u32 ath11k_hal_ce_dst_status_get_length(void *buf)
         u32 len;

         len = FIELD_GET(HAL_CE_DST_STATUS_DESC_FLAGS_LEN, desc->flags);
-       desc->flags &= ~HAL_CE_DST_STATUS_DESC_FLAGS_LEN;
+       /* Avoid setting the length field in the descriptor to zero when 
length
+        * is 0, as there's a racing update, may never see the updated 
length.
+        */
+       if (likely(len))
+               desc->flags &= ~HAL_CE_DST_STATUS_DESC_FLAGS_LEN;



>> -	if (*nbytes == 0) {
>> -		ret = -EIO;
>> -		goto err;
>> +	if (unlikely(*nbytes == 0)) {
>> +		struct ath11k_skb_rxcb *rxcb =
>> +			ATH11K_SKB_RXCB(pipe->dest_ring->skb[sw_index]);
>> +
>> +		/* A relatively unusual race condition occurs between host
>> +		 * software and hardware, where the host sees the updated
>> +		 * destination ring head pointer before the hardware updates
>> +		 * the corresponding descriptor.
>> +		 *
>> +		 * Temporarily skip processing the current descriptor and handle
>> +		 * it again next time. However, to prevent this descriptor from
>> +		 * continuously returning 0, set 'is_desc_len0' flag. If the
>> +		 * length returns 0 again, this descriptor will be discarded.
>> +		 */
>> +		if (!rxcb->is_desc_len0) {
>> +			rxcb->is_desc_len0 = true;
>> +			ret = -EIO;
>> +			goto err;
>> +		}
> 
> If you add the memory barrier and make sure not to clear the length
> field above, do you still see the length sometimes always reading zero
> if you retry more than once (i.e. drop the is_desc_len0 flag)?
> 
> Perhaps the device is really passing you a zero-length descriptor that
> can simply be discarded straight away?
> 
> Johan

Will verify your suggestion, thanks.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ