[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKXuJqh0_7fduDgDXWzCE2fYNHV-mDa29Lxq15h7-vam2Nin6w@mail.gmail.com>
Date: Sat, 22 Mar 2025 15:32:08 -0500
From: Steev Klimaszewski <steev@...i.org>
To: Johan Hovold <johan+linaro@...nel.org>
Cc: Jeff Johnson <jjohnson@...nel.org>, Miaoqing Pan <quic_miaoqing@...cinc.com>,
Clayton Craft <clayton@...ftyguy.net>, Jens Glathe <jens.glathe@...schoolsolutions.biz>,
ath11k@...ts.infradead.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH] wifi: ath11k: fix rx completion meta data corruption
Hi Johan,
On Fri, Mar 21, 2025 at 9:55 AM Johan Hovold <johan+linaro@...nel.org> wrote:
>
> Add the missing memory barrier to make sure that the REO dest ring
> descriptor is read after the head pointer to avoid using stale data on
> weakly ordered architectures like aarch64.
>
> This may fix the ring-buffer corruption worked around by commit
> f9fff67d2d7c ("wifi: ath11k: Fix SKB corruption in REO destination
> ring") by silently discarding data, and may possibly also address user
> reported errors like:
>
> ath11k_pci 0006:01:00.0: msdu_done bit in attention is not set
>
> Tested-on: WCN6855 hw2.1 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
>
> Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
> Cc: stable@...r.kernel.org # 5.6
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218005
> Signed-off-by: Johan Hovold <johan+linaro@...nel.org>
> ---
>
> As I reported here:
>
> https://lore.kernel.org/lkml/Z9G5zEOcTdGKm7Ei@hovoldconsulting.com/
>
> the ath11k and ath12k appear to be missing a number of memory barriers
> that are required on weakly ordered architectures like aarch64 to avoid
> memory corruption issues.
>
> Here's a fix for one more such case which people already seem to be
> hitting.
>
> Note that I've seen one "msdu_done" bit not set warning also with this
> patch so whether it helps with that at all remains to be seen. I'm CCing
> Jens and Steev that see these warnings frequently and that may be able
> to help out with testing.
>
> Johan
>
>
> drivers/net/wireless/ath/ath11k/dp_rx.c | 25 ++++++++++++++++---------
> 1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath11k/dp_rx.c b/drivers/net/wireless/ath/ath11k/dp_rx.c
> index 029ecf51c9ef..0a57b337e4c6 100644
> --- a/drivers/net/wireless/ath/ath11k/dp_rx.c
> +++ b/drivers/net/wireless/ath/ath11k/dp_rx.c
> @@ -2646,7 +2646,7 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
> struct ath11k *ar;
> struct hal_reo_dest_ring *desc;
> enum hal_reo_dest_ring_push_reason push_reason;
> - u32 cookie;
> + u32 cookie, info0, rx_msdu_info0, rx_mpdu_info0;
> int i;
>
> for (i = 0; i < MAX_RADIOS; i++)
> @@ -2659,11 +2659,14 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
> try_again:
> ath11k_hal_srng_access_begin(ab, srng);
>
> + /* Make sure descriptor is read after the head pointer. */
> + dma_rmb();
> +
> while (likely(desc =
> (struct hal_reo_dest_ring *)ath11k_hal_srng_dst_get_next_entry(ab,
> srng))) {
> cookie = FIELD_GET(BUFFER_ADDR_INFO1_SW_COOKIE,
> - desc->buf_addr_info.info1);
> + READ_ONCE(desc->buf_addr_info.info1));
> buf_id = FIELD_GET(DP_RXDMA_BUF_COOKIE_BUF_ID,
> cookie);
> mac_id = FIELD_GET(DP_RXDMA_BUF_COOKIE_PDEV_ID, cookie);
> @@ -2692,8 +2695,9 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
>
> num_buffs_reaped[mac_id]++;
>
> + info0 = READ_ONCE(desc->info0);
> push_reason = FIELD_GET(HAL_REO_DEST_RING_INFO0_PUSH_REASON,
> - desc->info0);
> + info0);
> if (unlikely(push_reason !=
> HAL_REO_DEST_RING_PUSH_REASON_ROUTING_INSTRUCTION)) {
> dev_kfree_skb_any(msdu);
> @@ -2701,18 +2705,21 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
> continue;
> }
>
> - rxcb->is_first_msdu = !!(desc->rx_msdu_info.info0 &
> + rx_msdu_info0 = READ_ONCE(desc->rx_msdu_info.info0);
> + rx_mpdu_info0 = READ_ONCE(desc->rx_mpdu_info.info0);
> +
> + rxcb->is_first_msdu = !!(rx_msdu_info0 &
> RX_MSDU_DESC_INFO0_FIRST_MSDU_IN_MPDU);
> - rxcb->is_last_msdu = !!(desc->rx_msdu_info.info0 &
> + rxcb->is_last_msdu = !!(rx_msdu_info0 &
> RX_MSDU_DESC_INFO0_LAST_MSDU_IN_MPDU);
> - rxcb->is_continuation = !!(desc->rx_msdu_info.info0 &
> + rxcb->is_continuation = !!(rx_msdu_info0 &
> RX_MSDU_DESC_INFO0_MSDU_CONTINUATION);
> rxcb->peer_id = FIELD_GET(RX_MPDU_DESC_META_DATA_PEER_ID,
> - desc->rx_mpdu_info.meta_data);
> + READ_ONCE(desc->rx_mpdu_info.meta_data));
> rxcb->seq_no = FIELD_GET(RX_MPDU_DESC_INFO0_SEQ_NUM,
> - desc->rx_mpdu_info.info0);
> + rx_mpdu_info0);
> rxcb->tid = FIELD_GET(HAL_REO_DEST_RING_INFO0_RX_QUEUE_NUM,
> - desc->info0);
> + info0);
>
> rxcb->mac_id = mac_id;
> __skb_queue_tail(&msdu_list[mac_id], msdu);
> --
> 2.48.1
>
While the fix is definitely a fix, it does not seem to help with the
`msdu_done bit in attention is not set` message as I have seen it 43
times in the last 12 hours.
-- Steev
Powered by blists - more mailing lists