Message-ID: <b303ba1c-b521-44b4-9afc-cf1766f549ee@quicinc.com>
Date: Wed, 4 Jun 2025 10:16:23 +0800
From: Baochen Qiang <quic_bqiang@...cinc.com>
To: Johan Hovold <johan@...nel.org>
CC: Miaoqing Pan <quic_miaoqing@...cinc.com>,
Johan Hovold <johan+linaro@...nel.org>,
Jeff Johnson <jjohnson@...nel.org>, <linux-wireless@...r.kernel.org>,
<ath11k@...ts.infradead.org>, <linux-kernel@...r.kernel.org>,
<stable@...r.kernel.org>
Subject: Re: [PATCH 1/3] wifi: ath11k: fix dest ring-buffer corruption
On 6/3/2025 7:51 PM, Johan Hovold wrote:
> On Tue, Jun 03, 2025 at 06:52:37PM +0800, Baochen Qiang wrote:
>> On 6/2/2025 4:03 PM, Johan Hovold wrote:
>
>>> No, the barrier is needed between reading the head pointer and accessing
>>> descriptor fields, that's what matters.
>>>
>>> You can still end up with reading stale descriptor data even when
>>> ath11k_hal_srng_dst_get_next_entry() returns non-NULL due to speculation
>>> (that's what happens on the X13s).
>>
>> The fact is that a dma_rmb() does not even prevent speculation, no matter where it is
>> placed, right?
>
> It prevents the speculated load from being used.
Sorry, I still don't get it. To my knowledge, whether the speculated loads (steps 3 and 4)
get used depends on whether the condition check in step 2 passes. How does a dma_rmb() make
any difference in this process?
Could you give a detailed explanation of how dma_rmb() works here? I mean, what issue
is there without the barrier, and how does the barrier prevent it? It would
be better if you could follow the pattern below in the explanation, i.e., steps and code lines ...
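A minimal userspace sketch of the four steps quoted below, with the barrier placed where the thread says it belongs: after the head pointer has been read and checked, and before any descriptor field is loaded. It assumes a simplified, hypothetical ring (`struct dst_ring`, `struct desc`, `dst_ring_init()`, `dst_ring_produce()` and `dst_get_next_entry()` are illustrative names, not ath11k code) and uses the C11 `atomic_thread_fence(memory_order_acquire)` as a rough stand-in for the kernel's `dma_rmb()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 8u

/* Hypothetical descriptor; real ath11k descriptors carry many fields. */
struct desc {
	uint32_t info;
};

/* Hypothetical destination ring. The producer (the device, in the driver)
 * fills descriptors and advances hp; the consumer (the CPU) owns tp.
 * Both are entry indices here, not word offsets as in ath11k. */
struct dst_ring {
	struct desc ring[RING_ENTRIES];
	_Atomic uint32_t hp;
	uint32_t tp;
	uint32_t cached_hp;
};

static void dst_ring_init(struct dst_ring *r)
{
	for (size_t i = 0; i < RING_ENTRIES; i++)
		r->ring[i].info = 0;
	atomic_init(&r->hp, 0);
	r->tp = 0;
	r->cached_hp = 0;
}

/* Producer side: write the descriptor first, then publish it by advancing
 * hp with release semantics. No full-ring check, for brevity. */
static void dst_ring_produce(struct dst_ring *r, uint32_t info)
{
	uint32_t hp = atomic_load_explicit(&r->hp, memory_order_relaxed);

	r->ring[hp].info = info;
	atomic_store_explicit(&r->hp, (hp + 1) % RING_ENTRIES,
			      memory_order_release);
}

/* Consumer side, following steps 1# to 4# from the thread. */
static struct desc *dst_get_next_entry(struct dst_ring *r)
{
	struct desc *d;

	/* 1# read HP (READ_ONCE() in the kernel) */
	r->cached_hp = atomic_load_explicit(&r->hp, memory_order_relaxed);

	/* 2# validate HP: nothing new if TP has caught up */
	if (r->tp == r->cached_hp)
		return NULL;

	/* The branch above is only a control dependency; on weakly ordered
	 * CPUs such as arm64 it does not stop the descriptor loads in step 4
	 * from being issued early and returning stale data. This fence (the
	 * role dma_rmb() plays in the kernel) orders them after the HP load. */
	atomic_thread_fence(memory_order_acquire);

	/* 3# get desc */
	d = &r->ring[r->tp];
	r->tp = (r->tp + 1) % RING_ENTRIES;

	/* 4# the caller then accesses the descriptor fields via d */
	return d;
}
```

In a single-threaded run the fence changes nothing observable: `dst_get_next_entry()` returns NULL on an empty ring and the published descriptor after one `dst_ring_produce()` call. It only matters when the producer runs concurrently, which is always the case with a real device.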
>
>> If so, the whole point of dma_rmb() is to prevent compiler or CPU
>> reordering, but is that reordering really possible here?
>>
>> The sequence is
>>
>> 1# reading HP
>> srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);
>>
>> 2# validate HP
>> if (srng->u.dst_ring.tp == srng->u.dst_ring.cached_hp)
>> return NULL;
>>
>> 3# get desc
>> desc = srng->ring_base_vaddr + srng->u.dst_ring.tp;
>>
>> 4# accessing desc
>> ath11k_hal_desc_reo_parse_err(... desc, ...)
>>
>> Clearly each step depends on the results of the previous steps. In this case, isn't the
>> compiler/CPU expected to be smart enough not to do any reordering?
>
> Steps 3 and 4 can be done speculatively before the load in step 1 is
> complete as long as the result is discarded if it turns out not to be
> needed.
>
> Johan
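Johan's point can be exercised with a two-thread sketch, assuming a second thread stands in for the device (the real producer writes over DMA, which this does not model). The producer fills a slot and then publishes it with a release store of the head index; the consumer polls the head index, issues an acquire fence (again a rough analogue of `dma_rmb()`), and only then reads the slot. All names (`ring_slots`, `ring_head`, `ring_tail`, `producer()`, `consume_all()`, `run_demo()`) are hypothetical. With the fence, the C11 memory model guarantees that every slot read returns the published value, so `run_demo()` should return 0. Without the fence the slot read becomes a data race, which is how C11 expresses that the load may be satisfied early with stale data; the failure itself is hardware-dependent and this program does not try to provoke it.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define N_ITEMS   10000u
#define RING_SIZE 64u

/* Hypothetical ring: plain payload slots plus two atomic indices that
 * only ever increase (slot = index % RING_SIZE). */
static uint32_t ring_slots[RING_SIZE];
static _Atomic uint32_t ring_head;	/* written by the producer ("device") */
static _Atomic uint32_t ring_tail;	/* written by the consumer ("driver") */

static void *producer(void *arg)
{
	(void)arg;
	for (uint32_t i = 0; i < N_ITEMS; i++) {
		/* Wait until the consumer has freed slot i % RING_SIZE. */
		while (i - atomic_load_explicit(&ring_tail, memory_order_acquire) >= RING_SIZE)
			sched_yield();
		ring_slots[i % RING_SIZE] = i + 1;	/* fill the "descriptor" */
		/* Publish it: the slot write is ordered before the new head. */
		atomic_store_explicit(&ring_head, i + 1, memory_order_release);
	}
	return NULL;
}

/* Consumes N_ITEMS entries and returns how many held unexpected data. */
static uint32_t consume_all(void)
{
	uint32_t bad = 0;

	for (uint32_t t = 0; t < N_ITEMS; t++) {
		/* Steps 1 and 2: read the head index until entry t is published. */
		while (atomic_load_explicit(&ring_head, memory_order_relaxed) == t)
			sched_yield();

		/* Without this fence, the slot load below could be satisfied
		 * before the head load above; the while loop is only a control
		 * dependency. The fence pairs with the producer's release store. */
		atomic_thread_fence(memory_order_acquire);

		/* Steps 3 and 4: locate and read the entry. */
		if (ring_slots[t % RING_SIZE] != t + 1)
			bad++;

		/* Hand the slot back to the producer. */
		atomic_store_explicit(&ring_tail, t + 1, memory_order_release);
	}
	return bad;
}

/* Runs one producer/consumer pass; returns the number of bad reads,
 * or UINT32_MAX if the producer thread could not be started. */
static uint32_t run_demo(void)
{
	pthread_t th;
	uint32_t bad;

	atomic_store(&ring_head, 0);
	atomic_store(&ring_tail, 0);
	if (pthread_create(&th, NULL, producer, NULL) != 0)
		return UINT32_MAX;
	bad = consume_all();
	pthread_join(th, NULL);
	return bad;
}
```

The design choice mirrors the thread: the check in steps 1 and 2 decides *whether* to read the entry, while the fence decides *when* the entry may be read. Only the latter prevents a speculatively loaded, stale value from being used.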