lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 6 May 2022 11:01:45 -0700
From:   Hemant Kumar <quic_hemantk@...cinc.com>
To:     Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
        Loic Poulain <loic.poulain@...aro.org>
CC:     <mhi@...ts.linux.dev>, <linux-arm-msm@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <quic_bbhatt@...cinc.com>
Subject: Re: [PATCH 5/5] bus: mhi: host: Remove redundant dma_wmb() before ctx
 wp update

Hi Loic,

On 5/6/2022 10:41 AM, Hemant Kumar wrote:
> Hi Loic,
> 
> On 5/4/2022 8:58 AM, Manivannan Sadhasivam wrote:
>> On Wed, May 04, 2022 at 11:25:33AM +0200, Loic Poulain wrote:
>>> On Wed, 4 May 2022 at 10:17, Manivannan Sadhasivam
>>> <manivannan.sadhasivam@...aro.org>  wrote:
>>>> Hi Loic,
>>>>
>>>> On Wed, May 04, 2022 at 09:21:20AM +0200, Loic Poulain wrote:
>>>>> Hi Mani,
>>>>>
>>>>> On Mon, 2 May 2022 at 12:42, Manivannan Sadhasivam
>>>>> <manivannan.sadhasivam@...aro.org>  wrote:
>>>>>> The endpoint device will only read the context wp when the host rings
>>>>>> the doorbell.
>>>>> Are we sure about this statement? what if we update ctxt_wp while the
>>>>> device is still processing the previous ring? is it going to continue
>>>>> processing the new ctxt_wp or wait for a new doorbell interrupt? what
>>>>> about burst mode in which we don't ring at all (ring_db is no-op)?
>>>>>
>>>> Good point. I think my statement was misleading. But still this scenario won't
>>>> happen as per my undestanding. Please see below.
>>>>
>>>>>> And moreover the doorbell write is using writel(). This
>>>>>> guarantess that the prior writes will be completed before ringing
>>>>>> doorbell.
>>>>> Yes but the barrier is to ensure that descriptor/ring content is
>>>>> updated before we actually pass it to device ownership, it's not about
>>>>> ordering with the doorbell write, but the memory coherent ones.
>>>>>
>>>> I see a clear data dependency between writing the ring element and updating the
>>>> context pointer. For instance,
>>>>
>>>> ```
>>>> struct mhi_ring_element *mhi_tre;
>>>>
>>>> mhi_tre = ring->wp;
>>>> /* Populate mhi_tre */
>>>> ...
>>>>
>>>> /* Increment wp */
>>>> ring->wp += el_size;
>>>>
>>>> /* Update ctx wp */
>>>> ring->ctx_wp = ring->iommu_base + (ring->wp - ring->base);
>>>> ```
>>>>
>>>> This is analogous to:
>>>>
>>>> ```
>>>> Read PTR A;
>>>> Update PTR A;
>>>> Increment PTR A;
>>>> Write PTR A to PTR B;
>>>> ```
>>> Interesting point, but shouldn't it be more correct to translate it as:
>>>
>>> 1. Write PTR A to PTR B (mhi_tre);
>>> 2. Update PTR B DATA;
>>> 3. Increment PTR A;
>>> 4. Write PTR A to PTR C;
>>>
>>> In that case, it looks like line 2. has no ordering constraint with 3.
>>> & 4? whereas the following guarantee it:
>>>
>>> 1. Write PTR A to PTR B (mhi_tre);
>>> 2. Update PTR B DATA;
>>> 3. Increment PTR A;
>>> dma_wmb()
>>> 4. Write PTR A to PTR C;
>>>
>>> To be honest, compiler optimization is beyond my knowledge, so I don't
>>> know if a specific compiler arch/version could be able to mess up the
>>> sequence or not. But this pattern is really close to what is described
>>> for dma_wmb() usage in Documentation/memory-barriers.txt. That's why I
>>> challenged this change and would be conservative, keeping the explicit
>>> barrier.
>>>
>> Hmm. Since I was reading the memory model and going through the MHI code, I
>> _thought_ that this dma_wmb() is redundant. But I missed the fact that the
>> updating to memory pointed by "wp" happens implicitly via a pointer. So that
>> won't qualify as a direct dependency.
>>
>>>> Here, because of the data dependency due to "ring->wp", the CPU or compiler
>>>> won't be ordering the instructions. I think that's one of the reason we never
>>>> hit any issue due to this.
>>> You may be right here about the implicit ordering guarantee... So if
>>> you're sure, I think it would deserve an inline comment to explain why
>>> we don't need a memory barrier as in the 'usual' dma descriptor update
>>> sequences.
>>>
>> I think the barrier makes sense now. Sorry for the confusion and thanks for the
>> explanations.
>>
>> Thanks,
>> Mani
>>
>>> Loic
> 
> You made a good point. After following your conversation, in case of 
> burst mode is enabled and currently
> 
> we are in polling mode, does it make sense to move dma_wmb after 
> updating channel WP context ?
> 
> DB ring is going to get skipped when we are in pilling mode.
> 
> instead of dma_wmb();
> *ring->ctxt_wp  =  cpu_to_le64(db);
> 
> *ring->ctxt_wp  =  cpu_to_le64(db); dma_wmb();
> 
> Thanks,
> Hemant
> 
i think i spoke too fast. I think we dont need to worry about the 
polling mode as the context_wp update would happen at some point of time 
and that does not require dma_wmb after update context wp.

Thanks,
Hemant

Powered by blists - more mailing lists