Date:   Wed, 4 May 2022 11:25:33 +0200
From:   Loic Poulain <loic.poulain@...aro.org>
To:     Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>
Cc:     mhi@...ts.linux.dev, linux-arm-msm@...r.kernel.org,
        linux-kernel@...r.kernel.org, quic_hemantk@...cinc.com,
        quic_bbhatt@...cinc.com
Subject: Re: [PATCH 5/5] bus: mhi: host: Remove redundant dma_wmb() before ctx wp update

On Wed, 4 May 2022 at 10:17, Manivannan Sadhasivam
<manivannan.sadhasivam@...aro.org> wrote:
>
> Hi Loic,
>
> On Wed, May 04, 2022 at 09:21:20AM +0200, Loic Poulain wrote:
> > Hi Mani,
> >
> > On Mon, 2 May 2022 at 12:42, Manivannan Sadhasivam
> > <manivannan.sadhasivam@...aro.org> wrote:
> > >
> > > The endpoint device will only read the context wp when the host rings
> > > the doorbell.
> >
> > Are we sure about this statement? What if we update ctxt_wp while the
> > device is still processing the previous ring? Is it going to continue
> > processing up to the new ctxt_wp or wait for a new doorbell interrupt?
> > What about burst mode, in which we don't ring at all (ring_db is a no-op)?
> >
>
> Good point. I think my statement was misleading. But still, this scenario won't
> happen as per my understanding. Please see below.
>
> > > Moreover, the doorbell write uses writel(). This guarantees that
> > > prior writes will be completed before the doorbell is rung.
> >
> > Yes, but the barrier is there to ensure that the descriptor/ring content
> > is updated before we actually pass it to device ownership. It's not about
> > ordering with the doorbell write, but with the coherent memory writes.
> >
>
> I see a clear data dependency between writing the ring element and updating the
> context pointer. For instance,
>
> ```
> struct mhi_ring_element *mhi_tre;
>
> mhi_tre = ring->wp;
> /* Populate mhi_tre */
> ...
>
> /* Increment wp */
> ring->wp += el_size;
>
> /* Update ctx wp */
> ring->ctx_wp = ring->iommu_base + (ring->wp - ring->base);
> ```
>
> This is analogous to:
>
> ```
> Read PTR A;
> Update PTR A;
> Increment PTR A;
> Write PTR A to PTR B;
> ```

Interesting point, but wouldn't it be more accurate to translate it as:

1. Write PTR A to PTR B (mhi_tre);
2. Update PTR B DATA;
3. Increment PTR A;
4. Write PTR A to PTR C;

In that case, it looks like step 2 has no ordering constraint with
steps 3 and 4, whereas the following guarantees it:

1. Write PTR A to PTR B (mhi_tre);
2. Update PTR B DATA;
3. Increment PTR A;
dma_wmb()
4. Write PTR A to PTR C;

To be honest, compiler optimization is beyond my knowledge, so I don't
know whether a specific compiler, architecture or version could mess up
the sequence. But this pattern is really close to what is described
for dma_wmb() usage in Documentation/memory-barriers.txt. That's why I
challenged this change and would be conservative, keeping the explicit
barrier.
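
For illustration, here is a minimal userspace sketch of the second
sequence above. It is not the MHI driver code: the struct layouts,
RING_LEN, ring_queue() and fake_dma_wmb() are all hypothetical names,
and a C11 release fence only stands in for dma_wmb(), which has its own
per-architecture implementation in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_LEN 4 /* hypothetical ring size */

/* Hypothetical descriptor; the real MHI ring element layout differs. */
struct tre {
	uint64_t ptr;
	uint32_t len;
	uint32_t flags;
};

struct ring {
	struct tre elems[RING_LEN];
	size_t wp;               /* host-local write index (PTR A) */
	_Atomic uint64_t ctx_wp; /* write pointer the device reads (PTR C) */
	uint64_t iommu_base;
};

/* Userspace stand-in for dma_wmb(): an assumption, not the kernel macro. */
static inline void fake_dma_wmb(void)
{
	atomic_thread_fence(memory_order_release);
}

static void ring_queue(struct ring *r, uint64_t buf, uint32_t len)
{
	assert(r != NULL);

	struct tre *t = &r->elems[r->wp]; /* 1. write PTR A to PTR B */

	t->ptr = buf;                     /* 2. update PTR B data */
	t->len = len;
	t->flags = 1;

	r->wp = (r->wp + 1) % RING_LEN;   /* 3. increment PTR A */

	fake_dma_wmb();                   /* descriptor stores before step 4 */

	/* 4. write PTR A to PTR C: hand the element over to the device */
	atomic_store_explicit(&r->ctx_wp,
			      r->iommu_base + r->wp * sizeof(struct tre),
			      memory_order_relaxed);
}
```

The value stored in step 4 depends only on the index from step 3, not
on the descriptor fields written in step 2, which is why the fence sits
between them in this sketch.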

>
> Here, because of the data dependency due to "ring->wp", the CPU or compiler
> won't reorder the instructions. I think that's one of the reasons we never
> hit any issue due to this.

You may be right here about the implicit ordering guarantee... So if
you're sure, I think it would deserve an inline comment to explain why
we don't need a memory barrier as in the 'usual' dma descriptor update
sequences.

Loic
