[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAD=FV=Vt+0xQ6wpyF2y1TqS8d04nq4x1b_TcLwy2aNO4dn9x4g@mail.gmail.com>
Date: Wed, 22 Jul 2020 15:08:14 -0700
From: Doug Anderson <dianders@...omium.org>
To: Stephen Boyd <swboyd@...omium.org>
Cc: Wolfram Sang <wsa@...-dreams.de>,
Sai Prakash Ranjan <saiprakash.ranjan@...eaurora.org>,
Rajendra Nayak <rnayak@...eaurora.org>,
Akash Asthana <akashast@...eaurora.org>,
Alok Chauhan <alokc@...eaurora.org>,
Andy Gross <agross@...nel.org>,
Bjorn Andersson <bjorn.andersson@...aro.org>,
Wolfram Sang <wsa@...nel.org>,
linux-arm-msm <linux-arm-msm@...r.kernel.org>,
linux-i2c@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] i2c: i2c-qcom-geni: Fix DMA transfer race
Hi,
On Tue, Jul 21, 2020 at 1:26 PM Doug Anderson <dianders@...omium.org> wrote:
>
> Hi,
>
> On Tue, Jul 21, 2020 at 11:55 AM Stephen Boyd <swboyd@...omium.org> wrote:
> >
> > Quoting Doug Anderson (2020-07-21 09:18:35)
> > > On Tue, Jul 21, 2020 at 12:08 AM Stephen Boyd <swboyd@...omium.org> wrote:
> > > >
> > > > Quoting Stephen Boyd (2020-07-20 22:59:14)
> > > > >
> > > > > I worry that we also need a dmb() here to make sure the dma buffer is
> > > > > properly mapped before this write to the device is attempted. But it may
> > > > > only matter to be before the I2C_READ.
> > > > >
> > > >
> > > > I'm suggesting this patch instead where we make geni_se_setup_m_cmd()
> > > > use a writel() so that it has the proper barrier semantics to wait for
> > > > the other memory writes that happened in program order before this point
> > > > to complete before the device is kicked to do a read or a write.
> > >
> > > Are you saying that dma_map_single() isn't guaranteed to have a
> > > barrier or something? I tried to do some searching and found a thread
> > > [1] where someone tried to add a barrierless variant of them. To me
> > > that means that the current APIs have barriers.
> > >
> > > ...or is there something else you're worried about?
> >
> > I'm not really thinking about dma_map_single() having a barrier or not.
> > The patch you mention is from 2010. Many things have changed in the last
> > decade. Does it have barrier semantics? The presence of a patch on the
> > mailing list doesn't mean much.
>
> Yes, it's pretty old, but if you follow the thread and look at the
> patch I'm fairly certain it's still relevant. Specifically, following
> one thread of dma_map_single() on arm64:
>
> dma_map_single()
> -> dma_map_single_attrs()
> --> dma_map_page_attrs()
> ---> dma_direct_map_page()
> ----> arch_sync_dma_for_device()
> -----> __dma_map_area()
> ------> __dma_inv_area() which has a "dsb"
>
> I'm sure there are lots of other possible paths, but one thing pointed
> out by following that path is 'DMA_ATTR_SKIP_CPU_SYNC'. The
> documentation of that option talks about the normal flow. It says
> that in the normal flow that dma_map_{single,page,sg} will
> synchronize. We are in the normal flow here.
>
> As far as I understand, the whole point of dma_map_single() is to take
> a given buffer and get it all ready so that if a device does DMA on it
> right after the function exits that it's all set.
>
>
> > Specifically I'm looking at "KERNEL I/O BARRIER EFFECTS" of
> > Documentation/memory-barriers.txt and noticing that this driver is using
> > relaxed IO accessors meaning that the reads and writes aren't ordered
> > with respect to other memory accesses. They're only ordered to
> > themselves within the same device. I'm concerned that the CPU will issue
> > the IO access to start the write DMA operation before the buffer is
> > copied over due to out of order execution.
>
> I'm not an expert either, but it really looks like dma_map_single()
> does all that we need it to.
>
>
> > I'm not an expert in this area, but this is why we ask driver authors to
> > use the non-relaxed accessors because they have the appropriate
> > semantics built in to make them easy to reason about. They do what they
> > say when they say to do it.
>
> I'm all for avoiding using the relaxed variants too except if it's
> been shown to be a performance problem. The one hesitation I have,
> though, is that I've spent time poking a bunch at the geni SPI driver.
> We do _a lot_ of very small SPI transfers on our system. For each of
> these it's gotta setup a lot of commands. When I was poking I
> definitely noticed the difference between writel() and
> writel_relaxed(). If we can save a few microseconds on each one of
> these transfers it's probably worth it since it's effectively in the
> inner loop of some transfers.
>
> One option I thought of was to track the mode (DMA vs. FIFO) and only
> do writel() for DMA mode. If you're not convinced by my arguments
> about dma_map_single(), would you be good with just doing the
> non-relaxed version if we're in DMA mode?
OK, so I did some quick benchmarking and I couldn't find any
performance regression with just always using writel() here. Even if
dma_map_single() does guarantee that things are synced:
* There's no guarantee that all geni users will use dma_map_{xxx}.
* As Stephen says, the writel() is easier to reason about.
The change to a writel() is a bit orthogonal to the issue being
discussed here, though and it wouldn't make sense to have one patch
touch both the geni headers and also the i2c code. Thus, I have sent
v2 without it (just with the other fixes that Stephen requested) and
also sent out a separate patch to change from writel_relaxed() to
writel().
Breadcrumbs:
[PATCH v2] i2c: i2c-qcom-geni: Fix DMA transfer race
https://lore.kernel.org/r/20200722145948.v2.1.I7efdf6efaa6edadbb690196cd4fbe3392a582c89@changeid/
[PATCH] soc: qcom-geni-se: Don't use relaxed writes when writing commands
https://lore.kernel.org/r/20200722150113.1.Ia50ab5cb8a6d3a73d302e6bdc25542d48ffd27f4@changeid/
As mentioned after the cut in the i2c change, I have kept people's
tested/reviewed tags for v2.
-Doug
Powered by blists - more mailing lists