Message-ID: <20231226153754.6d5b2c5e@jic23-huawei>
Date: Tue, 26 Dec 2023 15:37:54 +0000
From: Jonathan Cameron <jic23@...nel.org>
To: Paul Cercueil <paul@...pouillou.net>
Cc: Lars-Peter Clausen <lars@...afoo.de>, Sumit Semwal
<sumit.semwal@...aro.org>, Christian König
<christian.koenig@....com>, Vinod Koul <vkoul@...nel.org>, Jonathan Corbet
<corbet@....net>, linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
dmaengine@...r.kernel.org, linux-iio@...r.kernel.org,
linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org,
linaro-mm-sig@...ts.linaro.org, Nuno Sá
<noname.nuno@...il.com>, Michael Hennerich <Michael.Hennerich@...log.com>
Subject: Re: [PATCH v5 0/8] iio: new DMABUF based API, v5
On Thu, 21 Dec 2023 18:56:52 +0100
Paul Cercueil <paul@...pouillou.net> wrote:
> Hi Jonathan,
>
> > On Thursday 21 December 2023 at 16:30 +0000, Jonathan Cameron wrote:
> > On Tue, 19 Dec 2023 18:50:01 +0100
> > Paul Cercueil <paul@...pouillou.net> wrote:
> >
> > > [V4 was: "iio: Add buffer write() support"][1]
> > >
> > > Hi Jonathan,
> > >
> > Hi Paul,
> >
> > > This is a respin of the V3 of my patchset that introduced a new
> > > interface based on DMABUF objects [2].
> >
> > Great to see this moving forwards.
> >
> > >
> > > The V4 was a split of the patchset, to attempt to upstream buffer
> > > write() support first. But since there is no current user upstream,
> > > it was not merged. This V5 is about doing the opposite, and contains
> > > the new DMABUF interface, without adding the buffer write() support.
> > > It can already be used with the upstream adi-axi-adc driver.
> >
> > Seems like a sensible path in the short term.
> >
> > >
> > > In user-space, Libiio uses it to transfer blocks of samples back and
> > > forth between the hardware and the applications, without having to
> > > copy the data.
> > >
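(To make the flow concrete for anyone who has not read the patches:
userspace attaches a DMABUF fd to the buffer's character device once,
then enqueues it for each transfer and waits on the fence. Something
like the fragment below. Note that the ioctl and structure names here
are assumptions on my side and may not match the series exactly; the
uapi header added by the series, include/uapi/linux/iio/buffer.h, is
authoritative.)

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iio/buffer.h>	/* as extended by this series (assumed) */

/* buf_fd: the IIO buffer chardev fd (from IIO_BUFFER_GET_FD_IOCTL);
 * dmabuf_fd: any DMABUF, e.g. created via udmabuf or another driver.
 */
static int setup_and_enqueue(int buf_fd, int dmabuf_fd, uint64_t len)
{
	struct iio_dmabuf req = {
		.fd = dmabuf_fd,
		.bytes_used = len,
	};

	/* Attach: done once per DMABUF (shown inline here for brevity). */
	if (ioctl(buf_fd, IIO_BUFFER_DMABUF_ATTACH_IOCTL, &dmabuf_fd) < 0)
		return -1;

	/* Enqueue: done for every transfer; completion is signalled
	 * through a fence on the DMABUF, no data is copied on the way. */
	return ioctl(buf_fd, IIO_BUFFER_DMABUF_ENQUEUE_IOCTL, &req);
}
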
> > > On a ZCU102 with a FMComms3 daughter board, running Libiio from the
> > > pcercuei/dev-new-dmabuf-api branch [3], compiled with
> > > WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> > > sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> > > Throughput: 116 MiB/s
> > >
> > > Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON):
> > > sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
> > > Throughput: 475 MiB/s
> > >
> > > This benchmark only measures the speed at which the data can be
> > > fetched to iio_rwdev's internal buffers, and does not actually try
> > > to read the data (e.g. to pipe it to stdout). It shows that fetching
> > > the data is more than 4x faster using the new interface.
> > >
> > > When actually reading the data, the performance difference isn't
> > > that impressive (maybe because in the case of DMABUF the data is not
> > > in cache):
> >
> > This needs a bit more investigation ideally. Perhaps perf counters can
> > be used to establish that cache misses are the main difference between
> > dropping it on the floor and actually reading the data.
>
> Yes, we'll work on it. The other big difference is that fileio uses
> dma_alloc_coherent() while the DMABUFs use non-coherent mappings. I
> guess coherent memory is faster for the typical access pattern (which
> is "read/write everything sequentially once").
It's been a long time since I last worked much with a platform that
wasn't always IO coherent, so I've forgotten how all this works (it all
ends up as no-ops on the platforms I tend to use these days!). Good
luck, I'll be interested to see what this turns out to be.
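
For anyone else trying to reason about where the time goes: with a
non-coherent mapping, userspace has to bracket every CPU access to the
DMABUF with the generic DMA-BUF sync ioctl, and that begin/end pair is
where the cache maintenance lands. A minimal, untested sketch using only
the existing <linux/dma-buf.h> UAPI (the fd and the mmap()ed view are
assumed to have been set up elsewhere):

#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Bracket a CPU read of a non-coherently mapped DMABUF. */
static int read_block(int dmabuf_fd, const void *map, size_t len)
{
	struct dma_buf_sync sync = { 0 };
	int ret;

	/* Invalidate CPU caches before looking at data the DMA wrote. */
	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
	ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	if (ret)
		return ret;

	/* ... consume 'len' bytes from 'map' here ... */

	/* Close the CPU access window. */
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}

With dma_alloc_coherent()-backed fileio there is no equivalent step,
which would be consistent with the gap measured above.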
>
> > >
> > > WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):
> > > sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero
> > > status=progress
> > > 2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
> > >
> > > WITH_LOCAL_DMABUF_API=ON:
> > > sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero
> > > status=progress
> > > 2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
> > >
> > > One interesting thing to note is that fileio is (currently) actually
> > > faster than the DMABUF interface if you increase the buffer size a
> > > lot. My explanation is that the cache invalidation routine takes more
> > > and more time the bigger the DMABUF gets. This is because the DMABUF
> > > is backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by
> > > up to 16 thousand pages, which have to be invalidated one by one.
> > > This can be addressed by using huge pages, but the udmabuf driver
> > > does not (yet) support creating DMABUFs backed by huge pages.
> >
> > I'd imagine folios of reasonable size will help, short of a huge page,
> > as then hopefully it will use the flush-by-VA-range instructions if
> > available.
> >
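As a concrete reference for the huge-page point: a udmabuf is created
from a regular memfd, so a 64 MiB buffer really does reach the kernel as
roughly 16k individual pages. A rough, untested sketch of that creation
path using only the existing /dev/udmabuf UAPI (the buffer name and the
error handling are mine):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/* Create a DMABUF of 'size' bytes (multiple of the page size) backed by
 * an anonymous memfd. Returns the DMABUF fd, or -1 on error.
 */
static int create_udmabuf(size_t size)
{
	struct udmabuf_create create = { 0 };
	int memfd, devfd, buffd = -1;

	memfd = memfd_create("iio-block", MFD_ALLOW_SEALING);
	if (memfd < 0)
		return -1;

	/* udmabuf requires the backing memfd to be sealed against shrinking. */
	if (ftruncate(memfd, size) < 0 ||
	    fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
		goto out_memfd;

	devfd = open("/dev/udmabuf", O_RDWR);
	if (devfd < 0)
		goto out_memfd;

	create.memfd = memfd;
	create.offset = 0;
	create.size = size;

	/* The resulting DMABUF is backed by the memfd's individual pages,
	 * which is where the page-by-page invalidation cost comes from. */
	buffd = ioctl(devfd, UDMABUF_CREATE, &create);

	close(devfd);
out_memfd:
	close(memfd);	/* udmabuf keeps its own reference to the pages */
	return buffd;
}

Until udmabuf can be fed larger chunks (huge pages, or large folios as
suggested above), every one of those pages is a separate invalidation.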
> > >
> > > Anyway, the real benefits happen when the DMABUFs are either shared
> > > between IIO devices, or between the IIO subsystem and another
> > > subsystem. In that case, the DMABUFs are simply passed around between
> > > drivers, without the data being copied at any point.
> > >
> > > We use that feature to transfer samples from our transceivers to USB,
> > > using a DMABUF interface to FunctionFS [4].
> > >
> > > This drastically increases the throughput, to about 274 MiB/s over a
> > > USB3 link, vs. 127 MiB/s using IIO's fileio interface + write() to
> > > the FunctionFS endpoints, for a lower CPU usage (0.85 vs. 0.65 load
> > > avg.).
> >
> > This is a nice example. Where are you with getting the patch merged?
>
> I'll send a new version (mostly a [RESEND]...) in the coming days. As
> you can see from the review on my last attempt, the main blocker is
> that nobody wants to merge a new interface if the rest of the kernel
> bits aren't upstream yet. Kind of a chicken-and-egg problem :)
>
> > Overall, this code looks fine to me, though there are some parts that
> > need review by other maintainers (e.g. Vinod for the dmaengine
> > callback), and I'd like at least a 'looks fine' from those who know a
> > lot more about dmabuf than I do.
> >
> > To actually make this useful, it sounds like either udmabuf needs some
> > perf improvements, or there has to be an upstream case of sharing it
> > with something else (e.g. your functionfs patches). So what do we need
> > to get in before the positive benefit becomes worth carrying this
> > extra complexity? (Which isn't too bad, so I'm fine with a small
> > benefit and promises of riches :)
>
> I think the FunctionFS DMABUF interface can be pushed as well for 6.9,
> in parallel with this one, as the feedback on the V1 was good. I might
> just need some help pushing it forward (kind of an "I merge it if you
> merge it" guarantee).
Ok. If we get a 'fine by us' from DMABUF folk I'd be happy to make
that commitment for the IIO parts.
Jonathan
>
> Cheers,
> -Paul
>
> >
> > Jonathan
> >
> > >
> > > Based on linux-next/next-20231219.
> > >
> > > Cheers,
> > > -Paul
> > >
> > > [1] https://lore.kernel.org/all/20230807112113.47157-1-paul@crapouillou.net/
> > > [2] https://lore.kernel.org/all/20230403154800.215924-1-paul@crapouillou.net/
> > > [3] https://github.com/analogdevicesinc/libiio/tree/pcercuei/dev-new-dmabuf-api
> > > [4] https://lore.kernel.org/all/20230322092118.9213-1-paul@crapouillou.net/
> > >
> > > ---
> > > Changelog:
> > > - [3/8]: Replace V3's dmaengine_prep_slave_dma_array() with a new
> > >   dmaengine_prep_slave_dma_vec(), which uses a new 'dma_vec' struct
> > >   (a rough sketch of the new call follows this changelog).
> > >   Note that at some point we will need to support cyclic transfers
> > >   using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags"
> > >   parameter to the function?
> > >
> > > - [4/8]: Implement .device_prep_slave_dma_vec() instead of V3's
> > >   .device_prep_slave_dma_array().
> > >
> > >   @Vinod: this patch will cause a small conflict with my other
> > >   patchset adding scatter-gather support to the axi-dmac driver.
> > >   This patch adds a call to axi_dmac_alloc_desc(num_sgs), but the
> > >   prototype of this function changed in my other patchset - it would
> > >   have to be passed the "chan" variable. I don't know how you prefer
> > >   it to be resolved. Worst case scenario (and if @Jonathan is okay
> > >   with that) this one patch can be re-sent later, but it would make
> > >   this patchset less "atomic".
> > >
> > > - [5/8]:
> > >   - Use dev_err() instead of pr_err()
> > >   - Inline to_iio_dma_fence()
> > >   - Add comment to explain why we unref twice when detaching dmabuf
> > >   - Remove TODO comment. It is actually safe to free the file's
> > >     private data even when transfers are still pending because it
> > >     won't be accessed.
> > >   - Fix documentation of new fields in struct iio_buffer_access_funcs
> > >   - iio_dma_resv_lock() does not need to be exported, make it static
> > >
> > > - [7/8]:
> > >   - Use the new dmaengine_prep_slave_dma_vec().
> > >   - Restrict to input buffers, since output buffers are not yet
> > >     supported by IIO buffers.
> > >
> > > - [8/8]:
> > >   Use description lists for the documentation of the three new
> > >   IOCTLs instead of abusing subsections.
> > >
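(For readers following along without the patches in front of them: from
the descriptions above, the new call presumably looks roughly like the
sketch below. The 'dma_vec' field names and the exact parameter list are
guesses by analogy with dmaengine_prep_slave_sg(), which would also fit
the open question above about a 'flags' parameter for cyclic support;
treat patch [3/8] as authoritative.)

#include <linux/dmaengine.h>

/* Sketch of what patch [3/8] is described as adding. */
struct dma_vec {
	dma_addr_t addr;
	size_t len;
};

/*
 * The series also adds a matching .device_prep_slave_dma_vec() callback
 * to struct dma_device (implemented for dma-axi-dmac in patch [4/8]),
 * which this helper simply forwards to.
 */
static inline struct dma_async_tx_descriptor *
dmaengine_prep_slave_dma_vec(struct dma_chan *chan,
			     const struct dma_vec *vecs, size_t nents,
			     enum dma_transfer_direction dir)
{
	if (!chan || !chan->device || !chan->device->device_prep_slave_dma_vec)
		return NULL;

	return chan->device->device_prep_slave_dma_vec(chan, vecs, nents, dir);
}
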
> > > ---
> > > Alexandru Ardelean (1):
> > > iio: buffer-dma: split iio_dma_buffer_fileio_free() function
> > >
> > > Paul Cercueil (7):
> > > iio: buffer-dma: Get rid of outgoing queue
> > > dmaengine: Add API function dmaengine_prep_slave_dma_vec()
> > > dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec
> > > iio: core: Add new DMABUF interface infrastructure
> > > iio: buffer-dma: Enable support for DMABUFs
> > > iio: buffer-dmaengine: Support new DMABUF based userspace API
> > > Documentation: iio: Document high-speed DMABUF based API
> > >
> > > Documentation/iio/dmabuf_api.rst | 54 +++
> > > Documentation/iio/index.rst | 2 +
> > > drivers/dma/dma-axi-dmac.c | 40 ++
> > > drivers/iio/buffer/industrialio-buffer-dma.c | 242 ++++++++---
> > > .../buffer/industrialio-buffer-dmaengine.c | 52 ++-
> > > drivers/iio/industrialio-buffer.c | 402 ++++++++++++++++++
> > > include/linux/dmaengine.h | 25 ++
> > > include/linux/iio/buffer-dma.h | 33 +-
> > > include/linux/iio/buffer_impl.h | 26 ++
> > > include/uapi/linux/iio/buffer.h | 22 +
> > > 10 files changed, 836 insertions(+), 62 deletions(-)
> > > create mode 100644 Documentation/iio/dmabuf_api.rst
> > >
> >
>
>