[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bgnkzjpegpmif7gohoijt42rku6jruovjsxscg75dprdz5ek2i@ntfh2yyjyiry>
Date: Fri, 28 Feb 2025 11:18:28 +0100
From: Jacopo Mondi <jacopo.mondi@...asonboard.com>
To: Tomasz Figa <tfiga@...omium.org>
Cc: Jacopo Mondi <jacopo.mondi@...asonboard.com>,
Mikhail Rudenko <mike.rudenko@...il.com>, Laurent Pinchart <laurent.pinchart@...asonboard.com>,
Dafna Hirschfeld <dafna@...tmail.com>, Mauro Carvalho Chehab <mchehab@...nel.org>,
Heiko Stuebner <heiko@...ech.de>, linux-media@...r.kernel.org, linux-rockchip@...ts.infradead.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers
Hi Tomasz
On Fri, Feb 28, 2025 at 07:00:57PM +0900, Tomasz Figa wrote:
> Hi Jacopo,
>
> On Fri, Feb 28, 2025 at 2:11 AM Jacopo Mondi
> <jacopo.mondi@...asonboard.com> wrote:
> >
> > Hi Mikhail
> >
> > On Tue, Jan 14, 2025 at 07:00:39PM +0300, Mikhail Rudenko wrote:
> > >
> > > Hi Laurent,
> > >
> > > On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@...asonboard.com> wrote:
> > >
> > > > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote:
> > > >> Currently, the rkisp1 driver always uses coherent DMA allocations for
> > > >> video capture buffers. However, on some platforms, using non-coherent
> > > >> buffers can improve performance, especially when CPU processing of
> > > >> MMAP'ed video buffers is required.
> > > >>
> > > >> For example, on the Rockchip RK3399 running at maximum CPU frequency,
> > > >> the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a
> > > >> malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using
> > > >> non-coherent DMA allocation. CPU usage also decreases accordingly.
> > > >
> > > > What's the time taken by the cache management operations ?
> > >
> > > Sorry for the late reply, your question turned out a little more
> > > interesting than I expected initially. :)
> > >
> > > When capturing using Yavta with MMAP buffers under the conditions mentioned
> > > in the commit message, ftrace gives 437.6 +- 1.1 us for
> > > dma_sync_sgtable_for_cpu and 409 +- 14 us for
> > > dma_sync_sgtable_for_device. Thus, it looks like using non-coherent
> > > buffers in this case is more CPU-efficient even when considering cache
> > > management overhead.
> > >
> > > When trying to do the same measurements with libcamera, I failed. In a
> > > typical libcamera use case when MMAP buffers are allocated from a
> > > device, exported as dmabufs and then used for capture on the same device
> > > with DMABUF memory type, cache management in kernel is skipped [1]
> > > [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so
> > > DMA_BUF_IOCTL_SYNC from userspace does not work either.
> > >
> > > So it looks like to make this change really useful, the above issue of
> > > cache management for libcamera/DMABUF/videobuf2-dma-contig has to be
> > > solved. I'm not an expert in this area, so any advice is kindly welcome. :)
> >
> > It would be shame if we let this discussion drop dead.. cache
> > management policies are relevant for performances, specifically for
> > cpu access, and your above 7.7ms vs 1.1 ms test clearly shows that.
> >
> > >
> > > [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411
> >
> > I would like to know from Hans if the decision to disallow cache-hints
> > for dmabuf importers is a design choice or is deeply rooted in other
> > reasons I might be missing.
>
> When DMA-buf is used, the responsibility for cache management is
> solely on the CPU users' side, so cache-hints don't really apply. It's
> the exporter (=allocator) who determines the mapping policy of the
> buffer and provides necessary DMA_BUF_SYNC operations (can be no-op if
> the buffer is coherent).
This all makes sense.
I take it as, for libcamera, users of the FrameBufferAllocator helper
(which first exports MMAP buffers from the video device and the
imports them back as DMABUF) won't be able to control the cache
policies.
Now, in the long term, we want FrameBufferAllocator to go away and
have either buffers exported by the consumer (likely DRM) or by a
system wide buffer allocator (when we'll have one) and have the video
devices operate as pure importers. But for the time being the
"first export then import" use case is possibile and valid so I wonder
if we should consider measures to allow controlling caching policies
for this use case too.
>
> Best regards,
> Tomasz
>
> >
> > I'm asking because the idea is for libcamera to act solely as dma-buf
> > importer, the current alloc-export-then-import trick is an helper for
> > applications to work around the absence of a system allocator.
> >
> > If the requirement to disable cache-hints for importers cannot be
> > lifted, for libcamera it means we would not be able to use it.
> >
> >
> > > [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829
> > > [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426
> > >
> > > --
> > > Best regards,
> > > Mikhail Rudenko
> > >
> >
Powered by blists - more mailing lists