Message-ID: <4eaae50515b46545de4ff399bc2365a3c4f9c44f.camel@ndufresne.ca>
Date: Wed, 15 Jan 2025 14:13:27 -0500
From: Nicolas Dufresne <nicolas@...fresne.ca>
To: Mikhail Rudenko <mike.rudenko@...il.com>, Laurent Pinchart
<laurent.pinchart@...asonboard.com>
Cc: Dafna Hirschfeld <dafna@...tmail.com>, Mauro Carvalho Chehab
<mchehab@...nel.org>, Heiko Stuebner <heiko@...ech.de>,
linux-media@...r.kernel.org, linux-rockchip@...ts.infradead.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] media: rkisp1: allow non-coherent video capture buffers
On Tuesday, 14 January 2025 at 19:00 +0300, Mikhail Rudenko wrote:
> Hi Laurent,
>
> On 2025-01-03 at 17:23 +02, Laurent Pinchart <laurent.pinchart@...asonboard.com> wrote:
>
> > On Thu, Jan 02, 2025 at 06:35:00PM +0300, Mikhail Rudenko wrote:
> > > Currently, the rkisp1 driver always uses coherent DMA allocations for
> > > video capture buffers. However, on some platforms, using non-coherent
> > > buffers can improve performance, especially when CPU processing of
> > > MMAP'ed video buffers is required.
> > >
> > > For example, on the Rockchip RK3399 running at maximum CPU frequency,
> > > the time to memcpy a frame from a 1280x720 XRGB32 MMAP'ed buffer to a
> > > malloc'ed userspace buffer decreases from 7.7 ms to 1.1 ms when using
> > > non-coherent DMA allocation. CPU usage also decreases accordingly.
> >
> > What's the time taken by the cache management operations ?
>
> Sorry for the late reply, your question turned out a little more
> interesting than I expected initially. :)
>
> When capturing using Yavta with MMAP buffers under the conditions mentioned
> in the commit message, ftrace gives 437.6 +- 1.1 us for
> dma_sync_sgtable_for_cpu and 409 +- 14 us for
> dma_sync_sgtable_for_device. Thus, it looks like using non-coherent
> buffers in this case is more CPU-efficient even when considering cache
> management overhead.
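
As a rough cross-check of the figures quoted above, the per-frame CPU cost can
be added up as below. This is only a sketch: it assumes exactly one
dma_sync_sgtable_for_cpu and one dma_sync_sgtable_for_device per frame and
that the ftrace and memcpy timings are simply additive. The function and macro
names are made up for illustration.

```c
/* Per-frame timings quoted in this thread (RK3399, 1280x720 XRGB32), in us. */
#define MEMCPY_COHERENT_US     7700.0  /* memcpy from a coherent buffer */
#define MEMCPY_NONCOHERENT_US  1100.0  /* memcpy from a non-coherent buffer */
#define SYNC_FOR_CPU_US         437.6  /* dma_sync_sgtable_for_cpu */
#define SYNC_FOR_DEVICE_US      409.0  /* dma_sync_sgtable_for_device */

/* Non-coherent case: memcpy plus both cache maintenance calls. */
double noncoherent_total_us(void)
{
	return MEMCPY_NONCOHERENT_US + SYNC_FOR_CPU_US + SYNC_FOR_DEVICE_US;
}

/* CPU time saved per frame relative to the coherent memcpy alone. */
double saving_us(void)
{
	return MEMCPY_COHERENT_US - noncoherent_total_us();
}
```

Under those assumptions the non-coherent path costs about 1.95 ms per frame
against 7.7 ms for the coherent one, so roughly 5.75 ms is saved per frame.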
>
> When trying to do the same measurements with libcamera, I failed. In a
> typical libcamera use case when MMAP buffers are allocated from a
> device, exported as dmabufs and then used for capture on the same device
> with DMABUF memory type, cache management in kernel is skipped [1]
> [2]. Also, vb2_dc_dmabuf_ops_{begin,end}_cpu_access are no-ops [3], so
> DMA_BUF_IOCTL_SYNC from userspace does not work either.
>
> So it looks like to make this change really useful, the above issue of
> cache management for libcamera/DMABUF/videobuf2-dma-contig has to be
> solved. I'm not an expert in this area, so any advice is kindly welcome. :)
The manual coherency hints are not implemented for DMABUF, and libcamera only
does DMABUF. Someone will have to look into that. This is also why this API has
very low adoption: it breaks easily.
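
For reference, here is a minimal sketch of how userspace would normally bracket
a CPU read of a dmabuf with DMA_BUF_IOCTL_SYNC. The helper names
(dmabuf_sync, copy_frame_from_dmabuf) are hypothetical, and the buffer is
assumed to be already mmap()ed. As noted above, with buffers exported by
videobuf2-dma-contig the begin/end CPU access hooks are currently no-ops, so
these calls would succeed without doing any cache maintenance.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue a START or END sync for CPU reads on a dmabuf fd. */
int dmabuf_sync(int dmabuf_fd, int start)
{
	struct dma_buf_sync sync = {
		.flags = (start ? DMA_BUF_SYNC_START : DMA_BUF_SYNC_END) |
			 DMA_BUF_SYNC_READ,
	};
	int ret;

	/* The ioctl may be interrupted and is expected to be restarted. */
	do {
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/* Copy one frame out of an already mapped dmabuf, bracketed by syncs. */
int copy_frame_from_dmabuf(int dmabuf_fd, const void *mapped,
			   void *dst, size_t len)
{
	if (dmabuf_sync(dmabuf_fd, 1) < 0)
		return -1;	/* do not read potentially stale data */

	memcpy(dst, mapped, len);

	return dmabuf_sync(dmabuf_fd, 0);
}
```

A caller would pass the fd obtained from VIDIOC_EXPBUF together with the
pointer returned by mmap() on that fd.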
>
> [1] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n411
> [2] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-core.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n829
> [3] https://git.linuxtv.org/media.git/tree/drivers/media/common/videobuf2/videobuf2-dma-contig.c?id=94794b5ce4d90ab134b0b101a02fddf6e74c437d#n426
>
> --
> Best regards,
> Mikhail Rudenko
>