Message-ID: <20250710215242.GA16271@pendragon.ideasonboard.com>
Date: Fri, 11 Jul 2025 00:52:42 +0300
From: Laurent Pinchart <laurent.pinchart@...asonboard.com>
To: Pavel Machek <pavel@....cz>
Cc: Lucas Stach <l.stach@...gutronix.de>, kraxel@...hat.com,
	vivek.kasireddy@...el.com, dri-devel@...ts.freedesktop.org,
	sumit.semwal@...aro.org, benjamin.gaignard@...labora.com,
	Brian.Starkey@....com, jstultz@...gle.com, tjmercier@...gle.com,
	linux-media@...r.kernel.org, linaro-mm-sig@...ts.linaro.org,
	kernel list <linux-kernel@...r.kernel.org>,
	linux+etnaviv@...linux.org.uk, christian.gmeiner@...il.com,
	etnaviv@...ts.freedesktop.org, phone-devel@...r.kernel.org
Subject: Re: DMA-BUFs always uncached on arm64, causing poor camera
 performance on Librem 5

On Thu, Jul 10, 2025 at 10:49:19AM +0200, Pavel Machek wrote:
> Hi!
> 
> > > memcpy() from normal memory is about 2msec/1MB. Unfortunately, for
> > > DMA-BUFs it is 20msec/1MB, and that basically means I can't easily do
> > > 760p video recording. Plus, copying full-resolution photo buffer takes
> > > more than 200msec!
> > > 
> > > There's a possibility to do some processing on the GPU, and it's implemented here:
> > > 
> > > https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads
> > > 
> > > but that hits the same problem in the end -- data is in DMA-BUF,
> > > uncached, and takes way too long to copy out.
> > > 
> > > And that's ... wrong. DMA ended seconds ago, a complete cache flush
> > > would be way cheaper than copying a single frame out, and I still have
> > > to deal with uncached frames.
> > > 
> > > So I have two questions:
> > > 
> > > 1) Is my analysis correct that, no matter how I get a frame from V4L and
> > > process it on the GPU, I'll have to copy it from uncached memory in the
> > > end?
> > 
> > If you need to touch the buffers using the CPU then you are either
> > stuck with uncached memory or you need to implement bracketed access to
> > do the necessary cache maintenance. Be aware that completely flushing
> > the cache is not really an option, as that would impact other
> > workloads, so you have to flush the cache by walking the virtual
> > address space of the buffer, which may take a significant amount of CPU
> > time.
> 
> What kind of "significant amount of CPU time" are we talking about here?
> Milliseconds?

It really depends on the platform, the type of cache, and the size of
the buffer. I remember that back in the N900 days a selective cache clean
of a large buffer for full-resolution images took several dozen
milliseconds, possibly close to 100ms. We had to clean the whole D-cache
to make it fast enough, but you can't always do that, as Lucas mentioned.
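In dma-buf terms, the bracketed access Lucas describes is the DMA_BUF_IOCTL_SYNC ioctl from <linux/dma-buf.h>: DMA_BUF_SYNC_START before the CPU touches an mmap()ed buffer, DMA_BUF_SYNC_END afterwards. The sketch below is a minimal illustration, assuming `fd` is a DMA-BUF file descriptor (for instance one exported from V4L2) that has already been mmap()ed for reading at `map`; the helper names `dmabuf_sync()` and `copy_from_dmabuf()` are hypothetical. Whether the ioctl performs any cache maintenance, and how long that takes, is up to the exporter and the platform, which is exactly the cost discussed above.

```c
#include <errno.h>
#include <linux/dma-buf.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Issue one DMA_BUF_IOCTL_SYNC call (hypothetical helper). The call can be
 * interrupted, in which case it is simply restarted.
 */
static int dmabuf_sync(int fd, __u64 flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}

/*
 * Copy `len` bytes out of a DMA-BUF mapped at `map` (hypothetical helper).
 * The memcpy() is bracketed by SYNC_START/SYNC_END so the exporter gets a
 * chance to do whatever cache maintenance the platform needs. READ-only
 * flags tell it the CPU will not write to the buffer.
 * Returns 0 on success, -1 with errno set on failure.
 */
int copy_from_dmabuf(int fd, const void *map, void *dst, size_t len)
{
    if (dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ) == -1)
        return -1;

    memcpy(dst, map, len);

    return dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

The helper only helps if the exporter actually maps the buffer cacheable and does range-based maintenance in the sync calls; with an exporter that keeps the mapping uncached, the memcpy() still runs at uncached speed.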

> Bracketed access is fine with me.
> 
> Flushing a cache should be an option. I'm root, there's no other
> significant workload, and copying out the buffer takes 200msec+. A
> lot of cache flushes can be done in a quarter of a second!
> 
> > However, if you are only going to use the buffer with the GPU I see no
> > reason to touch it from the CPU side. Why would you even need to copy
> > the content? After all, dma-bufs are meant to enable zero-copy between
> > DMA capable accelerators. You can simply import the V4L2 buffer into a
> > GL texture using EGL_EXT_image_dma_buf_import. Using this path you
> > don't need to bother with the cache at all, as the GPU will directly
> > read the video buffers from RAM.
> 
> Yes, so the GPU will read the video buffer from RAM, then debayer it, and then
> what? Then I need to store the data in a raw file, use the CPU to turn it
> into a JPEG file, or maybe run a video encoder on it. Those are all tasks
> that are done on the CPU...
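Pavel's numbers can be put side by side with a little arithmetic. The sketch below takes his reported per-megabyte memcpy() costs (2 ms/MB from normal memory, 20 ms/MB from an uncached DMA-BUF) and leaves the cost of the cache maintenance pass as a parameter, since, as noted above, it depends on the platform, the type of cache and the buffer size. The function names are hypothetical.

```c
/*
 * Back-of-the-envelope comparison of the two CPU read paths in this thread.
 * The per-MB costs are the figures reported for the Librem 5; the cache
 * maintenance cost is a free parameter because it is platform-dependent.
 */

/* Reported memcpy() costs, in milliseconds per megabyte. */
#define CACHED_MS_PER_MB   2.0   /* from normal (cached) memory */
#define UNCACHED_MS_PER_MB 20.0  /* from an uncached DMA-BUF */

/* Uncached path: copy straight out of the DMA-BUF. */
static double uncached_path_ms(double mb)
{
    return mb * UNCACHED_MS_PER_MB;
}

/* Cached path: cache maintenance on the buffer, then a cached copy. */
static double cached_path_ms(double mb, double maint_ms)
{
    return maint_ms + mb * CACHED_MS_PER_MB;
}

/* Largest maintenance cost for which the cached path is still faster. */
static double breakeven_maint_ms(double mb)
{
    return mb * (UNCACHED_MS_PER_MB - CACHED_MS_PER_MB);
}
```

For a buffer of roughly 10 MB (what "more than 200msec" at 20 ms/MB implies), uncached_path_ms(10) is 200 ms and breakeven_maint_ms(10) is 180 ms, so the cached path wins as long as maintenance costs less than about 18 ms per MB. Even a 100 ms pass, the upper end of the N900 recollection above, would give cached_path_ms(10, 100) = 120 ms; whether the Librem 5 comes anywhere near that has to be measured, not inferred from this arithmetic.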

-- 
Regards,

Laurent Pinchart
