Message-ID: <775b0f527f365fa4217a5d9acfbb80e4f87078ef.camel@ndufresne.ca>
Date: Thu, 10 Jul 2025 12:01:07 -0400
From: Nicolas Dufresne <nicolas@...fresne.ca>
To: Pavel Machek <pavel@....cz>, kraxel@...hat.com,
 vivek.kasireddy@...el.com, 	dri-devel@...ts.freedesktop.org,
 sumit.semwal@...aro.org, 	benjamin.gaignard@...labora.com,
 Brian.Starkey@....com, jstultz@...gle.com, 	tjmercier@...gle.com,
 linux-media@...r.kernel.org, 	linaro-mm-sig@...ts.linaro.org, kernel list
 <linux-kernel@...r.kernel.org>, 	laurent.pinchart@...asonboard.com,
 l.stach@...gutronix.de, 	linux+etnaviv@...linux.org.uk,
 christian.gmeiner@...il.com, 	etnaviv@...ts.freedesktop.org,
 phone-devel@...r.kernel.org
Subject: Re: DMA-BUFs always uncached on arm64, causing poor camera
 performance on Librem 5

Hi Pavel,

On Thursday, 10 July 2025 at 10:24 +0200, Pavel Machek wrote:
> Hi!
> 
> It seems that DMA-BUFs are always uncached on arm64... which is a
> problem.
> 
> I'm trying to get useful camera support on Librem 5, and that includes
> recording videos (and taking photos).
> 
> memcpy() from normal memory is about 2msec/1MB. Unfortunately, for
> DMA-BUFs it is 20msec/1MB, and that basically means I can't easily do
> 760p video recording. Plus, copying full-resolution photo buffer takes
> more than 200msec!
> 
> There's a possibility to do some processing on GPU, and it's implemented here:
> 
> https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads
> 
> but that hits the same problem in the end -- data is in DMA-BUF,
> uncached, and takes way too long to copy out.
> 
> And that's ... wrong. DMA ended seconds ago, complete cache flush
> would be way cheaper than copying single frame out, and I still have
> to deal with uncached frames.
> 
> So I have two questions:
> 
> 1) Is my analysis correct that, no matter how I get frame from v4l and
> process it on GPU, I'll have to copy it from uncached memory in the
> end?
> 
> 2) Does anyone have patches / ideas / roadmap how to solve that? It
> makes GPU unusable for computing, and camera basically unusable for
> video.

If CPU access is strictly required for your use case, the way forward is to
implement V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS [0] in the capture driver. Very
few drivers enable it.

Once your driver has that capability, you can set
V4L2_MEMORY_FLAG_NON_COHERENT in the REQBUFS or CREATE_BUFS ioctl. That
gives you allocations with the CPU cache enabled, but you pay the invalidation (or
flush) overhead by default. When captured data has not been read by the CPU, you
can always queue the buffer back with V4L2_BUF_FLAG_NO_CACHE_INVALIDATE. For your
use case, though, it seems you want the invalidation to take place, otherwise your
software will end up reading stale cached data instead of the next frame's data.

Please note that the integration with the DMABuf SYNC ioctl was missing for a
while, so make sure you have a recent enough kernel or get ready for backports.
The feature itself was commonly used with CPU-only access, notably on ChromeOS
using libyuv. No DMABuf was involved initially.
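
For the DMABuf path, CPU reads are normally bracketed with DMA_BUF_IOCTL_SYNC.
A minimal sketch, assuming the frame was exported as a DMABuf (for example with
VIDIOC_EXPBUF), that `mapped` is an mmap() of that fd, and that your kernel has
the integration mentioned above. The helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Hypothetical helper: issue one DMA_BUF_IOCTL_SYNC with the given flags. */
static int dmabuf_sync(int dmabuf_fd, unsigned long long flags)
{
	struct dma_buf_sync sync;

	memset(&sync, 0, sizeof(sync));
	sync.flags = flags;
	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}

/*
 * Hypothetical helper: copy `len` bytes out of a mapped DMABuf, with the
 * read bracketed by SYNC_START / SYNC_END so the CPU caches are coherent
 * with what the device wrote. Returns 0 on success, -1 with errno set.
 */
static int dmabuf_copy_out(int dmabuf_fd, const void *mapped,
			   void *dst, size_t len)
{
	if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ) < 0)
		return -1;

	memcpy(dst, mapped, len);

	return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

Whether the copy then runs at cached-memory speed depends on the buffer having
been allocated non-coherent in the first place.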

regards,

Nicolas

[0] https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/vidioc-reqbufs.html#v4l2-buf-cap-supports-mmap-cache-hints

> 
> Best regards,
> 								Pavel
