Message-ID: <87tt3fdfpg.fsf@gmail.com>
Date: Sun, 13 Jul 2025 22:54:14 +0300
From: Mikhail Rudenko <mike.rudenko@...il.com>
To: Pavel Machek <pavel@....cz>
Cc: kraxel@...hat.com, vivek.kasireddy@...el.com,
dri-devel@...ts.freedesktop.org, sumit.semwal@...aro.org,
benjamin.gaignard@...labora.com, Brian.Starkey@....com,
jstultz@...gle.com, tjmercier@...gle.com, linux-media@...r.kernel.org,
linaro-mm-sig@...ts.linaro.org, kernel list
<linux-kernel@...r.kernel.org>, laurent.pinchart@...asonboard.com,
l.stach@...gutronix.de, linux+etnaviv@...linux.org.uk,
christian.gmeiner@...il.com, etnaviv@...ts.freedesktop.org,
phone-devel@...r.kernel.org
Subject: Re: DMA-BUFs always uncached on arm64, causing poor camera
performance on Librem 5
Hi, Pavel,
On 2025-07-10 at 10:24 +02, Pavel Machek <pavel@....cz> wrote:
> Hi!
>
> It seems that DMA-BUFs are always uncached on arm64... which is a
> problem.
>
> I'm trying to get useful camera support on Librem 5, and that includes
> recording videos (and taking photos).
Earlier this year I tried to solve a similar issue on rkisp1 (Rockchip
3399) and did some measurements, which showed that non-coherent buffers
plus cache flushing for those buffers are a viable approach [1].
Unfortunately, that effort stalled, but maybe the patch "[PATCH v4 1/2]
media: videobuf2: Fix dmabuf cache sync/flush in dma-contig" will be
useful to you.
[1] https://lore.kernel.org/all/20250303-b4-rkisp-noncoherent-v4-0-e32e843fb6ef@gmail.com/
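For illustration, here is a rough userspace sketch of what "cached mapping
plus cache maintenance" looks like from the consumer side, using the
generic DMA_BUF_IOCTL_SYNC uapi. It is not taken from the series in [1].
The helper names are hypothetical, and it assumes the exporter hands out a
cacheable mapping and implements begin/end CPU access; without that, the
ioctl succeeds but the reads stay slow.

```c
/* Sketch only: bracket CPU reads of an mmap()ed dma-buf with
 * DMA_BUF_IOCTL_SYNC so that a cached mapping can be read coherently. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue one DMA_BUF_IOCTL_SYNC call, retrying if interrupted.
 * Returns 0 on success, -1 with errno set on failure. */
static int dmabuf_sync(int dmabuf_fd, unsigned long long flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && errno == EINTR);

    return ret;
}

/* Hypothetical helper: copy len bytes out of a dma-buf that the caller
 * has already mapped at map (e.g. with mmap() on dmabuf_fd). */
static int copy_frame_from_dmabuf(int dmabuf_fd, const void *map,
                                  void *dst, size_t len)
{
    assert(map != NULL && dst != NULL);

    /* Tell the exporter the CPU is about to read the buffer. */
    if (dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ) < 0)
        return -1;

    memcpy(dst, map, len);

    /* Tell the exporter the CPU access is finished. */
    return dmabuf_sync(dmabuf_fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

Whether the memcpy() gets any faster depends entirely on how the exporting
driver maps the buffer, which is the part the patch above touches.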
> memcpy() from normal memory is about 2msec/1MB. Unfortunately, for
> DMA-BUFs it is 20msec/1MB, and that basically means I can't easily do
> 760p video recording. Plus, copying full-resolution photo buffer takes
> more than 200msec!
>
> There's a possibility to do some processing on the GPU, and it's implemented here:
>
> https://gitlab.com/tui/tui/-/tree/master/icam?ref_type=heads
>
> but that hits the same problem in the end -- data is in DMA-BUF,
> uncached, and takes way too long to copy out.
>
> And that's ... wrong. DMA ended seconds ago, complete cache flush
> would be way cheaper than copying single frame out, and I still have
> to deal with uncached frames.
>
> So I have two questions:
>
> 1) Is my analysis correct that, no matter how I get frame from v4l and
> process it on GPU, I'll have to copy it from uncached memory in the
> end?
>
> 2) Does anyone have patches / ideas / roadmap how to solve that? It
> makes GPU unusable for computing, and camera basically unusable for
> video.
>
> Best regards,
> Pavel
--
Best regards,
Mikhail Rudenko