Message-ID: <d8cfb08ac44523c9235a858a4bd78dcd297a62da.camel@collabora.com>
Date: Tue, 23 Apr 2024 09:47:39 -0400
From: Nicolas Dufresne <nicolas.dufresne@...labora.com>
To: Doug Anderson <dianders@...omium.org>
Cc: Tiffany Lin <tiffany.lin@...iatek.com>, Andrew-CT Chen
<andrew-ct.chen@...iatek.com>, Yunfei Dong <yunfei.dong@...iatek.com>,
Mauro Carvalho Chehab <mchehab@...nel.org>, Matthias Brugger
<matthias.bgg@...il.com>, AngeloGioacchino Del Regno
<angelogioacchino.delregno@...labora.com>, Wei-Shun Chang
<weishunc@...gle.com>, Hans Verkuil <hverkuil-cisco@...all.nl>,
Nícolas "F. R. A. Prado" <nfraprado@...labora.com>, Rob
Herring <robh@...nel.org>, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-media@...r.kernel.org,
linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH] media: mediatek: vcodec: Alloc DMA memory with
DMA_ATTR_ALLOC_SINGLE_PAGES
Hey,
On Monday, April 22, 2024 at 12:25 -0700, Doug Anderson wrote:
> Hi,
>
> On Mon, Apr 22, 2024 at 11:27 AM Nicolas Dufresne
> <nicolas.dufresne@...labora.com> wrote:
> >
> > Hi,
> >
> > On Monday, April 22, 2024 at 10:03 -0700, Douglas Anderson wrote:
> > > As talked about in commit 14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
> > > DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc"), it doesn't
> > > really make sense to try to allocate contiguous chunks of memory for
> > > video encoding/decoding. Let's switch the Mediatek vcodec driver to
> > > pass DMA_ATTR_ALLOC_SINGLE_PAGES and take some of the stress off the
> > > memory subsystem.
> > >
> > > Signed-off-by: Douglas Anderson <dianders@...omium.org>
> > > ---
> > > NOTE: I haven't personally done massive amounts of testing with this
> > > change, but I originally added the DMA_ATTR_ALLOC_SINGLE_PAGES flag
> > > specifically for the video encoding / decoding cases and I know it
> > > helped avoid memory problems in the past on other systems. Colleagues
> > > of mine have told me that with this change memory problems are harder
> > > to reproduce, so it seems like we should consider doing it.
> >
> > One thing to improve in your patch submission is to avoid describing the
> > problems only in the abstract. Patch review and pulling are based on a technical
> > rationale and very rarely on trust that a change helps someone somewhere in some
> > unknown context. What kind of memory issues are you facing? What is the technical
> > advantage of using DMA_ATTR_ALLOC_SINGLE_PAGES over the current approach that
> > helps fix the issue? I do expect this to be documented in the commit message itself.
>
> Right. The problem here is that I'm not _directly_ facing any problems
> myself and I also haven't done massive amounts of analysis of the
> Mediatek video codec. I know that some of my colleagues have run into
> issues on Mediatek devices where the system starts getting
> unresponsive when lots of videos are decoded in parallel. That
> reminded me of the old problem I debugged in 2015 on Rockchip
> platforms and is talked about a bunch in the referenced commit
> 14d3ae2efeed ("ARM: 8507/1: dma-mapping: Use
> DMA_ATTR_ALLOC_SINGLE_PAGES hint to optimize alloc") so I wrote up
> this patch. The referenced commit contains quite a bit of detail
> about the problems faced back in 2015.
>
> When I asked, my colleagues said that my patch seemed to help, though
> it was more of a qualitative statement than a quantitative one.
>
> I wasn't 100% sure if it was worth sending the patch up at this point,
> but logically, I think it makes sense. There aren't great reasons to
> hog all the large chunks of memory for video decoding.

Ok, I slowly started retracing this 2016 effort (which, I now understand, you were
deeply involved in). It's pretty clear this hint is only used for codecs. One
thing the explanation seems to be missing (or that I missed) is that all the
enabled drivers seem to come with a dedicated MMU (with a dedicated TLB). But this
argument seems void if the hint is not combined with DMA_ATTR_NO_KERNEL_MAPPING to
avoid using TLB space in the main MMU. There are currently three drivers that use
this hint, S5P_MFC, Hantro and RKVDEC, and only Hantro also sets the
DMA_ATTR_NO_KERNEL_MAPPING hint.
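
To illustrate what the hint changes on the allocation side, here is a simplified
userspace model. It is not the kernel's allocator: the greedy largest-block-first
strategy, the 4 KiB page size, the order cap and the 1920x1088 NV12 frame size are
all assumptions made for illustration. With a dedicated IOMMU the device sees one
contiguous IOVA range either way; what differs is whether the backing memory must
come from physically contiguous high-order blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed values for the illustration, not taken from any driver. */
#define MODEL_PAGE_SIZE 4096u
#define MODEL_MAX_ORDER 10

/* Number of pages needed to back a buffer of 'bytes' bytes. */
size_t pages_for(size_t bytes)
{
	return (bytes + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}

/*
 * Simplified model, NOT the kernel allocator: satisfy 'npages' greedily
 * with the largest power-of-two block (2^order pages) that still fits,
 * never exceeding 'max_order'. counts[o] receives the number of order-o
 * blocks requested; the return value is the total number of blocks.
 * max_order == 0 models DMA_ATTR_ALLOC_SINGLE_PAGES (order-0 pages only).
 */
size_t split_by_order(size_t npages, int max_order,
		      size_t counts[MODEL_MAX_ORDER + 1])
{
	size_t total = 0;

	assert(max_order >= 0 && max_order <= MODEL_MAX_ORDER);

	for (int o = 0; o <= MODEL_MAX_ORDER; o++)
		counts[o] = 0;

	for (int o = max_order; o >= 0; o--) {
		size_t block = (size_t)1 << o;

		counts[o] = npages / block;
		npages %= block;
		total += counts[o];
	}
	return total;
}

/* Print the non-zero block counts, highest order first. */
void print_breakdown(const char *label, const size_t counts[MODEL_MAX_ORDER + 1])
{
	printf("%s:", label);
	for (int o = MODEL_MAX_ORDER; o >= 0; o--)
		if (counts[o])
			printf(" %zux order-%d", counts[o], o);
	printf("\n");
}
```

In this model, a 1920x1088 NV12 frame needs 765 pages. split_by_order(765, 8, c)
requests nine blocks, including two order-8 (1 MiB) blocks that each need 256
physically contiguous free pages, while split_by_order(765, 0, c) asks for 765
order-0 pages that any free page can satisfy. The model says nothing about the
IOMMU TLB cost of mapping 765 separate pages, which depends on the device's
access pattern.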

It would be nice to check whether VCODEC needs a kernel mapping of the RAW images,
and to introduce that hint too while introducing DMA_ATTR_ALLOC_SINGLE_PAGES. Now
that I properly understand it, I also feel like this change is wanted, but I'll
have a hard time thinking of a test that shows the performance gain, since it
requires a specific level of fragmentation in the system to make a difference.

Another aspect of the original description that is off is the claim that CODECs do
linear access. While this is mostly true for reconstruction (writes), it is not true
for prediction (reads). What really matters is that CODEC tiles are, most of the
time, no bigger than a page, or span no more than a handful of pages.
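
As a rough illustration of that last point, the sketch below counts the distinct
pages touched by one block read. The 8-bit plane, 1920-byte stride, 4 KiB page,
page-aligned buffer and 16x16 block size are assumptions for illustration and do
not describe the actual VCODEC memory layout.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the illustration. */
#define TILE_MODEL_PAGE_SIZE 4096u

/* Pages touched by a contiguous byte range [offset, offset + len). */
size_t pages_touched_span(size_t offset, size_t len)
{
	assert(len > 0);
	return (offset + len - 1) / TILE_MODEL_PAGE_SIZE -
	       offset / TILE_MODEL_PAGE_SIZE + 1;
}

/*
 * Distinct pages touched when reading a w x h block at (x, y) from a
 * linear (row-major) plane with 'stride' bytes per row, assuming one byte
 * per sample and a plane that starts on a page boundary.
 */
size_t pages_touched_linear(size_t stride, size_t x, size_t y,
			    size_t w, size_t h)
{
	size_t count = 0;
	size_t prev_last = 0;
	int have_prev = 0;

	assert(w > 0 && h > 0 && x + w <= stride);

	for (size_t row = y; row < y + h; row++) {
		size_t start = row * stride + x;
		size_t first = start / TILE_MODEL_PAGE_SIZE;
		size_t last = (start + w - 1) / TILE_MODEL_PAGE_SIZE;

		/*
		 * Rows are visited in increasing address order, so only the
		 * first page of this row can be the previous row's last page.
		 */
		if (have_prev && first == prev_last)
			first++;
		if (last >= first)
			count += last - first + 1;
		prev_last = last;
		have_prev = 1;
	}
	return count;
}
```

Under those assumptions, pages_touched_linear(1920, 0, 0, 16, 16) returns 8, and
the same 256 bytes stored contiguously as one tile fit in a single page
(pages_touched_span(0, 256) returns 1). Either way a single block access touches
only a handful of pages.
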
Nicolas