Message-ID: <8d23f1ca6fe76d8971365bf54ca71ba71698980d.camel@pengutronix.de>
Date: Mon, 15 Feb 2021 12:53:32 +0100
From: Lucas Stach <l.stach@...gutronix.de>
To: Christian König <christian.koenig@....com>,
Simon Ser <contact@...rsion.fr>
Cc: linaro-mm-sig@...ts.linaro.org,
"Sharma, Shashank" <Shashank.Sharma@....com>,
lkml <linux-kernel@...r.kernel.org>,
dri-devel <dri-devel@...ts.freedesktop.org>,
linux-media <linux-media@...r.kernel.org>
Subject: Re: DMA-buf and uncached system memory
On Monday, 2021-02-15 at 10:34 +0100, Christian König wrote:
>
> On 15.02.21 at 10:06, Simon Ser wrote:
> > On Monday, February 15th, 2021 at 9:58 AM, Christian König <christian.koenig@....com> wrote:
> >
> > > we are currently working on Freesync and direct scan out from system
> > > memory on AMD APUs in A+A laptops.
> > >
> > > One problem we stumbled over is that our display hardware needs to scan
> > > out from uncached system memory and we currently don't have a way to
> > > communicate that through DMA-buf.
> > >
> > > For our specific use case at hand we are going to implement something
> > > driver specific, but the question is should we have something more
> > > generic for this?
> > >
> > > After all, the system memory access pattern is a PCIe extension and as
> > > such something generic.
> > Intel also needs uncached system memory if I'm not mistaken?
>
> No idea, that's why I'm asking. Could be that this is also interesting
> for I+A systems.
>
> > Where are the buffers allocated? If GBM, then it needs to allocate memory that
> > can be scanned out if the USE_SCANOUT flag is set or if a scanout-capable
> > modifier is picked.
> >
> > If this is about communicating buffer constraints between different components
> > of the stack, there were a few proposals about it. The most recent one is [1].
>
> Well, the problem here is on a different level of the stack.
>
> See, resolution, pitch etc. can easily be communicated in userspace
> without involvement of the kernel. The worst thing which can happen is
> that you draw garbage into your own application window.
>
> But if you get the caching attributes in the page tables (both CPU as
> well as IOMMU, device etc...) wrong, then ARM for example has the
> tendency to just spontaneously reboot.
>
> X86 is fortunately a bit more graceful and you only end up with random
> data corruption, but that is only marginally better.
>
> So to sum it up, that is not something which we can leave in the hands of
> userspace.
>
> I think that exporters in the DMA-buf framework should have the ability
> to tell importers whether system memory snooping is necessary or not.
There is already a coarse-grained way to do so: the dma_coherent
property in struct device, which you can check at dmabuf attach time.
However, it may not be enough for the requirements of a GPU, where the
engines could differ in their DMA coherency requirements. For that you
need to either have fake struct devices for the individual engines or
come up with a more fine-grained way to communicate those requirements.
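
To make that concrete, something along these lines would let an exporter
that can only hand out uncached memory reject DMA-coherent importers at
attach time. This is a rough, untested sketch (the exporter name and the
bare -EINVAL are made up); all it can see is the coarse per-device flag:

#include <linux/dma-buf.h>
#include <linux/dma-map-ops.h>

/*
 * Sketch: exporter that only provides uncached/non-snooped memory and
 * therefore refuses importers whose struct device is marked DMA-coherent.
 * dev_is_dma_coherent() only reflects the per-device dma_coherent flag,
 * so per-engine differences are invisible here.
 */
static int my_uncached_attach(struct dma_buf *dmabuf,
			      struct dma_buf_attachment *attach)
{
	if (dev_is_dma_coherent(attach->dev))
		return -EINVAL;

	return 0;
}

static const struct dma_buf_ops my_uncached_dmabuf_ops = {
	.attach = my_uncached_attach,
	/* .map_dma_buf, .unmap_dma_buf, .release etc. omitted */
};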
> Userspace components can then of course tell the exporter what the
> importer needs, but validating that this is correct and doesn't
> crash the system must happen in the kernel.
What exactly do you mean by "scanout requires non-coherent memory"?
Does the scanout requestor always set the no-snoop PCI flag, so you get
garbage if some writes to memory are still stuck in the caches, or is
it some other requirement?
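
For context, the flag I mean here is the PCIe "Enable No Snoop" bit in the
Device Control register. Purely as an illustration (made-up function name,
untested), a driver could query it like this:

#include <linux/pci.h>

/*
 * Returns true if the device is allowed to issue no-snoop TLPs. Whether
 * the scanout engine actually sets the no-snoop attribute on its reads
 * is a separate, device-specific question.
 */
static bool no_snoop_allowed(struct pci_dev *pdev)
{
	u16 devctl = 0;

	pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &devctl);
	return devctl & PCI_EXP_DEVCTL_NOSNOOP_EN;
}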
Regards,
Lucas