Message-ID: <YCuPhOT4GhY3RR/6@phenom.ffwll.local>
Date:   Tue, 16 Feb 2021 10:25:24 +0100
From:   Daniel Vetter <daniel@...ll.ch>
To:     Christian König <christian.koenig@....com>
Cc:     linux-media <linux-media@...r.kernel.org>,
        dri-devel <dri-devel@...ts.freedesktop.org>,
        linaro-mm-sig@...ts.linaro.org,
        lkml <linux-kernel@...r.kernel.org>,
        Sumit Semwal <sumit.semwal@...aro.org>,
        Daniel Vetter <daniel@...ll.ch>,
        "Sharma, Shashank" <Shashank.Sharma@....com>
Subject: Re: DMA-buf and uncached system memory

On Mon, Feb 15, 2021 at 09:58:08AM +0100, Christian König wrote:
> Hi guys,
> 
> we are currently working on FreeSync and direct scanout from system memory
> on AMD APUs in A+A laptops.
> 
> One problem we stumbled over is that our display hardware needs to scan out
> from uncached system memory, and we currently don't have a way to
> communicate that through DMA-buf.
> 
> For our specific use case at hand we are going to implement something
> driver-specific, but the question is: should we have something more generic
> for this?
> 
> After all, the system memory access pattern is a PCIe extension and as such
> something generic.

Yes, it's a problem, and it's a complete mess. The de facto rules are:

1. The importer has to do coherent transactions that snoop CPU caches.

This way both modes work:

- if the buffer is cached, we're fine

- if the buffer is not cached but the exporter has flushed all the
  caches, you're mostly just wasting time on inefficient bus cycles. Also,
  this doesn't work if your CPU doesn't simply drop clean cachelines, like
  I think some ARM cores are prone to do. No idea about AMD; Intel is
  guaranteed to drop them, which is why the uncached scanout for integrated
  Intel + discrete AMD "works". A rough sketch of the exporter-side flush
  is below.
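
To make rule 1 concrete, here is a sketch of the exporter-side cache flush.
Hedged: exporter_flush_for_scanout() is a made-up helper, not actual
i915/amdgpu code; only dma_sync_sg_for_device() is the real dma-api call:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/*
 * Before a non-snooping device (e.g. a scanout engine) reads the buffer,
 * the exporter must make sure no dirty CPU cachelines cover it.  With the
 * streaming DMA API that is what dma_sync_sg_for_device() does: write back
 * (and, depending on the architecture, invalidate) the affected lines.
 */
static void exporter_flush_for_scanout(struct device *dev,
				       struct sg_table *sgt)
{
	dma_sync_sg_for_device(dev, sgt->sgl, sgt->nents, DMA_TO_DEVICE);
}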

2. The exporter picks the mode freely, and can even change it at runtime
(i915 does this, since we don't have an "allocate for scanout" flag wired
through consistently). This doesn't work on ARM, where the rule is "all
devices in the same system must use the same mode".

3. This should be solved at the dma-buf layer, but the dma-api refuses to
tell you this information (at least for dma_alloc_coherent), and I'm not
going to deal with the bikeshedding that would bring into my inbox. Or at
least there has always been screaming that drivers shouldn't peek behind
the abstraction.
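
For reference, the allocation in question looks like this; a minimal
sketch, where the alloc_scanout_buffer() wrapper is hypothetical and only
dma_alloc_coherent() itself is the real API:

#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/*
 * dma_alloc_coherent() hands back a CPU virtual address plus a device
 * address, but deliberately hides *how* coherency is achieved: the memory
 * may be truly uncached, write-combined, or plain cached memory on a
 * coherent interconnect.  There is no way to query that, so an exporter
 * has nothing it could forward through dma-buf.
 */
static void *alloc_scanout_buffer(struct device *dev, size_t size,
				  dma_addr_t *dma_handle)
{
	return dma_alloc_coherent(dev, size, dma_handle, GFP_KERNEL);
}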

So I think if AMD also guarantees to drop clean cachelines, just do the
same thing we do right now for Intel integrated + discrete AMD, but in
reverse. It's fragile, but it does work.

What we imo shouldn't do is add driver-private interfaces here; that's just
going to make the problem worse long term. Or at least no driver-private
interfaces that span across drivers behind dma-buf, because imo this is
really a problem that dma-buf should solve.

If you do want to solve this at the dma-buf level, I can try and point you
at the respective i915 and amdgpu code that makes the magic work; I've had
to fix it a few times in the past. I'm not sure whether we'd need to pass
the dynamic nature through, though, i.e. whether we want to be able to scan
out an imported dma-buf and hence request that it be used in uncached mode.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
