Date:	Thu, 22 Mar 2012 00:29:45 +0100
From:	Daniel Vetter <daniel@...ll.ch>
To:	Rebecca Schultz Zavin <rebecca@...roid.com>
Cc:	Rob Clark <rob.clark@...aro.org>, linaro-mm-sig@...ts.linaro.org,
	LKML <linux-kernel@...r.kernel.org>,
	DRI Development <dri-devel@...ts.freedesktop.org>,
	linux-media@...r.kernel.org, Daniel Vetter <daniel.vetter@...ll.ch>
Subject: Re: [Linaro-mm-sig] [PATCH] [RFC] dma-buf: mmap support

On Wed, Mar 21, 2012 at 03:44:38PM -0700, Rebecca Schultz Zavin wrote:
> Couldn't this just as easily be handled by not having those mappings
> be mapped cached or writecombine to userspace?  They'd be coherent,
> just slow.  I'm not sure we can actually say that all these CPU
> accesses are necessarily slow-path operations anyway.  On Android we
> do sometimes decide to software-render things to eliminate the
> overhead of maintaining a hardware context for context-switching the
> GPU.  If you want cached or writecombine mappings you'd have to
> manage them explicitly.  If you can't manage them explicitly you have
> to settle for slow.  That seems reasonable to me.

Well, the usual approach is writecombine, which doesn't need any explicit
cache management.
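
For illustration, a minimal sketch of an exporter's mmap callback
(my_dmabuf_mmap and struct my_buffer with its pages array are made-up
names; only the op signature matches the patch under discussion):

static int my_dmabuf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
{
	struct my_buffer *obj = dmabuf->priv;

	/* Writecombine bypasses the CPU caches (writes go through the
	 * WC buffers), so userspace needs no explicit flushing; a fully
	 * uncached coherent mapping would use pgprot_noncached()
	 * instead. */
	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

	/* Assumes physically contiguous backing storage. */
	return remap_pfn_range(vma, vma->vm_start,
			       page_to_pfn(obj->pages[0]),
			       vma->vm_end - vma->vm_start,
			       vma->vm_page_prot);
}

remap_pfn_range() only works for contiguous memory; page-backed objects
would instead install ptes from a fault handler, as discussed below.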

> As far as I can tell, with explicit operations I have to invalidate
> before touching from mmap and clean after.  With these implicit ones,
> I still have to invalidate and clean, but now I also have to remap
> them before and after.  I don't know what the performance hit of this
> remapping step is, but I'd like to know if you have any insight.

We have a few inefficiencies in the drm/i915 fault path which make it
slow, but generally pagefault performance should be rather quick (at least
quicker than flushing the actual data), at least if your fault handler is
somewhat clever and prefaults a few more pages in both the x and y
directions.
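
E.g. something like this (just a sketch; my_buffer with pages/num_pages
is made up, and a 2D-aware handler would also prefault in the y
direction of the underlying buffer rather than only linearly):

static int my_dmabuf_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	struct my_buffer *obj = vma->vm_private_data;
	unsigned long addr = (unsigned long)vmf->virtual_address;
	pgoff_t pgoff = vmf->pgoff;
	int i, ret;

	/* Prefault a run of pages after the faulting one, so a linear
	 * walk over the mapping takes far fewer faults. */
	for (i = 0; i < 16 && pgoff + i < obj->num_pages &&
		    addr + i * PAGE_SIZE < vma->vm_end; i++) {
		ret = vm_insert_pfn(vma, addr + i * PAGE_SIZE,
				    page_to_pfn(obj->pages[pgoff + i]));
		/* -EBUSY just means the pte is already there. */
		if (ret == -EBUSY)
			continue;
		if (ret)
			return i ? VM_FAULT_NOPAGE : VM_FAULT_SIGBUS;
	}

	return VM_FAULT_NOPAGE;
}

Whether 16 pages is the right amount of read-ahead is exactly the kind
of thing that needs benchmarks.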

But if that's too slow, I'm open to extending dma-buf later on to support
more explicit cache management for userspace mmaps (as I explained in my
previous mail, quoted below). I just think we should have real benchmark
results (and hence some real users of dma-buf) before we add this
complexity. At the moment I have no idea whether it's worth it. After all,
as soon as we expect a lot of rendering/processing, some special
dsp/gpu/whatever is likely to take over.

> > Imo the best way to enable cached mappings is to later on extend dma-buf
> > (as soon as we have some actual exporters/importers in the mainline
> > kernel) with an optional cached_mmap interface which requires explicit
> > prepare_mmap_access/finish_mmap_access calls. Then if both exporter and
> > importer support this, it could get used - otherwise the dma-buf layer
> > could transparently fall back to coherent mappings.
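
To make that concrete, the extension could look roughly like this (purely
a sketch: only the cached_mmap and prepare_mmap_access/finish_mmap_access
names come from the proposal above, the rest is made up):

	/* hypothetical additions to struct dma_buf_ops */
	int (*cached_mmap)(struct dma_buf *, struct vm_area_struct *);
	int (*prepare_mmap_access)(struct dma_buf *, size_t start, size_t len);
	int (*finish_mmap_access)(struct dma_buf *, size_t start, size_t len);

	/* core helper: use the cached variant only when both sides opt
	 * in, otherwise transparently fall back to the coherent mapping */
	int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma,
			 bool importer_wants_cached)
	{
		if (importer_wants_cached && dmabuf->ops->cached_mmap)
			return dmabuf->ops->cached_mmap(dmabuf, vma);
		return dmabuf->ops->mmap(dmabuf, vma);
	}

Userspace would then bracket every CPU access through the mapping with
the prepare/finish calls (exposed via some ioctl), and exporters that
can't do cached mappings simply wouldn't implement cached_mmap.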

Yours, Daniel
-- 
Daniel Vetter
Mail: daniel@...ll.ch
Mobile: +41 (0)79 365 57 48
