Message-ID: <20120321232945.GC20712@phenom.ffwll.local>
Date: Thu, 22 Mar 2012 00:29:45 +0100
From: Daniel Vetter <daniel@...ll.ch>
To: Rebecca Schultz Zavin <rebecca@...roid.com>
Cc: Rob Clark <rob.clark@...aro.org>, linaro-mm-sig@...ts.linaro.org,
LKML <linux-kernel@...r.kernel.org>,
DRI Development <dri-devel@...ts.freedesktop.org>,
linux-media@...r.kernel.org, Daniel Vetter <daniel.vetter@...ll.ch>
Subject: Re: [Linaro-mm-sig] [PATCH] [RFC] dma-buf: mmap support
On Wed, Mar 21, 2012 at 03:44:38PM -0700, Rebecca Schultz Zavin wrote:
> Couldn't this just as easily be handled by not having those mappings
> be mapped cached or write combine to userspace? They'd be coherent,
> just slow. I'm not sure we can actually say that all these CPU accesses
> are necessarily slow-path operations anyway. On Android we do sometimes
> decide to software render things to eliminate the overhead of
> maintaining a hardware context for context switching the gpu. If you
> want cached or writecombine mappings you'd have to manage them
> explicitly. If you can't manage them explicitly you have to settle
> for slow. That seems reasonable to me.
Well, the usual approach is writecombine, which doesn't need any explicit
cache management.
> As far as I can tell with explicit operations I have to invalidate
> before touching from mmap and clean after. With these implicit ones,
> I still have to invalidate and clean, but now I also have to remap them
> before and after. I don't know what the performance hit of this
> remapping step is, but I'd like to know, if you have any insight.
We have a few inefficiencies in the drm/i915 fault path which make it
slow, but generally pagefault performance should be rather quick (at least
quicker than flushing the actual data). At least if your fault handler is
somewhat clever and prefaults a few more pages in both x and y direction.
But if that's too slow, I'm open to extending dma-buf later on to support
more explicit cache management for userspace mmaps (like I've explained
below in my previous mail). I just think we should have real benchmark
results (and hence some real users of dma-buf) before we add this
complexity. At the moment I have no idea whether it's worth it. After all,
as soon as we expect a lot of rendering/processing, some special
dsp/gpu/whatever is likely to take over.
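To make the prefaulting idea above concrete, here is a minimal user-space
sketch of how a fault handler might pick the pages to map around a fault.
It is not the drm/i915 code. The names (struct page_grid,
prefault_window_for, the radius parameter) and the assumption that the
buffer is a simple row-major grid of pages are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical model: the buffer is a row-major 2D grid of pages. */
struct page_grid {
    size_t cols; /* pages per row (x direction) */
    size_t rows; /* number of page rows (y direction) */
};

/* Inclusive rectangle of pages to map in one fault. */
struct prefault_window {
    size_t x0, x1;
    size_t y0, y1;
};

/* v - r, clamped at 0. */
static size_t clamp_sub(size_t v, size_t r)
{
    return v > r ? v - r : 0;
}

/* v + r, clamped at limit - 1; requires v < limit. */
static size_t clamp_add(size_t v, size_t r, size_t limit)
{
    return (limit - 1 - v > r) ? v + r : limit - 1;
}

/*
 * Compute the window of pages within `radius` pages of the faulting page
 * in both x and y, clipped to the buffer. Returns 0 on success, -1 if the
 * grid is empty or the page index is out of range.
 */
int prefault_window_for(const struct page_grid *g, size_t fault_page,
                        size_t radius, struct prefault_window *w)
{
    if (g->cols == 0 || g->rows == 0 || fault_page >= g->cols * g->rows)
        return -1;

    size_t x = fault_page % g->cols;
    size_t y = fault_page / g->cols;

    w->x0 = clamp_sub(x, radius);
    w->x1 = clamp_add(x, radius, g->cols);
    w->y0 = clamp_sub(y, radius);
    w->y1 = clamp_add(y, radius, g->rows);
    return 0;
}

/* Number of pages covered by a window. */
size_t prefault_window_pages(const struct prefault_window *w)
{
    return (w->x1 - w->x0 + 1) * (w->y1 - w->y0 + 1);
}
```

With an 8x4 grid and radius 1, a fault in the interior maps 9 pages in one
go, while a fault in the last page (a corner) maps only 4. A real handler
would then insert the ptes for that window, and the right radius is
something only benchmarks can tell us.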
> > Imo the best way to enable cached mappings is to later on extend dma-buf
> > (as soon as we have some actual exporters/importers in the mainline
> > kernel) with an optional cached_mmap interface which requires explicit
> > prepare_mmap_access/finish_mmap_access calls. Then if both exporter and
> > importer support this, it could get used - otherwise the dma-buf layer
> > could transparently fall back to coherent mappings.
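Purely as an illustration of the fallback described in the quoted
paragraph, a toy user-space model could look like the following. None of
this exists in dma-buf today. The struct names, the supports_cached_mmap
flag and the call counters are assumptions for the sketch; only the
prepare_mmap_access/finish_mmap_access names are taken from the proposal.

```c
#include <stdbool.h>

enum mmap_mode {
    MMAP_COHERENT, /* e.g. writecombine, no explicit cache management */
    MMAP_CACHED    /* needs explicit prepare/finish around CPU access */
};

/* Hypothetical: an exporter or importer and what it can support. */
struct dmabuf_party {
    bool supports_cached_mmap;
};

/* Hypothetical per-mapping state; the counters stand in for real work. */
struct mapping {
    enum mmap_mode mode;
    int prepare_calls;
    int finish_calls;
};

/* Use cached mappings only if both sides opt in, else fall back. */
enum mmap_mode choose_mmap_mode(const struct dmabuf_party *exporter,
                                const struct dmabuf_party *importer)
{
    if (exporter->supports_cached_mmap && importer->supports_cached_mmap)
        return MMAP_CACHED;
    return MMAP_COHERENT;
}

/* Before CPU access: a real version would invalidate CPU caches here. */
void prepare_mmap_access(struct mapping *m)
{
    if (m->mode == MMAP_CACHED)
        m->prepare_calls++;
    /* Coherent mappings need nothing, so this is a no-op for them. */
}

/* After CPU access: a real version would clean (write back) caches here. */
void finish_mmap_access(struct mapping *m)
{
    if (m->mode == MMAP_CACHED)
        m->finish_calls++;
}
```

The point of the sketch is that userspace can always bracket its access
with the two calls. On the coherent fallback they cost nothing, so the
same code works whether or not both sides support cached mappings.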
Yours, Daniel
--
Daniel Vetter
Mail: daniel@...ll.ch
Mobile: +41 (0)79 365 57 48