Date:	Sun, 6 Nov 2011 23:19:09 +0100
From:	Daniel Vetter <daniel@...ll.ch>
To:	Chris Wilson <chris@...is-wilson.co.uk>
Cc:	Daniel Vetter <daniel.vetter@...ll.ch>,
	intel-gfx <intel-gfx@...ts.freedesktop.org>,
	linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH 09/13] drm/i915: don't use gtt_pwrite on LLC cached
 objects

On Sun, Nov 06, 2011 at 09:16:00PM +0000, Chris Wilson wrote:
> On Sun,  6 Nov 2011 20:13:56 +0100, Daniel Vetter <daniel.vetter@...ll.ch> wrote:
> > ~120 µs instead of ~210 µs to write 1 MB on my snb. I like this.
> > 
> > Signed-off-by: Daniel Vetter <daniel.vetter@...ll.ch>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 0048917..8fd175c 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -842,6 +842,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
> >  		ret = i915_gem_phys_pwrite(dev, obj, args, file);
> >  		goto out;
> >  	} else if (obj->gtt_space &&
> > +		   obj->cache_level == I915_CACHE_NONE &&
> >  		   obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> >  		ret = i915_gem_object_pin(obj, 0, true);
> >  		if (ret)
> 
> I still think you want to include an obj->map_and_fenceable test here.
> When doing 2D benchmarks, the stall incurred here to evict an old object
> and map the to-be-written object into the mappable GTT causes measurable
> pain (obviously on non-LLC architectures).

That's one of the "further tricks". I think we also need to implement the
same in-place clflush trick as for pread, to avoid penalizing partial
pwrites too much.
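
For illustration only, the path selection with both the cache_level check
from the patch and the suggested map_and_fenceable test could be modelled
roughly like this. This is a self-contained userspace sketch, not driver
code: struct sketch_obj, enum sketch_cache_level and
sketch_use_gtt_pwrite() are hypothetical stand-ins for the real i915
types and the condition in i915_gem_pwrite_ioctl().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for driver state; these are not the real i915 types. */
enum sketch_cache_level { SKETCH_CACHE_NONE, SKETCH_CACHE_LLC };

struct sketch_obj {
	bool has_gtt_space;                  /* models obj->gtt_space != NULL */
	enum sketch_cache_level cache_level; /* models obj->cache_level */
	bool cpu_write_domain;               /* models write_domain == I915_GEM_DOMAIN_CPU */
	bool map_and_fenceable;              /* already bound in the mappable GTT */
};

/*
 * Returns true when the GTT pwrite path would be taken.
 * LLC-cached objects and objects that would first have to be moved into
 * the mappable GTT fall through to the CPU write path instead.
 */
static bool sketch_use_gtt_pwrite(const struct sketch_obj *obj)
{
	assert(obj != NULL);
	return obj->has_gtt_space &&
	       obj->cache_level == SKETCH_CACHE_NONE &&
	       obj->map_and_fenceable &&
	       !obj->cpu_write_domain;
}
```

With the extra test, an uncached object that is bound but not currently
mappable no longer forces an eviction just to service the pwrite.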

The other trick is to do reloc fixups through llc/clflushed cpu writes.
This way we'd completely eliminate mappable pressure for all untiled
objects. The only thing left would be scanout, tiled gtt uploads and tiled
blts (only on pre-gen4).
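
A rough userspace model of what a reloc fixup through an llc/clflushed
cpu write might look like, assuming a 64-byte cache line and a
cache-line-aligned CPU mapping. All names here (sketch_reloc_fixup(),
sketch_clflush(), sketch_flush_count) are made up for the example; the
flush is only counted, not actually issued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CACHELINE 64u /* assumed cache line size */

/* Hypothetical stand-in for a clflush of one cache line; it only counts calls. */
static unsigned sketch_flush_count;

static void sketch_clflush(const void *line)
{
	(void)line;
	sketch_flush_count++;
}

/*
 * Patch a 32-bit relocation value at byte `offset` of a CPU mapping.
 * On an LLC-coherent object the plain write is enough; otherwise every
 * cache line touched by the write is flushed afterwards.
 * Assumes cpu_map itself starts on a cache line boundary.
 */
static void sketch_reloc_fixup(uint8_t *cpu_map, size_t size, size_t offset,
			       uint32_t value, bool llc_coherent)
{
	assert(cpu_map != NULL);
	assert(offset <= size && size - offset >= sizeof(value));

	memcpy(cpu_map + offset, &value, sizeof(value));
	if (llc_coherent)
		return;

	size_t first = offset / SKETCH_CACHELINE;
	size_t last = (offset + sizeof(value) - 1) / SKETCH_CACHELINE;
	for (size_t i = first; i <= last; i++)
		sketch_clflush(cpu_map + i * SKETCH_CACHELINE);
}
```

Nothing in this path needs the object to be in the mappable GTT, which is
where the reduced mappable pressure for untiled objects would come from.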
-Daniel
-- 
Daniel Vetter
Mail: daniel@...ll.ch
Mobile: +41 (0)79 365 57 48