Date:	Fri, 11 Jul 2014 11:53:06 +0200
From:	Lucas Stach <l.stach@...gutronix.de>
To:	Alexandre Courbot <acourbot@...dia.com>
Cc:	Ben Skeggs <skeggsb@...il.com>,
	Alexandre Courbot <gnurou@...il.com>,
	"nouveau@...ts.freedesktop.org" <nouveau@...ts.freedesktop.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
	Ben Skeggs <bskeggs@...hat.com>,
	"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [Nouveau] [PATCH v4 2/6] drm/nouveau: map pages using DMA API
 on platform devices

On Friday, 11.07.2014 at 11:57 +0900, Alexandre Courbot wrote:
[...]
> >> Yeah, I am not familiar with i915 but it seems like we are in a similar boat
> >> here (except that ARM is more constrained as to its memory mappings). The
> >> strategy in this series is, map buffers used by user-space cached and
> >> explicitly synchronize them (since the ownership transition from user to GPU
> >> is always clearly performed by syscalls), and use coherent mappings for
> >> buffers used by the kernel which are accessed more randomly. This has solved
> >> all our coherency issues and resulted in the best performance so far.
> > I wonder if we might want to use unsnooped cached mappings of pages on
> > non-ARM platforms also, to avoid the overhead of the cache snooping?
> 
> You might want to indeed, now that coherency is guaranteed by the sync 
> functions originally introduced by Lucas. The only issue I could see is 
> that they always invalidate the full buffer whereas bus snooping only 
> affects pages that are actually touched. Someone would need to try this 
> on a desktop machine and see how it affects performance.
> 
> I'd be all for it though, since it would also allow us to get rid of 
> this ungraceful nv_device_is_cpu_coherent() function and result in 
> simplifying nouveau_bo.c a bit.

This will need some testing to get hard numbers, but I suspect that
invalidating the whole buffer isn't too bad, as the prefetch machinery
works very well with the access patterns we see in graphics drivers.

Flushing out the whole buffer should be even less problematic, as it
will only flush out dirty lines that would need to be flushed on GPU
read snooping anyway.

In the long run we might want a separate CPU prepare/finish ioctl where
we can indicate the area of interest. This might help to avoid some of
the invalidate overhead, especially for userspace suballocated buffers.
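
To make the "area of interest" idea a bit more concrete, here is a rough
sketch of what the argument to such an ioctl could look like, and how the
range to be synced might be derived from it. Everything in it is an
assumption for illustration only: struct bo_cpu_range, the BO_CPU_* flags,
bo_sync_span() and the 64-byte cache line size are hypothetical and are
not part of nouveau or any existing UAPI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical access flags; not an existing nouveau interface. */
#define BO_CPU_READ  (1u << 0)
#define BO_CPU_WRITE (1u << 1)

/* Assumed cache line size; real hardware may differ. */
#define CACHE_LINE 64u

/* Hypothetical ioctl argument describing the area the CPU will touch. */
struct bo_cpu_range {
	uint32_t handle;  /* buffer object handle */
	uint32_t flags;   /* BO_CPU_READ and/or BO_CPU_WRITE */
	uint64_t offset;  /* start of the area of interest, in bytes */
	uint64_t length;  /* size of the area of interest, in bytes */
};

/* Half-open byte range [start, end) that actually needs cache maintenance. */
struct sync_span {
	uint64_t start;
	uint64_t end;
};

/*
 * Expand the requested area to cache line boundaries and clamp it to the
 * buffer size, so that only the touched part of a suballocated buffer is
 * invalidated or flushed instead of the whole buffer.
 */
static struct sync_span bo_sync_span(const struct bo_cpu_range *r,
				     uint64_t bo_size)
{
	const uint64_t mask = ~(uint64_t)(CACHE_LINE - 1);
	struct sync_span s;

	/* The area must be non-empty and lie inside the buffer. */
	assert(r->length > 0);
	assert(r->offset < bo_size && r->length <= bo_size - r->offset);

	s.start = r->offset & mask;
	s.end = (r->offset + r->length + CACHE_LINE - 1) & mask;
	if (s.end > bo_size)
		s.end = bo_size;

	return s;
}
```

With a 1 MiB buffer and a 200-byte suballocation at offset 100, this would
limit the maintenance to the range [64, 320), i.e. four cache lines rather
than the full buffer.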

Regards,
Lucas

-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |
