Open Source and information security mailing list archives
 
Date:	Wed, 13 Jan 2016 12:34:38 +0000
From:	Chris Wilson <chris@...is-wilson.co.uk>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Andy Lutomirski <luto@...capital.net>,
	"H. Peter Anvin" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Ross Zwisler <ross.zwisler@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...ux.intel.com>,
	Borislav Petkov <bp@...en8.de>,
	Brian Gerst <brgerst@...il.com>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Imre Deak <imre.deak@...el.com>,
	Daniel Vetter <daniel.vetter@...ll.ch>,
	DRI <dri-devel@...ts.freedesktop.org>
Subject: Re: [PATCH] x86: Add an explicit barrier() to clflushopt()

On Tue, Jan 12, 2016 at 06:06:34PM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 4:55 PM, Chris Wilson <chris@...is-wilson.co.uk> wrote:
> >
> > The double clflush() remains a mystery.
> 
> Actually, I think it's explainable.
> 
> It's wrong to do the clflush *after* the GPU has done the write, which
> seems to be what you are doing.
> 
> Why?
> 
> If the GPU really isn't cache coherent, what can happen is:
> 
>  - the CPU has the line cached
> 
>  - the GPU writes the data
> 
>  - you do the clflushopt to invalidate the cacheline
> 
>  - you expect to see the GPU data.
> 
> Right?
> 
> Wrong. The above is complete crap.
> 
> Why?
> 
> Very simple reason: the CPU may have had the cacheline dirty at some
> level in its caches, so when you did the clflushopt, it didn't just
> invalidate the CPU cacheline, it wrote it back to memory. And in the
> process over-wrote the data that the GPU had written.

Forgive me for being dense, but if we overwrite the GPU data in the
backing struct page with the cacheline from the CPU, how do we see the
results from the GPU afterwards?

> Now you can say "but the CPU never wrote to the cacheline, so it's not
> dirty in the CPU caches". That may or may not be true. The CPU may
> have written to it quite a long time ago.

What we do is we clflush the entire object after it is written to by the
CPU (including newly allocated objects from shmemfs, or pages being
returned to us by shmemfs) before any operation on that object by the
GPU. We have to, so that the GPU sees the correct page contents.
(If I change the clflush of the written objects to a wbinvd, that's not
sufficient for the tests to pass.)

We do not clflush the object after we read the backing pages on the CPU
before the next GPU operation, even if it is a GPU write. This should
leave us with clean but stale cachelines. That's why we then
clflush_cache_range() prior to the next read on the object; it is
intended to be a pure cache line invalidation.

If we clflush the entire object between every CPU read back and the *next*
GPU operation, it fails. If we clflush the object before every GPU write
to it, it passes. And to refresh, we always clflush after a CPU write.

I am reasonably confident that any cachelines we dirtied (or inherited)
are flushed. What you are suggesting is that there are dirty cachelines
regardless. I am also reasonably confident that even if we clflush the
entire object after touching it before the GPU write, and clflush the
individual cachelines again after the GPU write, we see the errors.

I haven't found the hole yet, nor been able to convincingly explain the
differences between generations.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
