Date:	Tue, 12 Jan 2016 16:37:57 +0000
From:	Chris Wilson <chris@...is-wilson.co.uk>
To:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andy Lutomirski <luto@...capital.net>,
	"H. Peter Anvin" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Ross Zwisler <ross.zwisler@...ux.intel.com>,
	"H . Peter Anvin" <hpa@...ux.intel.com>,
	Borislav Petkov <bp@...en8.de>,
	Brian Gerst <brgerst@...il.com>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Imre Deak <imre.deak@...el.com>,
	Daniel Vetter <daniel.vetter@...ll.ch>,
	DRI <dri-devel@...ts.freedesktop.org>
Subject: Re: [PATCH] x86: Add an explicit barrier() to clflushopt()

On Mon, Jan 11, 2016 at 09:05:06PM +0000, Chris Wilson wrote:
> I can narrow down the principal buggy path by doing the clflush(vend-1)
> in the callers at least.

That narrows the suspect path down to a read back of a cacheline from
main memory that was just written by the GPU. CPU writes to memory prior
to their use by the GPU do not seem to be affected (or at least we do
sufficient flushing when sending the commands to the GPU that we don't
notice anything wrong).

And back to the oddity.

Instead of doing:

	clflush_cache_range(vaddr + offset, size);
	clflush(vaddr + offset + size - 1);
	mb();
	memcpy(user, vaddr + offset, size);

what also worked was:

	clflush_cache_range(vaddr + offset, size);
	clflush(vaddr);
	mb();
	memcpy(user, vaddr + offset, size);

(size is definitely non-zero, offset is offset_in_page(), vaddr is from
kmap_atomic()).

i.e. with clflush_cache_range() modified to do the extra flush itself:

void clflush_cache_range(void *vaddr, unsigned int size)
{
	const unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
	void *p = (void *)((unsigned long)vaddr & ~(clflush_size - 1));
	void *vend = vaddr + size;

	if (p >= vend)
		return;

	mb();

	for (; p < vend; p += clflush_size)
		clflushopt(p);

	/* the extra flush: the first cacheline was already flushed above */
	clflushopt(vaddr);

	mb();
}
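
For reference, the shape of the caller is roughly this (a hypothetical
sketch with assumed names, not the actual i915 code):

#include <linux/highmem.h>	/* kmap_atomic, kunmap_atomic */
#include <linux/mm.h>		/* offset_in_page */
#include <linux/string.h>	/* memcpy */
#include <asm/cacheflush.h>	/* clflush_cache_range */

/* Read back a GPU-written page through a temporary kernel mapping. */
static void read_back(struct page *page, unsigned long gpu_addr,
		      void *user, unsigned int size)
{
	void *vaddr = kmap_atomic(page);
	unsigned int offset = offset_in_page(gpu_addr);

	clflush_cache_range(vaddr + offset, size);	/* size > 0 */
	mb();
	memcpy(user, vaddr + offset, size);	/* shorthand, as above */

	kunmap_atomic(vaddr);
}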

I have also confirmed that this doesn't just happen for single
cachelines (i.e. where the earlier clflush(vend-1) and this clflush(vaddr)
would be equivalent).
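
(Quick sanity check on that arithmetic: assuming a 64-byte cacheline, a
range that fits inside a single line rounds both its first and its last
byte down to the same line address, so clflush(vaddr) and
clflush(vend - 1) would hit the identical line. A userspace sketch with
example values:)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uintptr_t clflush_size = 64;	/* assumed line size */
	uintptr_t vaddr = 0x1010, size = 32;	/* example values only */
	uintptr_t vend = vaddr + size;

	uintptr_t first = vaddr & ~(clflush_size - 1);
	uintptr_t last = (vend - 1) & ~(clflush_size - 1);

	/* prints 0x1000 0x1000: both flushes target the same line */
	printf("%#lx %#lx\n", (unsigned long)first, (unsigned long)last);
	return 0;
}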

At the moment I am more inclined to think this is the extra clflush()
serialising the flushes (since a clflush to the same cacheline is
regarded as ordered with respect to an earlier clflush, iirc), as
opposed to the writes from the GPU not landing in a timely manner.
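
(The ordering rule I am leaning on, as I understand the SDM: clflushopt
is only ordered against an older clflush/clflushopt of the *same*
cacheline; flushes of different lines may complete out of order until a
fence. A userspace sketch of the hypothesis using the compiler
intrinsics; the buffer and line size are assumptions, build with
gcc -mclflushopt:)

#include <immintrin.h>

static char buf[128] __attribute__((aligned(64)));	/* two cachelines */

void flush_two_lines(void)
{
	_mm_clflushopt(&buf[0]);	/* flush line 0 */
	_mm_clflushopt(&buf[64]);	/* line 1: may complete out of order wrt line 0 */
	_mm_clflushopt(&buf[0]);	/* same line as the first flush: ordered after it */
	_mm_mfence();			/* orders all flushes before later accesses */
}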

Am I completely going mad?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
