lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160111210506.GO652@nuc-i3427.alporthouse.com>
Date:	Mon, 11 Jan 2016 21:05:06 +0000
From:	Chris Wilson <chris@...is-wilson.co.uk>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Andy Lutomirski <luto@...capital.net>,
	"H. Peter Anvin" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Ross Zwisler <ross.zwisler@...ux.intel.com>,
	"H . Peter Anvin" <hpa@...ux.intel.com>,
	Borislav Petkov <bp@...en8.de>,
	Brian Gerst <brgerst@...il.com>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Imre Deak <imre.deak@...el.com>,
	Daniel Vetter <daniel.vetter@...ll.ch>,
	DRI <dri-devel@...ts.freedesktop.org>
Subject: Re: [PATCH] x86: Add an explicit barrier() to clflushopt()

On Mon, Jan 11, 2016 at 12:11:05PM -0800, Linus Torvalds wrote:
> On Mon, Jan 11, 2016 at 3:28 AM, Chris Wilson <chris@...is-wilson.co.uk> wrote:
> >
> > Bizarrely,
> >
> > diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> > index 6000ad7..cf074400 100644
> > --- a/arch/x86/mm/pageattr.c
> > +++ b/arch/x86/mm/pageattr.c
> > @@ -141,6 +141,7 @@ void clflush_cache_range(void *vaddr, unsigned int size)
> >         for (; p < vend; p += clflush_size)
> >                 clflushopt(p);
> >
> > +       clflushopt(vend-1);
> >         mb();
> >  }
> >  EXPORT_SYMBOL_GPL(clflush_cache_range);
> >
> > works like a charm.
> 
> Have you checked all your callers? If the above makes a difference, it
> really sounds like the caller has passed in a size of zero, resulting
> in no cache flush, because the caller had incorrect ranges. The
> additional clflushopt now flushes the previous cacheline that wasn't
> flushed correctly before.
> 
> That "size was zero" thing would explain why changing the loop to "p
> <= vend" also fixes things for you.

This is on top of HPA's suggestion to do the size==0 check up front.
 
> IOW, just how sure are you that all the ranges are correct?

All our callers are of the pattern:

memcpy(dst, vaddr, size)
clflush_cache_range(dst, size)

or 

clflush_cache_range(vaddr, size)
memcpy(dst, vaddr, size)

I am resonably confident that the ranges are sane. I've tried to verify
that we do the clflushes by forcing them. However, if I clflush the
whole object instead of the cachelines around the copies, the tests pass.
(Flushing up to a couple of megabytes instead of a few hundred bytes, it
is hard to draw any conclusions about what the bug might be.)

I can narrow down the principal buggy path by doing the clflush(vend-1)
in the callers at least.

The problem is that the tests that fail are those looking for bugs in
the coherency code, which may just as well be caused by the GPU writing
into those ranges at the same time as the CPU trying to read them. I've
looked into timing and tried adding udelay()s or uncached mmio along the
suspect paths, but that didn't change the presentation - having a udelay
fix the issue is usually a good indicator of a GPU write that hasn't landed
before the CPU read.

The bug only affects a couple of recent non-coherent platforms, earlier
Atoms and older Core seem unaffacted. That may also mean that it is the
GPU flush instruction that changed between platforms and isn't working
(as we intended at least).

Thanks for everyone's help and suggestions,
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ