Date:   Wed, 12 Dec 2018 12:29:34 -0600
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Andrew Lutomirski <luto@...nel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Peter Zijlstra <peterz@...radead.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>, mhiramat@...nel.org,
        jbaron@...mai.com, Jiri Kosina <jkosina@...e.cz>,
        David.Laight@...lab.com, bp@...en8.de, julia@...com,
        jeyu@...nel.org, Peter Anvin <hpa@...or.com>
Subject: Re: [PATCH v2 4/4] x86/static_call: Add inline static call
 implementation for x86-64

On Thu, Nov 29, 2018 at 03:04:20PM -0800, Linus Torvalds wrote:
> On Thu, Nov 29, 2018 at 12:25 PM Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> >
> > On Thu, Nov 29, 2018 at 11:27:00AM -0800, Andy Lutomirski wrote:
> > >
> > > I propose a different solution:
> > >
> > > As in this patch set, we have a direct and an indirect version.  The
> > > indirect version remains exactly the same as in this patch set.  The
> > > direct version just only does the patching when all seems well: the
> > > call instruction needs to be 0xe8, and we only do it when the thing
> > > doesn't cross a cache line.  Does that work?  In the rare case where
> > > the compiler generates something other than 0xe8 or crosses a cache
> > > line, then the thing just remains as a call to the out of line jmp
> > > trampoline.  Does that seem reasonable?  It's a very minor change to
> > > the patch set.
> >
> > Maybe that would be ok.  If my math is right, we would use the
> > out-of-line version almost 5% of the time due to cache misalignment of
> > the address.
> 
> Note that I don't think cache-line alignment is necessarily sufficient.
> 
> The I$ fetch from the cacheline can happen in smaller chunks, because
> the bus between the I$ and the instruction decode isn't a full
> cacheline (well, it is _now_ in modern big cores, but it hasn't always
> been).
> 
> So even if the cacheline is updated atomically, I could imagine seeing
> a partial fetch from the I$ (old values) and then a second partial
> fetch (new values).
> 
> It would be interesting to know what the exact fetch rules are.

So I fixed my test case to do 32-bit writes, and now the results are
making a lot more sense.  Now I only get crashes when writing across
cache lines.  So maybe we should just go with Andy's suggestion above.
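
For illustration, the condition in Andy's suggestion could be sketched
roughly as below.  This is not code from the patch set: the function
names and the 64-byte line size are assumptions, and the check looks only
at the 4-byte rel32 operand, since that is the part rewritten by a single
32-bit store.  Under that reading, 3 of the 64 possible offsets within a
line (60, 61 and 62) make the operand straddle a line boundary.  If call
sites were spread uniformly that would be 3/64, or about 4.7%, which
matches the "almost 5%" estimate quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u   /* assumed; typical for x86-64 parts */
#define CALL_OPCODE     0xe8u /* near relative CALL */
#define CALL_INSN_SIZE  5u    /* 1 opcode byte + 4-byte rel32 operand */

/*
 * Hypothetical helper: true if the 4-byte rel32 operand of a CALL
 * located at insn_addr spans two cache lines.
 */
static bool rel32_crosses_cacheline(uintptr_t insn_addr)
{
	uintptr_t first = insn_addr + 1;                  /* first operand byte */
	uintptr_t last  = insn_addr + CALL_INSN_SIZE - 1; /* last operand byte */

	return (first / CACHE_LINE_SIZE) != (last / CACHE_LINE_SIZE);
}

/*
 * Hypothetical helper: patch the call site inline only when "all seems
 * well", i.e. the opcode is 0xe8 and the operand stays within one line.
 * Otherwise the site would keep calling the out-of-line trampoline.
 */
static bool can_patch_inline(const uint8_t *insn)
{
	return insn[0] == CALL_OPCODE &&
	       !rel32_crosses_cacheline((uintptr_t)insn);
}

/* Count how many offsets within one cache line lead to a crossing. */
static unsigned int count_crossing_offsets(void)
{
	unsigned int off, n = 0;

	for (off = 0; off < CACHE_LINE_SIZE; off++)
		if (rel32_crosses_cacheline(off))
			n++;

	return n;
}
```

Whether staying within one cache line is actually sufficient is still the
open question; the sketch only encodes the proposed rule.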

It would be great if some CPU people could confirm that it's safe (for
x86-64 only), since it's not in the SDM.  Who can help answer that?

-- 
Josh
