Date:   Wed, 12 Dec 2018 12:29:34 -0600
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Andrew Lutomirski <luto@...nel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Peter Zijlstra <peterz@...radead.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>, mhiramat@...nel.org,
        jbaron@...mai.com, Jiri Kosina <jkosina@...e.cz>,
        David.Laight@...lab.com, bp@...en8.de, julia@...com,
        jeyu@...nel.org, Peter Anvin <hpa@...or.com>
Subject: Re: [PATCH v2 4/4] x86/static_call: Add inline static call
 implementation for x86-64

On Thu, Nov 29, 2018 at 03:04:20PM -0800, Linus Torvalds wrote:
> On Thu, Nov 29, 2018 at 12:25 PM Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> >
> > On Thu, Nov 29, 2018 at 11:27:00AM -0800, Andy Lutomirski wrote:
> > >
> > > I propose a different solution:
> > >
> > > As in this patch set, we have a direct and an indirect version.  The
> > > indirect version remains exactly the same as in this patch set.  The
> > > direct version just only does the patching when all seems well: the
> > > call instruction needs to be 0xe8, and we only do it when the thing
> > > doesn't cross a cache line.  Does that work?  In the rare case where
> > > the compiler generates something other than 0xe8 or crosses a cache
> > > line, then the thing just remains as a call to the out of line jmp
> > > trampoline.  Does that seem reasonable?  It's a very minor change to
> > > the patch set.
> >
> > Maybe that would be ok.  If my math is right, we would use the
> > out-of-line version almost 5% of the time due to cache misalignment of
> > the address.
> 
> Note that I don't think cache-line alignment is necessarily sufficient.
> 
> The I$ fetch from the cacheline can happen in smaller chunks, because
> the bus between the I$ and the instruction decode isn't a full
> cacheline (well, it is _now_ in modern big cores, but it hasn't always
> been).
> 
> So even if the cacheline is updated atomically, I could imagine seeing
> a partial fetch from the I$ (old values) and then a second partial
> fetch (new values).
> 
> It would be interesting to know what the exact fetch rules are.

So I fixed my test case to do 32-bit writes, and now the results are
making a lot more sense.  Now I only get crashes when writing across
cache lines.  So maybe we should just go with Andy's suggestion above.
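
For reference, here is a rough sketch of the check Andy's suggestion
implies, and of where the "almost 5%" figure comes from.  It assumes
64-byte cache lines and a 5-byte call (0xe8 opcode followed by a 4-byte
rel32 destination), and that only the rel32 is rewritten.  The helper
names are made up for illustration and are not from the patch set.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u   /* assumption: 64-byte cache lines */
#define CALL_OPCODE     0xe8u /* x86 near call with rel32 operand */
#define CALL_INSN_SIZE  5u    /* 1 opcode byte + 4 operand bytes */

/*
 * Hypothetical helper: true if the 4-byte rel32 operand of a call
 * located at insn_addr lies entirely within one cache line.
 */
static bool rel32_within_cache_line(uintptr_t insn_addr)
{
	uintptr_t first = insn_addr + 1;                  /* first operand byte */
	uintptr_t last  = insn_addr + CALL_INSN_SIZE - 1; /* last operand byte */

	return (first / CACHE_LINE_SIZE) == (last / CACHE_LINE_SIZE);
}

/*
 * Hypothetical helper: patch inline only when "all seems well";
 * otherwise the site stays a call to the out-of-line trampoline.
 */
static bool can_patch_inline(const uint8_t *insn, uintptr_t insn_addr)
{
	return insn[0] == CALL_OPCODE && rel32_within_cache_line(insn_addr);
}

/*
 * Count how many of the 64 possible starting offsets within a cache
 * line put the rel32 operand across a line boundary.
 */
static unsigned int count_crossing_offsets(void)
{
	unsigned int n = 0;

	for (uintptr_t off = 0; off < CACHE_LINE_SIZE; off++)
		if (!rel32_within_cache_line(off))
			n++;

	return n;
}
```

With those assumptions, the operand straddles a line boundary for call
instructions starting at offsets 60, 61 and 62, so 3 of 64 placements
(about 4.7%) would fall back to the out-of-line version if call sites
are uniformly distributed.  A call starting at offset 63 has its whole
operand in the next line, so it doesn't count.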

It would be great if some CPU people could confirm that patching the
call destination with a 32-bit write that doesn't cross a cache line is
safe (for x86-64 only), since it's not in the SDM.  Who can help answer
that?

-- 
Josh