Date:   Thu, 10 Jan 2019 17:32:08 +0000
From:   Nadav Amit <namit@...are.com>
To:     Josh Poimboeuf <jpoimboe@...hat.com>
CC:     X86 ML <x86@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Andy Lutomirski <luto@...nel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Jason Baron <jbaron@...mai.com>, Jiri Kosina <jkosina@...e.cz>,
        David Laight <David.Laight@...LAB.COM>,
        Borislav Petkov <bp@...en8.de>,
        Julia Cartwright <julia@...com>, Jessica Yu <jeyu@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        Edward Cree <ecree@...arflare.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>
Subject: Re: [PATCH v3 0/6] Static calls

> On Jan 10, 2019, at 8:44 AM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> 
> On Thu, Jan 10, 2019 at 01:21:00AM +0000, Nadav Amit wrote:
>>> On Jan 9, 2019, at 2:59 PM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
>>> 
>>> With this version, I stopped trying to use text_poke_bp(), and instead
>>> went with a different approach: if the call site destination doesn't
>>> cross a cacheline boundary, just do an atomic write.  Otherwise, keep
>>> using the trampoline indefinitely.
>>> 
>>> NOTE: At least experimentally, the call destination writes seem to be
>>> atomic with respect to instruction fetching.  On Nehalem I can easily
>>> trigger crashes when writing a call destination across cachelines while
>>> reading the instruction on other CPU; but I get no such crashes when
>>> respecting cacheline boundaries.
>>> 
>>> BUT, the SDM doesn't document this approach, so it would be great if any
>>> CPU people can confirm that it's safe!
>> 
>> I (still) think that having a compiler plugin can make things much cleaner
>> (as done in [1]). The callers would not need to be changed, and the key can
>> be provided through an attribute.
>> 
>> Using a plugin should also make it possible to use Steven's proposal for
>> doing text_poke() safely: by changing 'func()' into 'asm ("call func")', as
>> done by the plugin, you are guaranteed that registers are clobbered. Then,
>> in the assembly block, you can store the return address in one of these
>> registers.
> 
> I'm no GCC expert (why do I find myself saying this a lot lately?), but
> this sounds to me like it could be tricky to get right.
> 
> I suppose you'd have to do it in an early pass, to allow GCC to clobber
> the registers in a later pass.  So it would necessarily have side
> effects, but I don't know what the risks are.

I'm no GCC expert either, and writing this code did not fill me with joy.
I'd be happy to have my code reviewed, but it does work. I don't think an
early pass is needed, as long as hardware registers have not yet been
allocated.

> Would it work with more than 5 arguments, where args get passed on the
> stack?

It does.

> 
> At the very least, it would (at least partially) defeat the point of the
> callee-saved paravirt ops.

Actually, I think you can even deal with callee-saved functions and remove
all the (terrible) macros. You would need a new attribute that tells the
extension not to clobber the registers.

> What if we just used a plugin in a simpler fashion -- to do call site
> alignment, if necessary, to ensure the instruction doesn't cross
> cacheline boundaries.  This could be done in a later pass, with no side
> effects other than code layout.  And it would allow us to avoid
> breakpoints altogether -- again, assuming somebody can verify that
> intra-cacheline call destination writes are atomic with respect to
> instruction decoder reads.

A plugin should not be able to do that. The final layout of the instruction
bytes is determined by the assembler, so I don't think a plugin would help
you with this one.
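
For reference, the cacheline condition discussed above can be sketched in C. This is only an illustration, not code from the patch set: it assumes 64-byte cachelines and a 5-byte "call rel32" instruction (opcode 0xe8 followed by a 4-byte destination field), and the helper names crosses_cacheline() and call_dest_is_patchable() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines. */
#define CACHELINE_SIZE   64u

/* Assumption: "call rel32" is one opcode byte plus a 4-byte destination. */
#define CALL_DEST_OFFSET 1u
#define CALL_DEST_SIZE   4u

/* Hypothetical helper: true if [addr, addr + len) spans two cachelines. */
static bool crosses_cacheline(uintptr_t addr, size_t len)
{
	assert(len > 0 && len <= CACHELINE_SIZE);
	return (addr / CACHELINE_SIZE) != ((addr + len - 1) / CACHELINE_SIZE);
}

/*
 * Hypothetical helper: true if the 4-byte destination field of the call
 * instruction at 'insn' lies entirely within one cacheline, i.e. the
 * case in which a single aligned-enough write is being considered.
 */
static bool call_dest_is_patchable(uintptr_t insn)
{
	return !crosses_cacheline(insn + CALL_DEST_OFFSET, CALL_DEST_SIZE);
}
```

Note that only the destination field is checked here, not the opcode byte, since the opcode is not rewritten. Whether such a write is actually atomic with respect to instruction fetch is exactly the open question in this thread; the sketch only expresses the address arithmetic.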
