Date:   Thu, 29 Nov 2018 13:55:36 -0500
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Andy Lutomirski <luto@...capital.net>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Lutomirski <luto@...nel.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        Ard Biesheuvel <ard.biesheuvel@...aro.org>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>, mhiramat@...nel.org,
        jbaron@...mai.com, Jiri Kosina <jkosina@...e.cz>,
        David.Laight@...lab.com, bp@...en8.de, julia@...com,
        jeyu@...nel.org, Peter Anvin <hpa@...or.com>
Subject: Re: [PATCH v2 4/4] x86/static_call: Add inline static call
 implementation for x86-64

On Thu, 29 Nov 2018 10:00:48 -0800
Andy Lutomirski <luto@...capital.net> wrote:

> > 
> > Of course, another option is to just say "we don't do the inline case,
> > then", and only ever do a call to a stub that does a "jmp"
> > instruction.  
> 
> That’s not a terrible idea.

That was the implementation in my first proof of concept, the one that
kicked off this entire effort; others (Peter and Josh) thought it would
be better to modify the calls themselves. Doing so does improve things.

Just a reminder of the benchmarks of enabling all tracepoints (which
use indirect jumps) and running hackbench:

  No RETPOLINES:
            1.4503 +- 0.0148 seconds time elapsed  ( +-  1.02% )

  baseline RETPOLINES:
            1.5120 +- 0.0133 seconds time elapsed  ( +-  0.88% )

  Added direct calls for trace_events:
            1.5239 +- 0.0139 seconds time elapsed  ( +-  0.91% )

  With static calls:
            1.5282 +- 0.0135 seconds time elapsed  ( +-  0.88% )

  With static call trampolines:
           1.48328 +- 0.00515 seconds time elapsed  ( +-  0.35% )

  Full static calls:
           1.47364 +- 0.00706 seconds time elapsed  ( +-  0.48% )


  Adding Retpolines caused a 1.5120 / 1.4503 = 1.0425 ( 4.25% ) slowdown

  Trampolines brought that down to a 1.48328 / 1.4503 = 1.0227 ( 2.27% ) slowdown

The above is the stub with the jmp case.

  With full static calls it's a 1.47364 / 1.4503 = 1.0160 ( 1.60% ) slowdown

Modifying the calls themselves does yield an improvement (and the
improvement was much greater when I had debugging enabled).

Perhaps it's not worth the effort, but again, we do have control of
what uses this. It's not a total free-for-all.

Full results here:

  http://lkml.kernel.org/r/20181126155405.72b4f718@gandalf.local.home

Although, since lore.kernel.org seems to be having issues, here's a mirror:

  https://marc.info/?l=linux-kernel&m=154326714710686


-- Steve
