Message-ID: <20200422115659.GF20730@hirez.programming.kicks-ass.net>
Date:   Wed, 22 Apr 2020 13:56:59 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Ingo Molnar <mingo@...nel.org>
Cc:     Josh Poimboeuf <jpoimboe@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>
Subject: Re: [GIT pull] perf/urgent for 5.7-rc2

On Wed, Apr 22, 2020 at 09:45:12AM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@...radead.org> wrote:
> 
> > On Mon, Apr 20, 2020 at 09:48:45AM +0200, Ingo Molnar wrote:
> > > Fortunately, much of what objtool does against vmlinux.o can be 
> > > parallelized in a rather straightforward fashion I believe, if we build 
> > > with -ffunction-sections.
> > 
> > So that FGKASLR is going to get us -ffunction-sections, but
> > > parallelizing objtool isn't going to be trivial, its data structures
> > > aren't really built for that, esp. decode_instructions() which actively
> > > generates data.
> > 
> > Still, it's probably doable.
> 
> So AFAICS in the narrow code section I identified as the main overhead, 
> only the instruction hash needs threading, i.e. this step:
> 
>                         hash_add(file->insn_hash, &insn->hash, insn->offset);
>                         list_add_tail(&insn->list, &file->insn_list);
> 
> Objtool can still be single-threaded before and after generating the 
> instruction hash.
> 
> 99% of the overhead within decode_instructions() is in 
> arch_decode_instruction(), which is fully thread-safe AFAICS.

Correct; I suppose you can farm out the sections to N threads for
arch_decode_instruction() and then have the main thread collect decoded
sections and frob them in the global data structures.
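
A minimal sketch of that split, under loud assumptions: struct insn,
struct section_job, struct worker, toy_decode_len() and decode_all() are
all hypothetical stand-ins and not objtool's real types or functions.
toy_decode_len() takes the place of arch_decode_instruction(), and a plain
linked list takes the place of insn_hash / insn_list. Workers only touch
their own section's private list; the main thread alone touches the global
list after joining them.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREADS 16

/* Hypothetical stand-ins, not objtool's real structures. */
struct insn {
	unsigned long offset;
	unsigned int len;
	struct insn *next;
};

struct section_job {
	const unsigned char *data;	/* section bytes (input) */
	size_t size;
	struct insn *head, *tail;	/* private to the worker that decodes it */
	size_t count;
	int err;
};

struct worker {
	struct section_job *jobs;
	size_t njobs, first, stride;
};

/* Toy decoder standing in for arch_decode_instruction(): 1..4 bytes. */
static unsigned int toy_decode_len(const unsigned char *p, size_t remaining)
{
	unsigned int len = (p[0] & 3) + 1;

	return len > remaining ? (unsigned int)remaining : len;
}

/* Decode one section into its own list; no shared state is written. */
static void decode_section(struct section_job *job)
{
	size_t off = 0;

	while (off < job->size) {
		struct insn *insn = calloc(1, sizeof(*insn));

		if (!insn) {
			job->err = -1;
			return;
		}
		insn->offset = off;
		insn->len = toy_decode_len(job->data + off, job->size - off);
		if (job->tail)
			job->tail->next = insn;
		else
			job->head = insn;
		job->tail = insn;
		job->count++;
		off += insn->len;
	}
}

/* Each worker takes sections first, first + stride, first + 2*stride, ... */
static void *decode_worker(void *arg)
{
	struct worker *w = arg;

	for (size_t i = w->first; i < w->njobs; i += w->stride)
		decode_section(&w->jobs[i]);
	return NULL;
}

/* Returns the total instruction count, or -1 if any section failed. */
long decode_all(struct section_job *jobs, size_t njobs, unsigned int nthreads,
		struct insn **global_head)
{
	pthread_t tids[MAX_THREADS];
	struct worker workers[MAX_THREADS];
	int spawned[MAX_THREADS] = { 0 };
	struct insn *head = NULL, *tail = NULL;
	long total = 0;
	int err = 0;

	assert(global_head != NULL);
	if (nthreads < 1)
		nthreads = 1;
	if (nthreads > MAX_THREADS)
		nthreads = MAX_THREADS;

	for (unsigned int t = 0; t < nthreads; t++) {
		workers[t] = (struct worker){ jobs, njobs, t, nthreads };
		spawned[t] = !pthread_create(&tids[t], NULL, decode_worker,
					     &workers[t]);
		if (!spawned[t])
			decode_worker(&workers[t]);	/* decode inline instead */
	}
	for (unsigned int t = 0; t < nthreads; t++)
		if (spawned[t])
			pthread_join(tids[t], NULL);

	/* Single-threaded collection: splice per-section lists in order. */
	for (size_t i = 0; i < njobs; i++) {
		if (jobs[i].err)
			err = 1;
		if (!jobs[i].head)
			continue;
		if (tail)
			tail->next = jobs[i].head;
		else
			head = jobs[i].head;
		tail = jobs[i].tail;
		total += (long)jobs[i].count;
	}
	*global_head = head;
	return err ? -1 : total;
}
```

A caller would fill an array of struct section_job with each section's
bytes and size, zero the remaining fields, and call decode_all(jobs, njobs,
N, &list). Because the splice runs in section order on the main thread, the
resulting list order does not depend on thread scheduling. The static
strided partition is chosen only to avoid any shared work queue; sections
of very uneven size would likely want a dynamic queue instead.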

Another pass you can probably parallelize fairly easily is
validate_functions() / validate_unwind_hints(). While that modifies
state, the state it modifies should be local to the section at hand.

That needs an audit of course, but it should be entirely doable.
