Date:   Fri, 2 Dec 2016 09:38:38 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Andy Lutomirski <luto@...nel.org>, Peter Anvin <hpa@...or.com>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
        Borislav Petkov <bp@...en8.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Brian Gerst <brgerst@...il.com>,
        Matthew Whitehead <tedheadster@...il.com>,
        Henrique de Moraes Holschuh <hmh@....eng.br>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Cooper <andrew.cooper3@...rix.com>
Subject: Re: [PATCH v2 5/6] x86/xen: Add a Xen-specific sync_core() implementation

On Fri, Dec 2, 2016 at 9:32 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Thu, Dec 1, 2016 at 4:35 PM, Andy Lutomirski <luto@...nel.org> wrote:
>>
>> On my laptop, CPUID(eax=1, ecx=0) is ~83ns and IRET-to-self is
>> ~110ns.  But Xen PV will trap CPUID if possible, so IRET-to-self
>> should end up being a nice speedup.
>
> So if we care deeply about the performance of this, we should really
> ask ourselves how much we need this...
>
> There are *very* few places where we really need to do a full
> serializing instruction, and I'd worry that we really don't need it in
> many of the places we do this.
>
> The only real case I'm aware of is modifying code that is modified
> through a different linear address than it's executed.

TBH, I didn't start down this path for performance.  I did it because
I wanted to kill off a CPUID that was breaking on old CPUs that don't
have CPUID.  So I propose MOV-to-CR2 followed by an unconditional
jump.  My goal here is to make the #*!& thing work reliably and not be
ludicrously slow.  Borislav and I mulled over using an alternative to
use CPUID if and only if we have CPUID, but that doesn't work because
we call sync_core() before we're done applying alternatives.

>
> Is there anything else where we _really_ need this sync-core thing?
> Sure, the microcode loader looks fine, but that doesn't look
> particularly performance-critical either.
>
> So I'd like to know which sync_core is actually so
> performance-critical that we care about it, and then I'd like to
> understand why it's needed at all, because I suspect a number of them
> have been added with the model of "sprinkle random things around and
> hope".

apply_alternatives, unfortunately.  It's performance-critical because
it's intensely stupid and does sync_core() for every single patch.
Fixing that would be nice, too.
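
To make the per-patch cost concrete, here is a minimal userspace sketch, not kernel code. Everything in it is a hypothetical stand-in: fake_sync_core() only counts calls, and struct patch, apply_each_synced() and apply_batched() are illustrative names that do not exist in the kernel. It models one possible direction (write every patch, then serialize once) and assumes none of the patched code can run between patches, which the real apply_alternatives would have to guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sync_core(): it only counts how often it is called. */
static unsigned long sync_count;

static void fake_sync_core(void)
{
	sync_count++;
}

/* Hypothetical patch descriptor: copy len bytes from repl to text + offset. */
struct patch {
	size_t offset;
	const unsigned char *repl;
	size_t len;
};

/* Model of the behavior described above: serialize after every single patch. */
static void apply_each_synced(unsigned char *text, size_t text_len,
			      const struct patch *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(p[i].offset + p[i].len <= text_len);
		memcpy(text + p[i].offset, p[i].repl, p[i].len);
		fake_sync_core();
	}
}

/*
 * Batched variant: write all patches, then serialize once.
 * Assumption: no patched code executes before the final sync.
 */
static void apply_batched(unsigned char *text, size_t text_len,
			  const struct patch *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(p[i].offset + p[i].len <= text_len);
		memcpy(text + p[i].offset, p[i].repl, p[i].len);
	}
	if (n)
		fake_sync_core();
}
```

With n patches, the first variant pays for n serializing operations and the second for one, while both leave the same bytes in the buffer.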

> Adding Peter Anvin to the participants list, because iirc he was the
> one who really talked to hardware engineers about the synchronization
> issues with serializing kernel code.
>
>                 Linus
