Date:   Thu, 25 Jan 2018 06:07:07 -0800
From:   Arjan van de Ven <arjan@...ux.intel.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Tim Chen <tim.c.chen@...ux.intel.com>,
        linux-kernel@...r.kernel.org,
        KarimAllah Ahmed <karahmed@...zon.de>,
        Andi Kleen <ak@...ux.intel.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Andy Lutomirski <luto@...nel.org>,
        Ashok Raj <ashok.raj@...el.com>,
        Asit Mallick <asit.k.mallick@...el.com>,
        Borislav Petkov <bp@...e.de>,
        Dan Williams <dan.j.williams@...el.com>,
        Dave Hansen <dave.hansen@...el.com>,
        David Woodhouse <dwmw@...zon.co.uk>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "H . Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        Janakarajan Natarajan <Janakarajan.Natarajan@....com>,
        Joerg Roedel <joro@...tes.org>,
        Jun Nakajima <jun.nakajima@...el.com>,
        Laura Abbott <labbott@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Paolo Bonzini <pbonzini@...hat.com>, rkrcmar@...hat.com,
        Thomas Gleixner <tglx@...utronix.de>,
        Tom Lendacky <thomas.lendacky@....com>, x86@...nel.org
Subject: Re: [RFC PATCH 1/2] x86/ibpb: Skip IBPB when we switch back to same user process

On 1/25/2018 5:50 AM, Peter Zijlstra wrote:
> On Thu, Jan 25, 2018 at 05:21:30AM -0800, Arjan van de Ven wrote:
>>>
>>> This means that 'A -> idle -> A' should never pass through switch_mm to
>>> begin with.
>>>
>>> Please clarify how you think it does.
>>>
>>
>> the idle code does leave_mm() to avoid having to IPI CPUs in deep sleep states
>> for a tlb flush.
> 
> The intel_idle code does, not the idle code. This is squirreled away in
> some driver :/

AFAIK (though I haven't looked in a while) the ACPI drivers did too.
> 
>> (trust me, you really want that; otherwise you sequentially IPI a pile of cores in a deep sleep
>> state just to flush a TLB that's empty, and the performance of that is horrific)
> 
> Hurmph. I'd rather fix that some other way than leave_mm(), this is
> piling special on special.
> 
The problem was tricky, but of course if something better is possible, let's figure it out.

The problem is that an IPI to an idle CPU is both power-inefficient and slow: exiting a deep C-state
can take something in the 50 to 100 usec range (it varies with many things, but when thinking about
the problem abstractly one should generally round up to nice round numbers).

If you have, say, 64 cores that had the mm at some point, but 63 of them are idle, the 64th
really does not want to IPI each of those 63 serially (technically this does not need to be
serial, but the IPI code is tricky and some things end up serializing it a bit) and take the
100 usec hit 63 times. Actually, even if it's not serialized, even ONE hit of 100 usec is unpleasant.
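To make the arithmetic above concrete, here is a minimal back-of-the-envelope sketch in C. It assumes each IPI completes only once its target has fully left its deep C-state, so serial latencies simply add up. The function names are hypothetical, and the 50 to 100 usec figure is the rough estimate from this email, not a measurement.

```c
#include <assert.h>

/*
 * Cost of sending TLB-flush IPIs to idle CPUs one after another.
 * Assumption: an IPI is not complete until its target has exited its
 * deep C-state, so the per-CPU exit latencies add up.
 * Figures (50..100 usec, 64 cores) are rough numbers, not measurements.
 */
static double serial_ipi_cost_us(int idle_cpus, double exit_latency_us)
{
    assert(idle_cpus >= 0 && exit_latency_us >= 0.0);
    return idle_cpus * exit_latency_us;
}

/*
 * Idealized fully parallel broadcast: the sender still waits for the
 * slowest CPU to wake, so it pays one exit latency, not zero.
 */
static double parallel_ipi_cost_us(int idle_cpus, double exit_latency_us)
{
    assert(idle_cpus >= 0 && exit_latency_us >= 0.0);
    return idle_cpus > 0 ? exit_latency_us : 0.0;
}
```

With 63 idle CPUs, `serial_ipi_cost_us(63, 100.0)` gives 6300 usec (6.3 ms) and `serial_ipi_cost_us(63, 50.0)` gives 3150 usec. Even the idealized parallel case still pays one full 100 usec exit latency, which is the "even ONE hit" point.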

So, as a general objective, a CPU that goes idle wants to "unsubscribe" itself from those IPIs.

But not getting flush IPIs is only safe if the CPU's TLBs hold nothing that such an IPI would
want to flush, so the TLB must be empty of those entries.

The only way to do THAT is to switch to an mm that is safe; leave_mm() does this, but I'm sure other
options exist.
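The "unsubscribe" idea can be modelled with a per-mm bitmask of CPUs that may hold that mm's translations, loosely in the spirit of the kernel's mm_cpumask(). This is a toy user-space sketch, not kernel code. The names struct toy_mm, toy_switch_mm(), toy_leave_mm() and toy_flush_others() are hypothetical, and real x86 lazy-TLB handling is considerably more involved.

```c
#include <assert.h>
#include <stdint.h>

#define NR_TOY_CPUS 64

/* Toy address space: bit n set means CPU n may cache its translations. */
struct toy_mm {
    uint64_t cpumask;
};

/* Kernel-only mm: an idle CPU can sit on it without needing user flushes. */
static struct toy_mm toy_init_mm;
static struct toy_mm *toy_loaded_mm[NR_TOY_CPUS];

/* Load @next on @cpu, moving the CPU's "subscription" from prev to next. */
static void toy_switch_mm(int cpu, struct toy_mm *next)
{
    assert(cpu >= 0 && cpu < NR_TOY_CPUS);
    struct toy_mm *prev = toy_loaded_mm[cpu];
    if (prev == next)
        return;
    if (prev)
        prev->cpumask &= ~(UINT64_C(1) << cpu); /* stop getting prev's flush IPIs */
    next->cpumask |= UINT64_C(1) << cpu;        /* start getting next's flush IPIs */
    toy_loaded_mm[cpu] = next;
}

/* Going idle: park on the kernel mm so no user-mm flush needs this CPU. */
static void toy_leave_mm(int cpu)
{
    toy_switch_mm(cpu, &toy_init_mm);
}

/* Number of IPIs a flush of @mm issued from CPU @self would have to send. */
static int toy_flush_others(const struct toy_mm *mm, int self)
{
    uint64_t targets = mm->cpumask & ~(UINT64_C(1) << self);
    int ipis = 0;
    for (int cpu = 0; cpu < NR_TOY_CPUS; cpu++)
        if (targets & (UINT64_C(1) << cpu))
            ipis++;
    return ipis;
}
```

If CPUs 0 to 63 have all run process A's mm, a flush issued from CPU 0 would IPI 63 CPUs. Once CPUs 1 to 63 call toy_leave_mm() on the way into idle, the same flush sends none. The cost is that each of them has to switch back, and refill its TLB, when A runs there again.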

Note: while a CPU that is in a deeper C-state will flush the TLB itself, you don't know whether you will
actually get that deep at the time the OS makes its decision (for example, if an interrupt comes in the
cycle before mwait, mwait becomes a nop). In addition, once you wake up, you don't want the CPU to start
filling the TLBs with invalid data, so you can't really just set a bit and flush after leaving idle.
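This ties back to the thread's subject. Because leave_mm() parks an idle CPU on the kernel-only mm, switch_mm sees "A -> idle -> A" as A -> kernel mm -> A, so a naive check would issue an IBPB when A comes back. Below is a hypothetical sketch of skipping that barrier by remembering the last user mm per CPU. The names are invented, and this is not the RFC patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SKETCH_CPUS 4

struct sketch_mm {
    bool is_kernel;     /* true for the init_mm-like kernel address space */
};

/* Last user address space each CPU ran; NULL until the first user switch. */
static const struct sketch_mm *last_user_mm[NR_SKETCH_CPUS];
/* Counts barriers instead of executing a real IBPB. */
static int ibpb_count[NR_SKETCH_CPUS];

static void sketch_switch_mm(int cpu, const struct sketch_mm *next)
{
    assert(cpu >= 0 && cpu < NR_SKETCH_CPUS);
    if (next->is_kernel)
        return;                 /* idle/kernel mm: keep the remembered user mm */
    if (last_user_mm[cpu] != next) {
        ibpb_count[cpu]++;      /* different user process: barrier needed */
        last_user_mm[cpu] = next;
    }
}
```

Comparing mm pointers is the simplest possible key. A real implementation would have to consider that a freed mm can be reallocated at the same address for a different process, which this sketch ignores.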
