linux-kernel - Re: Current mainline git (24e700e291d52bd2) hangs when building e.g. perf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Sat, 9 Sep 2017 11:02:50 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Borislav Petkov <bp@...en8.de>,
        Markus Trippelsdorf <markus@...ppelsdorf.de>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Tom Lendacky <thomas.lendacky@....com>
Subject: Re: Current mainline git (24e700e291d52bd2) hangs when building e.g. perf

On Sat, Sep 9, 2017 at 10:49 AM, Andy Lutomirski <luto@...nel.org> wrote:
>
> Anyway, if I need change the behavior back, I can do it in one of two
> ways.  I can just switch to init_mm instead of going lazy, which is
> expensive, but not *that* expensive on CPUs with PCID.  Or I can do it
> the way we used to do it and send the flush IPI to lazy CPUs.  The
> latter will only have a performance impact when a flush happens, but
> the performance hit is much higher when there's a flush.

Why not both?

Let's at least entertain the idea. In particular, we don't send IPI's
to *all* CPU's. We only send them to the set of CPU's that could have
that MM cached.

And that set _may_ be very limited. In the best case, it's just the
current CPU, and no IPI is needed at all.

Which means that maybe we can use that set of CPU's as guidance to how
we should treat lazy.

We can *also* take PCID support into account.

So what I would suggest is something like

 - if we have PCID support, _and_ the set of CPU's is more than just
us, just switch to init_mm. The switch is cheaper than the IPI's.

 - otherwise do what we used to do, with the IPI.

The exact heuristics could be tuned later, but considering Markus's
report, and considering that not so many people have really even
heavily tested the new code yet (so _one_ report now means that there
are probably a shitload of machines that would show it later), I
really think we need to steer back towards our old behavior. But at
the same time, I think we can take advantage of newer CPU's that _do_
have PCID.

Hmm?

                  Linus