Date:   Fri, 01 Jun 2018 15:43:27 -0400
From:   Rik van Riel <riel@...riel.com>
To:     Mike Galbraith <efault@....de>, Andy Lutomirski <luto@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>, songliubraving@...com,
        kernel-team <kernel-team@...com>, Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>, X86 ML <x86@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

On Fri, 2018-06-01 at 20:48 +0200, Mike Galbraith wrote:
> On Fri, 2018-06-01 at 14:22 -0400, Rik van Riel wrote:
> > On Fri, 2018-06-01 at 08:11 -0700, Andy Lutomirski wrote:
> > > On Fri, Jun 1, 2018 at 5:28 AM Rik van Riel <riel@...riel.com>
> > > wrote:
> > > > 
> > > > Song noticed switch_mm_irqs_off taking a lot of CPU time in recent
> > > > kernels, using 2.4% of a 48 CPU system during a netperf to localhost
> > > > run. Digging into the profile, we noticed that cpumask_clear_cpu and
> > > > cpumask_set_cpu together take about half of the CPU time taken by
> > > > switch_mm_irqs_off.
> > > > 
> > > > However, the CPUs running netperf end up switching back and forth
> > > > between netperf and the idle task, which does not require changes
> > > > to the mm_cpumask. Furthermore, the init_mm cpumask ends up being
> > > > the most heavily contended one in the system.
> > > > 
> > > > Skipping cpumask_clear_cpu and cpumask_set_cpu for init_mm
> > > > (mostly the idle task) reduced CPU use of switch_mm_irqs_off
> > > > from 2.4% of the CPU to 1.9% of the CPU, with the following
> > > > netperf commandline:
> > > 
> > > I'm conceptually fine with this change.  Does mm_cpumask(&init_mm)
> > > end up in a deterministic state?
> > 
> > Given that we do not touch mm_cpumask(&init_mm)
> > any more, and that bitmask never appears to be
> > used for things like tlb shootdowns (kernel TLB
> > shootdowns simply go to everybody), I suspect
> > it ends up in whatever state it is initialized
> > to on startup.
> > 
> > I had not looked into this much, because it does
> > not appear to be used for anything.
> > 
> > > Mike, depending on exactly what's going on with your benchmark, this
> > > might help recover a bit of your performance, too.
> > 
> > It will be interesting to know how this change
> > impacts others.
> 
> previous pipe-test numbers
> 4.13.16         2.024978 usecs/loop -- avg 2.045250 977.9 KHz
> 4.14.47         2.234518 usecs/loop -- avg 2.227716 897.8 KHz
> 4.15.18         2.287815 usecs/loop -- avg 2.295858 871.1 KHz
> 4.16.13         2.286036 usecs/loop -- avg 2.279057 877.6 KHz
> 4.17.0.g88a8676 2.288231 usecs/loop -- avg 2.288917 873.8 KHz
> 
> new numbers
> 4.17.0.g0512e01 2.268629 usecs/loop -- avg 2.269493 881.3 KHz
> 4.17.0.g0512e01 2.035401 usecs/loop -- avg 2.038341 981.2 KHz +andy
> 4.17.0.g0512e01 2.238701 usecs/loop -- avg 2.231828 896.1 KHz -andy+rik
> 
> There might be something there with your change Rik, but it's small
> enough to be wary of variance.  Andy's "invert the return of
> tlb_defer_switch_to_init_mm()" is OTOH pretty clear.

If inverting the return value of that function helps
some systems, chances are the other value might help
other systems.

That makes you wonder whether it might make sense
to always switch to lazy TLB mode, and only call
switch_mm at TLB flush time, regardless of whether
the CPU supports PCID...
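
For readers following along, the idea in the patch under discussion
(skip the mm_cpumask updates whenever the mm being switched from or to
is init_mm) can be modelled in userspace roughly as below. This is only
a sketch under stated assumptions, not the patch itself: struct
mm_model, init_mm_model, model_switch_mm(), run_pingpong() and
init_mm_mask() are hypothetical names invented for illustration, and
the TLB flushing and lazy TLB handling in switch_mm_irqs_off are not
modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mm_struct; only the CPU mask is modelled. */
struct mm_model {
	atomic_ulong cpumask;		/* one bit per CPU, like mm_cpumask(mm) */
};

static struct mm_model init_mm_model;	/* stands in for the kernel's init_mm */
static unsigned long atomic_ops;	/* number of atomic mask updates issued */

static void model_set_cpu(struct mm_model *mm, unsigned int cpu)
{
	atomic_fetch_or(&mm->cpumask, 1UL << cpu);
	atomic_ops++;
}

static void model_clear_cpu(struct mm_model *mm, unsigned int cpu)
{
	atomic_fetch_and(&mm->cpumask, ~(1UL << cpu));
	atomic_ops++;
}

/* Mask bookkeeping for one context switch from prev to next on this CPU. */
static void model_switch_mm(struct mm_model *prev, struct mm_model *next,
			    unsigned int cpu, bool skip_init_mm)
{
	if (!skip_init_mm || prev != &init_mm_model)
		model_clear_cpu(prev, cpu);
	if (!skip_init_mm || next != &init_mm_model)
		model_set_cpu(next, cpu);
}

/* Current contents of the modelled init_mm mask. */
unsigned long init_mm_mask(void)
{
	return atomic_load(&init_mm_model.cpumask);
}

/*
 * Bounce CPU 0 between one task and the idle task n times, as in the
 * netperf-to-localhost case, and return the number of atomic updates.
 */
unsigned long run_pingpong(unsigned int n, bool skip_init_mm)
{
	struct mm_model task;
	unsigned int i;

	atomic_init(&task.cpumask, 0);
	atomic_store(&init_mm_model.cpumask, 0);
	atomic_ops = 0;

	for (i = 0; i < n; i++) {
		model_switch_mm(&init_mm_model, &task, 0, skip_init_mm);
		model_switch_mm(&task, &init_mm_model, 0, skip_init_mm);
	}

	/* The CPU ends on idle, so the task mask must be empty again. */
	assert(atomic_load(&task.cpumask) == 0);
	return atomic_ops;
}
```

In this model run_pingpong(1000, false) issues 4000 atomic updates and
run_pingpong(1000, true) issues 2000, and in the second case the
modelled init_mm mask is never written, so it stays at whatever value
it started with. The model says nothing about how much CPU time the
real change saves; that depends on cache line contention, which is not
represented here.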

-- 
All Rights Reversed.