Message-ID: <CALCETrWMouz7VzYgwC=Ys3-C2O7ZVCn4FNwesH1np4c9iG1g_A@mail.gmail.com>
Date:   Fri, 1 Jun 2018 14:21:58 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     riel@...riel.com
Cc:     Andrew Lutomirski <luto@...nel.org>,
        Mike Galbraith <efault@....de>,
        LKML <linux-kernel@...r.kernel.org>, songliubraving@...com,
        kernel-team <kernel-team@...com>, Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>, X86 ML <x86@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] x86,switch_mm: skip atomic operations for init_mm

On Fri, Jun 1, 2018 at 1:35 PM Rik van Riel <riel@...riel.com> wrote:
>
> On Fri, 2018-06-01 at 13:03 -0700, Andy Lutomirski wrote:
> > Mike, you never did say: do you have PCID on your CPU?  Also, what is
> > your workload doing to cause so many switches back and forth between
> > init_mm and a task?
> >
> > The point of the optimization is that switching to init_mm should be
> > fairly fast on a PCID system, whereas an IPI to do the deferred flush
> > is very expensive regardless of PCID.
>
> While I am sure that bit is true, Song and I
> observed about 4x as much CPU use in the atomic
> operations in cpumask_clear_cpu and cpumask_set_cpu
> (inside switch_mm_irqs_off) as in the %cr3 reload
> itself.
>
> Given how expensive those cpumask updates are,
> lazy TLB mode might always be worth it, especially
> on larger systems.
>

Hmm.  I wonder if there's a more clever data structure than a bitmap
that we could be using here.  Each CPU only ever needs to be in one
mm's cpumask, and each CPU only ever changes its own state in the
bitmask.  And writes are much less common than reads for most
workloads.
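
Since "each CPU is in exactly one mm's cpumask" is equivalent to each
CPU holding a single pointer to the mm it is running, one candidate
structure is an array of per-CPU slots.  Below is a minimal user-space
sketch of that direction (not kernel code; NR_CPUS, struct mm,
switch_mm_on, and flush_mm_users are all hypothetical stand-ins): the
per-switch update becomes a plain release store to a CPU-private slot,
with no lock-prefixed RMW on a shared cacheline, and the flush path
pays an O(NR_CPUS) scan instead.

	/*
	 * Sketch only: per-CPU "current mm" slots instead of a per-mm
	 * cpumask updated with atomic bit ops on every context switch.
	 */
	#include <stdatomic.h>
	#include <stdio.h>

	#define NR_CPUS 8               /* assumed fixed for the sketch */

	struct mm {                     /* stand-in for struct mm_struct */
		const char *name;
	};

	/* Invariant: only CPU i ever stores to cpu_mm[i]. */
	static _Atomic(struct mm *) cpu_mm[NR_CPUS];

	/* Context-switch path, run by the CPU itself: one plain release
	 * store, no atomic RMW, no shared bitmask cacheline to bounce. */
	static void switch_mm_on(int cpu, struct mm *next)
	{
		atomic_store_explicit(&cpu_mm[cpu], next,
				      memory_order_release);
	}

	/* Flush path: discover which CPUs are running @mm by scanning
	 * all slots.  O(NR_CPUS) acquire loads, but nothing to contend
	 * on. */
	static void flush_mm_users(struct mm *mm)
	{
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			if (atomic_load_explicit(&cpu_mm[cpu],
						 memory_order_acquire) == mm)
				printf("would IPI cpu %d for mm %s\n",
				       cpu, mm->name);
	}

	int main(void)
	{
		struct mm a = { "a" }, b = { "b" };

		switch_mm_on(0, &a);
		switch_mm_on(1, &b);
		switch_mm_on(2, &a);

		flush_mm_users(&a);     /* CPUs 0 and 2 */
		return 0;
	}

The tradeoff is the inverse of the bitmap's: finding every CPU in an
mm goes from a word-at-a-time bitmap scan to one load per CPU, which
only pays off if the per-switch atomic ops really dominate, as in the
numbers above.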
