Date:   Thu, 8 Sep 2016 21:39:45 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Benjamin Serebrin <serebrin@...gle.com>
Cc:     Ingo Molnar <mingo@...nel.org>, Rik van Riel <riel@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Andrew Lutomirski <luto@...nel.org>,
        Borislav Petkov <bp@...e.de>, Mel Gorman <mgorman@...e.de>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH RFC v6] x86,mm,sched: make lazy TLB mode even lazier

On Thu, Sep 8, 2016 at 5:09 PM, Benjamin Serebrin <serebrin@...gle.com> wrote:
> Sorry for the delay, I was eaten by a grue.
>
> I found that my initial study did not actually measure the number of
> TLB shootdown IPIs sent per TLB shootdown.  I think the intuition was
> correct but I didn't actually observe what I thought I had; my
> original use of probe points was incorrect.  However, after fixing my
> methodology, I'm having trouble proving that the existing Lazy TLB
> mode is working properly.
>
>
>
> I've spent some time trying to reproduce this in a microbenchmark.
> One thread does mmap, touch page, munmap, while other threads in the
> same process are configured to either busy-spin or busy-spin and
> yield.  All threads set their own affinity to a unique cpu, and the
> system is otherwise idle.  I look at the per-cpu delta of the TLB and
> CAL lines of /proc/interrupts over the run of the microbenchmark.
>
> Let's say I have 4 spin threads that never yield.  The mmap thread
> does N unmaps.  I observe each spin-thread core receives N (+/-  small
> noise) TLB shootdown interrupts, and the total TLB interrupt count is
> 4N (+/- small noise).  This is expected behavior.
>
> Then I add some synchronization:  the unmap thread rendezvouses with
> all the spinners, and when they are all ready, the spinners busy-spin
> for D milliseconds and then yield (pthread_yield, sched_yield produce
> identical results, though I'm not confident here that this is the
> right yield).  Meanwhile, the unmap thread busy-spins for D+E
> milliseconds and then does M map/touch/unmaps.  (D, E are single-digit
> milliseconds).  The idea here is that the unmap happens a little while
> after the spinners yielded; the kernel should be in the user process'
> mm but lazy TLB mode should defer TLB flushes.  It seems that lazy
> mode on each CPU should take 1 interrupt and then suppress subsequent
> interrupts.

If they're busy threads, shouldn't the yield return immediately
because the threads are still ready to run?  Lazy TLB won't do much
unless you get the kernel in some state where it's running in the
context of a different kernel thread and hasn't switched to
swapper_pg_dir.  IIRC idle works like that, but you'd need to actually
sleep to go idle.

--Andy
