Message-ID: <3db6b20b-df76-8284-5bc1-37a511ee0534@scylladb.com>
Date: Tue, 1 Aug 2017 13:32:43 +0300
From: Avi Kivity <avi@...lladb.com>
To: Peter Zijlstra <peterz@...radead.org>,
Nicholas Piggin <npiggin@...il.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Michael Ellerman <mpe@...erman.id.au>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Boqun Feng <boqun.feng@...il.com>,
Andrew Hunter <ahh@...gle.com>,
maged michael <maged.michael@...il.com>,
gromer <gromer@...gle.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Palmer Dabbelt <palmer@...belt.com>,
Dave Watson <davejwatson@...com>
Subject: Re: [RFC PATCH v2] membarrier: expedited private command
On 08/01/2017 01:22 PM, Peter Zijlstra wrote:
>
>> If mm cpumask is used, I think it's okay. You can cause quite similar
>> kind of iteration over CPUs and lots of IPIs, tlb flushes, etc using
>> munmap/mprotect/etc, or context switch IPIs, etc. Are we reaching the
>> stage where we're controlling those kinds of ops in terms of impact
>> to the rest of the system?
> So x86 has a tight mm_cpumask(), we only broadcast TLB invalidate IPIs
> to those CPUs actually running threads of our process (or very
> recently). So while there can be the sporadic stray IPI for a CPU that
> recently ran a thread of the target process, it will not get another one
> until it switches back into the process.
>
> On machines that need manual TLB broadcasts and don't keep a tight mask,
> yes you can interfere at will, but if they care they can fix by
> tightening the mask.
>
> In either case, the mm_cpumask() will be bounded by the set of CPUs the
> threads are allowed to run on and will not interfere with the rest of
> the system.
>
> As to scheduler IPIs, those are limited to the CPUs the user is limited
> to and are rate limited by the wakeup-latency of the tasks. After all,
> all the time a task is runnable but not running, wakeups are no-ops.
>
> Trouble is of course, that not everybody even sets a single bit in
> mm_cpumask() and those that never clear bits will end up with a fairly
> wide mask, still interfering with work that isn't hard partitioned.
I hate to propose a way to make this more complicated, but this could be
fixed by a process first declaring its intent to use expedited
process-wide membarrier; if it does, then every context switch updates a
process-wide cpumask indicating which cpus are currently running threads
of that process:
    /* in the context-switch path, when the mm changes: */
    if (prev->mm != next->mm) {
        if (prev->mm->running_cpumask)
            cpumask_clear(...);
        if (next->mm->running_cpumask)
            cpumask_set(...);
    }
Now only processes that want expedited process-wide membarrier pay for
it (other than a few predictable branches). You can even have threads
opt in, so unrelated threads that don't participate in the party don't
cause those bits to be set.
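
Concretely, the opt-in and the expedited call could look something like
this (just a sketch; running_cpumask, the registration helper and
ipi_mb() are made-up names, and allocation lifetime, locking and memory
ordering are glossed over):

    /*
     * Sketch only: mm->running_cpumask, the registration helper and
     * ipi_mb() are invented for illustration.
     */

    /* Opt-in: the process declares it will use expedited membarrier. */
    static int membarrier_register_running_cpumask(struct mm_struct *mm)
    {
        cpumask_var_t mask;

        if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
            return -ENOMEM;
        mm->running_cpumask = mask; /* context switch now maintains it */
        return 0;
    }

    static void ipi_mb(void *info)
    {
        smp_mb(); /* the IPI itself likely serializes, but be explicit */
    }

    /* Expedited membarrier: IPI only CPUs currently running our threads. */
    static void membarrier_expedited(struct mm_struct *mm)
    {
        preempt_disable();
        smp_call_function_many(mm->running_cpumask, ipi_mb, NULL, 1);
        preempt_enable();
    }

The expedited call then only IPIs the CPUs in the mask, so the impact
stays bounded by where the process's threads actually run.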