Message-ID: <3db6b20b-df76-8284-5bc1-37a511ee0534@scylladb.com>
Date: Tue, 1 Aug 2017 13:32:43 +0300
From: Avi Kivity <avi@...lladb.com>
To: Peter Zijlstra <peterz@...radead.org>,
Nicholas Piggin <npiggin@...il.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Michael Ellerman <mpe@...erman.id.au>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Boqun Feng <boqun.feng@...il.com>,
Andrew Hunter <ahh@...gle.com>,
maged michael <maged.michael@...il.com>,
gromer <gromer@...gle.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Palmer Dabbelt <palmer@...belt.com>,
Dave Watson <davejwatson@...com>
Subject: Re: [RFC PATCH v2] membarrier: expedited private command
On 08/01/2017 01:22 PM, Peter Zijlstra wrote:
>
>> If mm cpumask is used, I think it's okay. You can cause quite similar
>> kind of iteration over CPUs and lots of IPIs, tlb flushes, etc using
>> munmap/mprotect/etc, or context switch IPIs, etc. Are we reaching the
>> stage where we're controlling those kinds of ops in terms of impact
>> to the rest of the system?
> So x86 has a tight mm_cpumask(), we only broadcast TLB invalidate IPIs
> to those CPUs actually running threads of our process (or very
> recently). So while there can be the sporadic stray IPI for a CPU that
> recently ran a thread of the target process, it will not get another one
> until it switches back into the process.
>
> On machines that need manual TLB broadcasts and don't keep a tight mask,
> yes you can interfere at will, but if they care they can fix by
> tightening the mask.
>
> In either case, the mm_cpumask() will be bounded by the set of CPUs the
> threads are allowed to run on and will not interfere with the rest of
> the system.
>
> As to scheduler IPIs, those are limited to the CPUs the user is limited
> to and are rate limited by the wakeup-latency of the tasks. After all,
> all the time a task is runnable but not running, wakeups are no-ops.
>
> Trouble is of course, that not everybody even sets a single bit in
> mm_cpumask() and those that never clear bits will end up with a fairly
> wide mask, still interfering with work that isn't hard partitioned.
I hate to propose a way to make this more complicated, but this could be
fixed by a process first declaring its intent to use expedited
process-wide membarrier; if it does, then every context switch updates a
process-wide cpumask indicating which cpus are currently running threads
of that process:
    /* in the context-switch path, when the mm changes: */
    if (prev->mm != next->mm) {
        if (prev->mm->running_cpumask)
            cpumask_clear(...);
        if (next->mm->running_cpumask)
            cpumask_set(...);
    }
Now only processes that want expedited process-wide membarrier pay for
it (other than a few predictable branches). You can even have threads
opt in, so unrelated threads that don't participate in the party don't
cause those bits to be set.
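
Concretely, the opt-in and the expedited call could look something like
this (just a sketch; running_cpumask, the registration helper and
ipi_mb() are made-up names, and allocation lifetime, locking and memory
ordering are glossed over):

    /*
     * Sketch only: mm->running_cpumask, the registration helper and
     * ipi_mb() are invented for illustration.
     */

    /* Opt-in: the process declares it will use expedited membarrier. */
    static int membarrier_register_running_cpumask(struct mm_struct *mm)
    {
        cpumask_var_t mask;

        if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
            return -ENOMEM;
        mm->running_cpumask = mask; /* context switch now maintains it */
        return 0;
    }

    static void ipi_mb(void *info)
    {
        smp_mb(); /* the IPI itself likely serializes, but be explicit */
    }

    /* Expedited membarrier: IPI only CPUs currently running our threads. */
    static void membarrier_expedited(struct mm_struct *mm)
    {
        preempt_disable();
        smp_call_function_many(mm->running_cpumask, ipi_mb, NULL, 1);
        preempt_enable();
    }

The expedited call then only IPIs the CPUs in the mask, so the impact
stays bounded by where the process's threads actually run.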