Message-ID: <20170922033057.GF10893@tardis>
Date: Fri, 22 Sep 2017 11:30:57 +0800
From: Boqun Feng <boqun.feng@...il.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: "Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, Andrew Hunter <ahh@...gle.com>,
Maged Michael <maged.michael@...il.com>, gromer@...gle.com,
Avi Kivity <avi@...lladb.com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>,
Michael Ellerman <mpe@...erman.id.au>,
Dave Watson <davejwatson@...com>,
Alan Stern <stern@...land.harvard.edu>,
Will Deacon <will.deacon@....com>,
Andy Lutomirski <luto@...nel.org>, linux-arch@...r.kernel.org
Subject: Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited
private command
On Fri, Sep 22, 2017 at 11:22:06AM +0800, Boqun Feng wrote:
> Hi Mathieu,
>
> On Tue, Sep 19, 2017 at 06:13:41PM -0400, Mathieu Desnoyers wrote:
> > Provide a new command allowing processes to register their intent to use
> > the private expedited command.
> >
> > This allows PowerPC to skip the full memory barrier in switch_mm(), and
> > only issue the barrier when scheduling into a task belonging to a
> > process that has registered to use expedited private.
> >
> > Processes are now required to register before using
> > MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
> >
>
> Sorry I'm late to the party, but I couldn't stop thinking about whether
> we could avoid the registering altogether, because registering makes
> sys_membarrier() more complex (both the interface and the
> implementation). So how about we trade off a little by taking
> some (not all) of the rq->locks?
>
> The idea is that in membarrier_private_expedited(), we go through each
> CPU's ->curr and:
>
> 1) If it's a userspace task and its ->mm is matched, we send an ipi
>
> 2) If it's a kernel task, we skip
>
> (Because there will be an smp_mb() implied by mmdrop() when it
> switches to a userspace task.)
>
> 3) If it's a userspace task and its ->mm is not matched, we take
> the corresponding rq->lock and check rq->curr again, if its ->mm
> matched, we send an ipi, otherwise we do nothing.
>
> (Because if we observe rq->curr is not matched with rq->lock
> held, then when a task with a matching ->mm schedules in, the
> rq->lock pairing along with the smp_mb__after_spinlock() will
> guarantee it observes all memory ops before sys_membarrier().)
>
> membarrier_private_expedited() will look like this if we choose this
> way:
>
> void membarrier_private_expedited(void)
> {
>         int cpu;
>         bool fallback = false;
>         cpumask_var_t tmpmask;
>         struct rq_flags rf;
>
>         if (num_online_cpus() == 1)
>                 return;
>
>         smp_mb();
>
>         if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
>                 /* Fallback for OOM. */
>                 fallback = true;
>         }
>
>         cpus_read_lock();
>         for_each_online_cpu(cpu) {
>                 struct task_struct *p;
>
>                 if (cpu == raw_smp_processor_id())
>                         continue;
>
>                 rcu_read_lock();
>                 p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>
>                 if (!p) {
>                         rcu_read_unlock();
>                         continue;
>                 }
>
>                 if (p->mm == current->mm) {
>                         if (!fallback)
>                                 __cpumask_set_cpu(cpu, tmpmask);
>                         else
>                                 smp_call_function_single(cpu, ipi_mb, NULL, 1);
>                 }
>
>                 if (p->mm == current->mm || !p->mm) {
>                         rcu_read_unlock();
>                         continue;
>                 }
>
>                 rcu_read_unlock();
>
>                 /*
>                  * This should be arch-specific code, as we don't
>                  * need it anywhere other than on archs without an
>                  * smp_mb() in switch_mm() (e.g. powerpc).
>                  */
>                 rq_lock_irq(cpu_rq(cpu), &rf);
>                 if (p->mm == current->mm) {
Oops, this one should be:

        if (cpu_curr(cpu)->mm == current->mm)
>                         if (!fallback)
>                                 __cpumask_set_cpu(cpu, tmpmask);
>                         else
>                                 smp_call_function_single(cpu, ipi_mb, NULL, 1);
Also, the smp_call_function_single() had better be moved out of the
rq->lock critical section.
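To be clear, I mean something like this for the tail of the loop, with a
hypothetical per-iteration local (call it "matched") so that the IPI
decision happens outside the rq->lock critical section:

                bool matched = false;

                rq_lock_irq(cpu_rq(cpu), &rf);
                if (cpu_curr(cpu)->mm == current->mm)
                        matched = true;
                rq_unlock_irq(cpu_rq(cpu), &rf);

                if (matched) {
                        if (!fallback)
                                __cpumask_set_cpu(cpu, tmpmask);
                        else
                                smp_call_function_single(cpu, ipi_mb, NULL, 1);
                }

That way we only hold the rq->lock long enough to sample ->curr.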
Regards,
Boqun
>                 }
>                 rq_unlock_irq(cpu_rq(cpu), &rf);
>         }
>
>         if (!fallback) {
>                 smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
>                 free_cpumask_var(tmpmask);
>         }
>         cpus_read_unlock();
>
>         smp_mb();
> }
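(In case it's not obvious: ipi_mb() above is assumed to be the trivial
IPI handler from your patch, i.e. something like:

        static void ipi_mb(void *info)
        {
                smp_mb();       /* IPIs should be serializing, but just in case */
        }

nothing new on my side there.)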
>
> Thoughts?
>
> Regards,
> Boqun
>
[...]