Message-ID: <CADroS=6Xs9g5k+uiUtC_K=GbfGTMwtn2UGrj+d57WD+kkNNidQ@mail.gmail.com>
Date:   Fri, 28 Jul 2017 10:15:49 -0700
From:   Andrew Hunter <ahh@...gle.com>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     Avi Kivity <avi@...lladb.com>,
        Maged Michael <maged.michael@...il.com>,
        Geoffrey Romer <gromer@...gle.com>,
        lkml <linux-kernel@...r.kernel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Subject: Re: Udpated sys_membarrier() speedup patch, FYI

On Thu, Jul 27, 2017 at 12:43 PM, Paul E. McKenney
<paulmck@...ux.vnet.ibm.com> wrote:
> On Thu, Jul 27, 2017 at 10:20:14PM +0300, Avi Kivity wrote:
>> IPIing only running threads of my process would be perfect. In fact
>> I might even be able to make use of "membarrier these threads
>> please" to reduce IPIs, when I change the topology from fully
>> connected to something more sparse, on larger machines.
>>

We do this as well--sometimes we only need RSEQ fences against
specific CPU(s), and thus pass a subset.

> +static void membarrier_private_expedited_ipi_each(void)
> +{
> +       int cpu;
> +
> +       for_each_online_cpu(cpu) {
> +               struct task_struct *p;
> +
> +               rcu_read_lock();
> +               p = task_rcu_dereference(&cpu_rq(cpu)->curr);
> +               if (p && p->mm == current->mm)
> +                       smp_call_function_single(cpu, ipi_mb, NULL, 1);
> +               rcu_read_unlock();
> +       }
> +}
> +

We have the (simpler imho)

const struct cpumask *mask = mm_cpumask(mm);
/* possibly AND it with a user requested mask */
smp_call_function_many(mask, ipi_func, ....);

which I think will be faster on some archs (that support broadcast)
and have fewer problems with out-of-sync values (though we do have to
check in our IPI function that we haven't context switched out).

Am I missing why this won't work?
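
For readers following along, here is a minimal userspace sketch of the
mask-based idea above. It is not kernel code and not the actual patch: the
names model_cpu, model_ipi_func, and model_membarrier, the NR_CPUS value, and
the integer mm ids are all hypothetical, and a plain bitmask stands in for
mm_cpumask(mm). It only illustrates two points from the message: ANDing the
mm's CPU mask with a user-requested mask, and re-checking inside the IPI
function that the target mm is still running on that CPU, since the mask may
be stale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8 /* hypothetical machine size for this model */

/* Userspace stand-in for per-CPU state; not a kernel structure. */
struct model_cpu {
	int curr_mm;      /* id of the mm currently running on this CPU */
	int barriers_run; /* how many times the modeled barrier executed */
};

static struct model_cpu cpus[NR_CPUS];

/*
 * Modeled IPI function. The mask that selected this CPU may be out of
 * date, so re-check that the target mm is still running here before
 * counting a barrier.
 */
static void model_ipi_func(int cpu, int target_mm)
{
	if (cpus[cpu].curr_mm != target_mm)
		return; /* context switched out since the mask was read */
	cpus[cpu].barriers_run++;
}

/*
 * Models "mask = mm_cpumask(mm) & user_mask" followed by one call over
 * the whole mask. Returns the number of CPUs that were targeted.
 */
static int model_membarrier(uint32_t mm_mask, uint32_t user_mask,
			    int target_mm)
{
	uint32_t mask = mm_mask & user_mask;
	int sent = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (mask & (UINT32_C(1) << cpu)) {
			model_ipi_func(cpu, target_mm);
			sent++;
		}
	}
	return sent;
}
```

In this model a CPU that is still set in the (stale) mask but has switched
to another mm receives the call and simply does nothing, which is the check
mentioned above.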
