Date:   Wed, 21 Jul 2021 13:41:46 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     rcu@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Kernel Team <kernel-team@...com>,
        Ingo Molnar <mingo@...nel.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Josh Triplett <josh@...htriplett.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        David Howells <dhowells@...hat.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Frédéric Weisbecker <fweisbec@...il.com>,
        Oleg Nesterov <oleg@...hat.com>,
        Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH rcu 04/18] rcu: Weaken ->dynticks accesses and updates

Hmm.

This actually seems to make some of the ordering worse.

I'm not seeing a lot of weakening or optimization, but it depends a
bit on what is common and what is not.

On Wed, Jul 21, 2021 at 1:21 PM Paul E. McKenney <paulmck@...nel.org> wrote:
>
> +/*
> + * Increment the current CPU's rcu_data structure's ->dynticks field
> + * with ordering.  Return the new value.
> + */
> +static noinstr unsigned long rcu_dynticks_inc(int incby)
> +{
> +       struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> +       int seq;
> +
> +       seq = READ_ONCE(rdp->dynticks) + incby;
> +       smp_store_release(&rdp->dynticks, seq);
> +       smp_mb();  // Fundamental RCU ordering guarantee.
> +       return seq;
> +}

So this is actually likely *more* expensive than the old code was, at
least on x86.

The READ_ONCE/smp_store_release are cheap, but then the smp_mb() is expensive.

The old code just did arch_atomic_inc_return(), which included the
memory barrier.

There *might* be some cache ordering advantage to letting the
READ_ONCE() float upwards, but from a pure barrier standpoint this is
more expensive than what we used to have.
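A rough userspace analogue of the two increment sequences may make the cost comparison concrete. This is a sketch using C11 <stdatomic.h>, not kernel code. The mapping is an assumption: a seq_cst fetch-add stands in for the fully ordered arch_atomic_inc_return(), a relaxed load for READ_ONCE(), a release store for smp_store_release(), and a seq_cst fence for smp_mb(). The function names are hypothetical.

```c
#include <stdatomic.h>

/*
 * Hypothetical userspace analogue (C11 atomics, not kernel APIs) of the
 * old and new ->dynticks increment sequences.
 */

/* Old style: one fully ordered read-modify-write. On x86 this typically
 * compiles to a single LOCK XADD, which already acts as a full barrier. */
static long dynticks_inc_rmw(atomic_long *dynticks, long incby)
{
	return atomic_fetch_add_explicit(dynticks, incby,
					 memory_order_seq_cst) + incby;
}

/* New style from the patch: plain load, release store, then a separate
 * full fence (the smp_mb() stand-in). The load/store pair is not atomic
 * as a whole, so this is only safe if a single thread ever writes the
 * counter, as with a per-CPU field updated only by its own CPU. */
static long dynticks_inc_fence(atomic_long *dynticks, long incby)
{
	long seq = atomic_load_explicit(dynticks, memory_order_relaxed) + incby;

	atomic_store_explicit(dynticks, seq, memory_order_release);
	atomic_thread_fence(memory_order_seq_cst);
	return seq;
}
```

On x86 the release store and relaxed load are plain moves, so the second version's cost is dominated by the trailing full fence, which is roughly what the single locked instruction already paid for.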

> -       if (atomic_read(&rdp->dynticks) & 0x1)
> +       if (READ_ONCE(rdp->dynticks) & 0x1)
>                 return;
> -       atomic_inc(&rdp->dynticks);
> +       rcu_dynticks_inc(1);

And this one doesn't seem to take advantage of the new rule, so we end
up with two reads, followed by that potentially more expensive sequence.
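One way to avoid the second read, sketched below, is to pass the value the caller already loaded into the increment helper. This is a hypothetical variant, not code from the patch. It again uses C11 atomics as a stand-in for READ_ONCE()/smp_store_release()/smp_mb(), and it assumes that only the owning thread ever writes the counter.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper (not from the patch): increment starting from a
 * value the caller already read, so the counter is loaded only once.
 * Assumes a single writer, as with a per-CPU field.
 */
static long dynticks_inc_from(atomic_long *dynticks, long old, long incby)
{
	long seq = old + incby;

	atomic_store_explicit(dynticks, seq, memory_order_release);
	atomic_thread_fence(memory_order_seq_cst); /* smp_mb() stand-in */
	return seq;
}

/* Mirrors the quoted hunk: return early if the low bit is already set,
 * otherwise increment by one, reusing the loaded value. Returns the
 * counter value after the call. */
static long dynticks_inc_if_even(atomic_long *dynticks)
{
	long cur = atomic_load_explicit(dynticks, memory_order_relaxed);

	if (cur & 0x1)
		return cur;	/* already odd: nothing to do */
	return dynticks_inc_from(dynticks, cur, 1);
}
```

This removes the redundant load but still pays for the full fence, so it addresses only half of the concern.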

>  static int rcu_dynticks_snap(struct rcu_data *rdp)
>  {
> -       return atomic_add_return(0, &rdp->dynticks);
> +       smp_mb();  // Fundamental RCU ordering guarantee.
> +       return smp_load_acquire(&rdp->dynticks);
>  }

This is likely cheaper - not because of barriers, but simply because
it avoids dirtying the cacheline.
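The cacheline point can be illustrated with another C11 sketch (hypothetical names; the kernel primitives are only approximated). An add of zero is still a read-modify-write, so the reading CPU has to take the line exclusive even though the value does not change. A fence followed by an acquire load only needs a shared copy, so the owning CPU's copy is left alone.

```c
#include <stdatomic.h>

/* Old snapshot: fully ordered add of zero. Returns the current value,
 * but as an RMW it must acquire the cache line for writing. */
static long dynticks_snap_rmw(atomic_long *dynticks)
{
	return atomic_fetch_add_explicit(dynticks, 0, memory_order_seq_cst);
}

/* New snapshot: full fence (smp_mb() stand-in) then an acquire load,
 * which can be satisfied from a shared, read-only copy of the line. */
static long dynticks_snap_load(atomic_long *dynticks)
{
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(dynticks, memory_order_acquire);
}
```

Both return the same value; the difference is only in coherence traffic, which matters when a remote CPU snapshots another CPU's counter while that CPU keeps updating it.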

So which operation do we _care_ about, and do we have numbers for why
this improves anything? Because looking at the patch, it's not obvious
that this is an improvement.

              Linus
