Date:   Wed, 9 Nov 2016 17:44:02 -0800
From:   Andy Lutomirski <luto@...capital.net>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     Chris Metcalf <cmetcalf@...lanox.com>,
        Gilad Ben Yossef <giladb@...lanox.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Rik van Riel <riel@...hat.com>, Tejun Heo <tj@...nel.org>,
        Frederic Weisbecker <fweisbec@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Christoph Lameter <cl@...ux.com>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Francis Giraldeau <francis.giraldeau@...il.com>,
        Andi Kleen <andi@...stfloor.org>,
        Arnd Bergmann <arnd@...db.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: task isolation discussion at Linux Plumbers

On Wed, Nov 9, 2016 at 9:38 AM, Paul E. McKenney
<paulmck@...ux.vnet.ibm.com> wrote:

Are you planning on changing rcu_nmi_enter()?  It would be easier to
figure out how it interacts with this code if I could see those changes.

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index dbf20b058f48..342c8ee402d6 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c


>  /*
> @@ -305,17 +318,22 @@ static void rcu_dynticks_eqs_enter(void)
>  static void rcu_dynticks_eqs_exit(void)
>  {
>         struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
> +       int seq;
>
>         /*
> -        * CPUs seeing atomic_inc() must see prior idle sojourns,
> +        * CPUs seeing atomic_inc_return() must see prior idle sojourns,
>          * and we also must force ordering with the next RCU read-side
>          * critical section.
>          */
> -       smp_mb__before_atomic(); /* See above. */
> -       atomic_inc(&rdtp->dynticks);
> -       smp_mb__after_atomic(); /* See above. */
> +       seq = atomic_inc_return(&rdtp->dynticks);
>         WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
> -                    !(atomic_read(&rdtp->dynticks) & 0x1));
> +                    !(seq & RCU_DYNTICK_CTRL_CTR));

I think there's still a race here.  Suppose we're running this code on
CPU n and...

> +       if (seq & RCU_DYNTICK_CTRL_MASK) {
> +               rcu_eqs_special_exit();
> +               /* Prefer duplicate flushes to losing a flush. */
> +               smp_mb__before_atomic(); /* NMI safety. */

... another CPU changes the page tables and calls rcu_eqs_special_set(n) here.

That CPU expects that we will flush prior to continuing, but we won't.
Admittedly it's highly unlikely that any stale TLB entries would be
created yet, but nothing rules it out.

> +               atomic_and(~RCU_DYNTICK_CTRL_MASK, &rdtp->dynticks);
> +       }
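
Spelled out, the interleaving I'm worried about looks like this (CPU m
wants CPU n to flush):

        CPU n (rcu_dynticks_eqs_exit)        CPU m
        ---------------------------------    -------------------------
        seq = atomic_inc_return(...);
          /* RCU_DYNTICK_CTRL_MASK was set */
        rcu_eqs_special_exit();  /* flush */
                                             change page tables
                                             rcu_eqs_special_set(n);
                                               /* flag bit set again */
        atomic_and(~RCU_DYNTICK_CTRL_MASK,
                   &rdtp->dynticks);
          /* CPU m's request is cleared,
             but we never flush again */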

Maybe the way to handle it is something like this in
rcu_dynticks_eqs_exit():

        this_cpu_write(rcu_nmi_needs_eqs_special, 1);
        barrier();

        /*
         * An NMI here will call rcu_eqs_special_exit() regardless of
         * the value in dynticks.
         */

        atomic_and(~RCU_DYNTICK_CTRL_MASK, &rdtp->dynticks);
        smp_mb__after_atomic();
        rcu_eqs_special_exit();

        barrier();
        this_cpu_write(rcu_nmi_needs_eqs_special, 0);


Then rcu_nmi_enter() would call rcu_eqs_special_exit() if the dynticks
bit is set *or* rcu_nmi_needs_eqs_special is set.
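
Roughly, as an untested sketch (rcu_nmi_needs_eqs_special is the new
per-CPU flag from above, and the rest of rcu_nmi_enter()'s existing
bookkeeping is elided):

void rcu_nmi_enter(void)
{
        struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);

        /* ... existing nesting and dynticks bookkeeping ... */

        /*
         * Flush if a remote CPU requested work via the dynticks flag
         * bits, or if we interrupted the window in
         * rcu_dynticks_eqs_exit() where those bits are about to be
         * cleared.
         */
        if ((atomic_read(&rdtp->dynticks) & RCU_DYNTICK_CTRL_MASK) ||
            this_cpu_read(rcu_nmi_needs_eqs_special))
                rcu_eqs_special_exit();

        /* ... rest of existing bookkeeping ... */
}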

Does that make sense?

--Andy
