Message-ID: <20180625131643.GA15126@andrea>
Date:   Mon, 25 Jun 2018 15:16:43 +0200
From:   Andrea Parri <andrea.parri@...rulasolutions.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
        Alan Stern <stern@...land.harvard.edu>,
        Will Deacon <will.deacon@....com>,
        Boqun Feng <boqun.feng@...il.com>,
        Nicholas Piggin <npiggin@...il.com>,
        David Howells <dhowells@...hat.com>,
        Jade Alglave <j.alglave@....ac.uk>,
        Luc Maranget <luc.maranget@...ia.fr>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Akira Yokosawa <akiyks@...il.com>,
        Daniel Lustig <dlustig@...dia.com>,
        Jonathan Corbet <corbet@....net>,
        Ingo Molnar <mingo@...hat.com>,
        Randy Dunlap <rdunlap@...radead.org>
Subject: Re: [PATCH] doc: Update wake_up() & co. memory-barrier guarantees

> > A concrete example being the store-buffering pattern reported in [1].
> 
> Well, that example only needs a store->load barrier. It so happens
> smp_mb() is the only one actually doing that, but imagine we had a
> weaker barrier that did just that, one that did not imply the full
> transitivity smp_mb() does.
> 
> Then the example from [1] could use that weaker thing.

Absolutely (and that would be "fence w,r" on RISC-V, IIUC).


> 
> > > So yes, I suppose we're entirely stuck with the full memory barrier
> > > semantics like that. But I still find it easier to think of it like a
> > > RELEASE that pairs with the ACQUIRE of waking up, such that the task
> > > is guaranteed to observe its own wake condition.
> > > 
> > > And maybe that is the thing I'm missing here. These comments only state
> > > that it does in fact imply a full memory barrier, but do not explain
> > > why, should it?
> > 
> > "code (people) is relying on it" is really the only "why" I can think
> > of.  With this patch, that same SB pattern is also reported in
> > memory-barriers.txt.  Other ideas?
> 
> So I'm not actually sure how many people rely on the RCsc transitive
> smp_mb() here. People certainly rely on the RELEASE semantics, and the
> code itself requires the store->load ordering, together that gives us
> the smp_mb() because that's simply the only barrier we have.
> 
> And looking at smp_mb__after_spinlock() again, we really only need the
> RCsc thing for rq->lock, not for the wakeups. The wakeups really only
> need that RCpc RELEASE + store->load thing (which we don't have).
> 
> So yes, smp_mb(), however the below still makes more sense to me, or am
> I just being obtuse again?
> 
> ---
>  kernel/sched/core.c | 19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a98d54cd5535..8374d01b2820 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1879,7 +1879,9 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
>   *  C) LOCK of the rq(c1)->lock scheduling in task
>   *
>   * Transitivity guarantees that B happens after A and C after B.
> - * Note: we only require RCpc transitivity.
> + * Note: we only require RCpc transitivity for these cases,
> + *       but see smp_mb__after_spinlock() for why rq->lock is required
> + *       to be RCsc.
>   * Note: the CPU doing B need not be c0 or c1

FWIW, we discussed this pattern here:

  http://lkml.kernel.org/r/20171018010748.GA4017@andrea


>   *
>   * Example:
> @@ -1944,13 +1946,14 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
>   * However; for wakeups there is a second guarantee we must provide, namely we
>   * must observe the state that lead to our wakeup. That is, not only must our
>   * task observe its own prior state, it must also observe the stores prior to
> - * its wakeup.
> + * its wakeup, see set_current_state().
>   *
>   * This means that any means of doing remote wakeups must order the CPU doing
> - * the wakeup against the CPU the task is going to end up running on. This,
> - * however, is already required for the regular Program-Order guarantee above,
> - * since the waking CPU is the one issueing the ACQUIRE (smp_cond_load_acquire).
> - *
> + * the wakeup against the CPU the task is going to end up running on. This
> + * means two things: firstly that try_to_wake_up() must (at least) imply a
> + * RELEASE (smp_mb__after_spinlock()), and secondly, as is already required
> + * for the regular Program-Order guarantee above, that waking implies an ACQUIRE
> + * (see smp_cond_load_acquire() above).
>   */
>  
>  /**
> @@ -1966,6 +1969,10 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
>   * Atomic against schedule() which would dequeue a task, also see
>   * set_current_state().
>   *
> + * Implies at least a RELEASE such that the waking task is guaranteed to
> + * observe the stores to the wait-condition; see set_task_state() and the
> + * Program-Order constraints.

[s/set_task_state/set_current_state/ ?]

I'd stick to "Implies/Executes at least a full barrier"; this is in fact
already documented in the function body:

	/*
	 * If we are going to wake up a thread waiting for CONDITION we
	 * need to ensure that CONDITION=1 done by the caller can not be
	 * reordered with p->state check below. This pairs with mb() in
	 * set_current_state() the waiting thread does.
	 */

(this is, again, that "store->load barrier"/SB).
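
For illustration only, here is a minimal userspace sketch of that SB shape,
using C11 atomics rather than kernel primitives. The assumptions are loud:
atomic_thread_fence(memory_order_seq_cst) merely stands in for smp_mb(), and
the names condition, task_sleeping and run_sb_trials() are hypothetical, not
taken from kernel/sched/core.c. With both fences in place, at least one side
must observe the other's store, so the "lost wakeup" outcome is forbidden.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for CONDITION and p->state. */
static atomic_int condition;
static atomic_int task_sleeping;
/* Plain ints: only read by the parent after pthread_join(). */
static int waiter_saw_condition, waker_saw_sleeping;

static void *waiter(void *arg)
{
	(void)arg;
	/* Analogue of set_current_state(): store the state, then a full barrier. */
	atomic_store_explicit(&task_sleeping, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	/* Analogue of "if (!CONDITION) schedule();" */
	waiter_saw_condition =
		atomic_load_explicit(&condition, memory_order_relaxed);
	return NULL;
}

static void *waker(void *arg)
{
	(void)arg;
	/* Analogue of "CONDITION = 1; wake_up_process(p);" */
	atomic_store_explicit(&condition, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	/* Analogue of the p->state check in try_to_wake_up(). */
	waker_saw_sleeping =
		atomic_load_explicit(&task_sleeping, memory_order_relaxed);
	return NULL;
}

/* Runs n trials and returns how many ended with BOTH loads reading 0. */
int run_sb_trials(int n)
{
	int lost = 0;

	for (int i = 0; i < n; i++) {
		pthread_t a, b;

		atomic_store(&condition, 0);
		atomic_store(&task_sleeping, 0);
		pthread_create(&a, NULL, waiter, NULL);
		pthread_create(&b, NULL, waker, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		/* Waiter missed the condition AND waker missed the sleeper. */
		if (!waiter_saw_condition && !waker_saw_sleeping)
			lost++;
	}
	return lost;
}
```

Calling run_sb_trials() from a small main() should always return 0 here.
Dropping either fence makes the both-read-0 outcome permitted by the C11
model (whether a given machine actually exhibits it is another matter), and
neither RELEASE nor ACQUIRE ordering alone restores the store->load ordering.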

I'll try to integrate these changes in v2, if there is no objection.

  Andrea


> + *
>   * Return: %true if @p->state changes (an actual wakeup was done),
>   *	   %false otherwise.
>   */