lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 10 Oct 2019 13:42:44 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Manfred Spraul <manfred@...orfullife.com>
Cc:     Waiman Long <longman@...hat.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        1vier1@....de, "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: wake_q memory ordering

On Thu, Oct 10, 2019 at 12:41:11PM +0200, Manfred Spraul wrote:
> Hi,
> 
> Waiman Long noticed that the memory barriers in sem_lock() are not really
> documented, and while adding documentation, I ended up with one case where
> I'm not certain about the wake_q code:
> 
> Questions:
> - Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
>   ordering guarantee?

Yep. Either the atomic instruction implies ordering (eg. x86 LOCK
prefix) or it doesn't (most RISC LL/SC), if it does,
smp_mb__{before,after}_atomic() are a NO-OP and the ordering is
unconditinoal, if it does not, then smp_mb__{before,after}_atomic() are
unconditional barriers.

IOW, the only way to get a cmpxchg without barriers on failure, is with
LL/SC, and in that case smp_mb__{before,after}_atomic() are
unconditional.

For instance, the way ARM64 does cmpxchg() is:

cmpxchg(p, o, n)
	do {
		v = LL(p);
		if (v != o)
			return v;
	} while (!SC_RELEASE(p, n))
	smp_mb();
	return v;

And you'll note how on success the store-release constraints all prior
memory operations, and the smp_mb() constraints all later memops. But on
failure there's not a barrier to be found.

> - Is it ok that wake_up_q just writes wake_q->next, shouldn't
>   smp_store_acquire() be used? I.e.: guarantee that wake_up_process()
>   happens after cmpxchg_relaxed(), assuming that a failed cmpxchg_relaxed
>   provides any ordering.

There is no such thing as store_acquire, it is either load_acquire or
store_release. But just like how we can write load-aquire like
load+smp_mb(), so too I suppose we could write store-acquire like
store+smp_mb(), and that is exactly what is there (through the implied
barrier of wake_up_process()).

(arguably it should've been WRITE_ONCE() I suppose)

> 
> Example:
> - CPU2 never touches lock a. It is just an unrelated wake_q user that also
>   wants to wake up task 1234.
> - I've noticed already that smp_store_acquire() doesn't exist.
>   So smp_store_mb() is required. But from semantical point of view, we would
>   need an ACQUIRE: the wake_up_process() must happen after cmpxchg().
> - May wake_up_q() rely on the spinlocks/memory barriers in try_to_wake_up,
>   or should the function be safe by itself?
> 
> CPU1: /current=1234, inside do_semtimedop()/
>         g_wakee = current;
>         current->state = TASK_INTERRUPTIBLE;
>         spin_unlock(a);
> 
> CPU2: / arbitrary kernel thread that uses wake_q /
>                 wake_q_add(&unrelated_q, 1234);
>                 wake_up_q(&unrelated_q);
>                 <...ongoing>
> 
> CPU3: / do_semtimedop() + wake_up_sem_queue_prepare() /
>                         spin_lock(a);
>                         wake_q_add(,g_wakee);
>                         < within wake_q_add() >:
>                           smp_mb__before_atomic();
>                           if (unlikely(cmpxchg_relaxed(&node->next, NULL,
> WAKE_Q_TAIL)))
>                               return false; /* -> this happens */
> 
> CPU2:
>                 <within wake_up_q>
>                 1234->wake_q.next = NULL; <<<<<<<<< Ok? Is store_acquire()
> missing? >>>>>>>>>>>>

		/* smp_mb(); implied by the following wake_up_process() */

>                 wake_up_process(1234);
>                 < within wake_up_process/try_to_wake_up():
>                     raw_spin_lock_irqsave()
>                     smp_mb__after_spinlock()
>                     if(1234->state = TASK_RUNNING) return;
>                  >
> 
> 
> rewritten:
> 
> start condition: A = 1; B = 0;
> 
> CPU1:
>     B = 1;
>     RELEASE, unlock LockX;
> 
> CPU2:
>     lock LockX, ACQUIRE
>     if (LOAD A == 1) return; /* using cmp_xchg_relaxed */
> 
> CPU2:
>     A = 0;
>     ACQUIRE, lock LockY
>     smp_mb__after_spinlock();
>     READ B
> 
> Question: is A = 1, B = 0 possible?

Your example is incomplete (there is no A=1 assignment for example), but
I'm thinking I can guess where that should go given the earlier text.

I don't think this is broken.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ