linux-kernel - Re: Q: smp.c && barriers (Was: [PATCH 1/4] generic-smp: remove single ipi fallback for smp_call_function

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090217101130.GA8660@wotan.suse.de>
Date:	Tue, 17 Feb 2009 11:11:30 +0100
From:	Nick Piggin <npiggin@...e.de>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Oleg Nesterov <oleg@...hat.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...e.hu>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Steven Rostedt <rostedt@...dmis.org>,
	linux-kernel@...r.kernel.org
Subject: Re: Q: smp.c && barriers (Was: [PATCH 1/4] generic-smp: remove single ipi fallback for smp_call_function_many())

On Tue, Feb 17, 2009 at 10:29:34AM +0100, Peter Zijlstra wrote:
> On Tue, 2009-02-17 at 00:19 +0100, Oleg Nesterov wrote:
> > > cpu0:		cpu1:
> > >
> > > spin_lock_irqsave(&dst->lock, flags);
> > > ipi = list_empty(&dst->list);
> > > list_add_tail(&data->list, &dst->list);
> > > spin_unlock_irqrestore(&dst->lock, flags);
> > >
> > > ipi ----->
> > >
> > > 		while (!list_empty(&q->list)) {
> > >                 	unsigned int data_flags;
> > >
> > >                 	spin_lock(&q->lock);
> > >                		list_replace_init(&q->list, &list);
> > > 	                spin_unlock(&q->lock);
> > >
> > >
> > > Strictly speaking the unlock() is semi-permeable, allowing the read of
> > > q->list to enter the critical section, allowing us to observe an empty
> > > list, never getting to q->lock on cpu1.
> > 
> > Hmm. If we take &q->lock, then we alread saw !list_empty() ?
> 
> That's how I read the above code.
> 
> > And the question is, how can we miss list_empty() == F before spin_lock().
> 
> Confusion... my explanation above covers exactly this case. The reads
> determining list_empty() can slip into the q->lock section on the other
> cpu, and observe an empty list.

But in that case, cpu0 should see list_empty and send another IPI,
because our load of list_empty has moved before the unlock of the
lock, so there can't be another item concurrently put on the list.

But hmm, why even bother with all this complexity? Why not just
remove the outer loop completely? Do the lock and the list_replace_init
unconditionally. It would turn tricky lockless code into simple locked
code... we've already taken an interrupt anyway, so chances are pretty
high that we have work here to do, right?

> > > > Even if I missed something (very possible), then I can't understand
> > > > why we need rmb() only on alpha.
> > >
> > > Because only alpha is insane enough to do speculative reads? Dunno
> > > really :-)
> > 
> > Perhaps...
> > 
> > It would be nice to have a comment which explains how can we miss the
> > first addition without read_barrier_depends(). And why only on alpha.
> 
> Paul, care to once again enlighten us? The best I can remember is that
> alpha has split caches, and the rmb is needed for them to become
> coherent -- no other arch is crazy in exactly that way.

Other architectures can do speculative reads, and several (sparc, ia64,
powerpc AFAIK) do have release barrier semantics for their unlocks so
they could equally have loads passing spin_unlock.

Alpha has split cache I guess processing external cache coherency requests
independently. So it can see a pair of ordered stores coming from one
CPU (or CPUs if you want to get tricky I guess) in the wrong order.
This is not because of the loads being executed speculatively out of
order, although the end result is similar. The big difference why it
isn't hidden behind a regular smp_rmb() is because no CPU supported
by Linux does speculative loads over data dependencies, so we only
define smp_rmb to order loads that have no dependencies or only control
dependencies.

But you probably don't have to care about caches if it makes reasoning
easier. For all purposes it can be treated as Alpha speculatively
executing data dependent loads out of order I think eg.

x = *ptr;
y = array[x];

Then the CPU misses cache on the first load and thinks hmm, every
time I have loaded this it has been 10, so I'll do the 2nd load with
that value and retry if it turns out the prediction was wrong when
the 1st load completes. In the case that the 2nd load completes
first, and the prediction of the 1st load was correct, then the loads
were executed out of order... equivalent to out of order speculative
loads with control deps I guess.

But what actually happens on Alpha is not speculation but simply
the stores get seen in the wrong order.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/