Message-ID: <Pine.LNX.4.44L0.1706291132310.1571-100000@iolanthe.rowland.org>
Date: Thu, 29 Jun 2017 11:59:27 -0400 (EDT)
From: Alan Stern <stern@...land.harvard.edu>
To: Will Deacon <will.deacon@....com>
cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Andrea Parri <parri.andrea@...il.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
<priyalee.kushwaha@...el.com>,
Stanisław Drozd <drozdziak1@...il.com>,
Arnd Bergmann <arnd@...db.de>, <ldr709@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Josh Triplett <josh@...htriplett.org>,
Nicolas Pitre <nico@...aro.org>,
Krister Johansen <kjlx@...pleofstupid.com>,
Vegard Nossum <vegard.nossum@...cle.com>, <dcb314@...mail.com>,
Wu Fengguang <fengguang.wu@...el.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Rik van Riel <riel@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...nel.org>,
Luc Maranget <luc.maranget@...ia.fr>,
Jade Alglave <j.alglave@....ac.uk>
Subject: Re: [GIT PULL rcu/next] RCU commits for 4.13
On Thu, 29 Jun 2017, Will Deacon wrote:
> [turns out I've not been on cc for this thread, but Jade pointed me to it
> and I see my name came up at some point!]
>
> On Wed, Jun 28, 2017 at 05:05:46PM -0700, Linus Torvalds wrote:
> > On Wed, Jun 28, 2017 at 4:54 PM, Paul E. McKenney
> > <paulmck@...ux.vnet.ibm.com> wrote:
> > >
> > > Linus, are you dead-set against defining spin_unlock_wait() to be
> > > spin_lock + spin_unlock? For example, is the current x86 implementation
> > > of spin_unlock_wait() really a non-negotiable hard requirement? Or
> > > would you be willing to live with the spin_lock + spin_unlock semantics?
> >
> > So I think the "same as spin_lock + spin_unlock" semantics are kind of insane.
> >
> > One of the issues is that the same as "spin_lock + spin_unlock" is
> > basically now architecture-dependent. Is it really the
> > architecture-dependent ordering you want to define this as?
> >
> > So I just think it's a *bad* definition. If somebody wants something
> > that is exactly equivalent to spin_lock+spin_unlock, then dammit, just
> > do *THAT*. It's completely pointless to me to define
> > spin_unlock_wait() in those terms.
> >
> > And if it's not equivalent to the *architecture* behavior of
> > spin_lock+spin_unlock, then I think it should be described in terms
> > that aren't about the architecture implementation (so you shouldn't
> > describe it as "spin_lock+spin_unlock"; you should describe it in
> > terms of memory barrier semantics).
> >
> > And if we really have to use the spin_lock+spin_unlock semantics for
> > this, then what is the advantage of spin_unlock_wait at all, if it
> > doesn't fundamentally avoid some locking overhead of just taking the
> > spinlock in the first place?
>
> Just on this point -- the arm64 code provides the same ordering semantics
> as you would get from a lock;unlock sequence, but we can optimise that
> when compared to an actual lock;unlock sequence because we don't need to
> wait in turn for our ticket. I suspect something similar could be done
> if/when we move to qspinlocks.
>
> Whether or not this is actually worth optimising is another question, but
> it is worth noting that unlock_wait can be implemented more cheaply than
> lock;unlock, whilst providing the same ordering guarantees (if that's
> really what we want -- see my reply to Paul).
>
> Simplicity tends to be my preference, so ripping this out would suit me
> best ;)
It would be best to know:
(1). How spin_unlock_wait() is currently being used.
(2). What it was originally intended for.
Paul has done some research into (1). He can correct me if I get this
wrong... Only a few (i.e., around one or two) of the usages seem to
require the full spin_lock+spin_unlock semantics. I go along with
Linus: the places which really do want it to behave like
spin_lock+spin_unlock should simply use spin_lock+spin_unlock. There
has been no indication so far that the possible efficiency improvement
Will mentions is at all important.
According to Paul, most of the other places don't need anything more
than the acquire guarantee (any changes made in earlier critical
sections will be visible to the code following spin_unlock_wait). In
which case, the semantics of spin_unlock_wait could be redefined in
this simpler form.
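[In user-space C11 terms, that weaker acquire-only form might look like the
sketch below. The names toy_spinlock_t and toy_spin_unlock_wait are
hypothetical; this is an illustration of the guarantee, not a kernel patch:]

```c
#include <stdatomic.h>

/* Toy spinlock: 0 = free, 1 = held.  Illustrative only. */
typedef struct { atomic_int locked; } toy_spinlock_t;

/*
 * Acquire-only unlock_wait: spin until the lock is observed free.
 * The acquire load pairs with the release store in the holder's
 * unlock, so anything written in earlier critical sections is
 * visible to the code following this call.  No ordering guarantee
 * is made for the caller's own prior writes.
 */
static void toy_spin_unlock_wait(toy_spinlock_t *lock)
{
	while (atomic_load_explicit(&lock->locked, memory_order_acquire))
		;
}
```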
Or we could literally replace all the existing definitions with
spin_lock+spin_unlock. Would that be so terrible?
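[The literal replacement would be trivial. Sketched here with a POSIX mutex
standing in for the kernel's spinlock_t (mutex_unlock_wait is a made-up name;
the kernel version would just call spin_lock and spin_unlock):]

```c
#include <pthread.h>

/*
 * "unlock_wait" defined literally as lock + unlock: block until any
 * current holder releases the lock, then immediately let go.  This
 * gives both the acquire and the release ordering, at the cost of
 * queueing behind other waiters.  Illustrative stand-in only.
 */
static void mutex_unlock_wait(pthread_mutex_t *m)
{
	pthread_mutex_lock(m);
	pthread_mutex_unlock(m);
}
```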
As regards (2), I did a little digging. spin_unlock_wait was
introduced in the 2.1.36 kernel, in mid-April 1997. I wasn't able to
find a specific patch for it in the LKML archives. At the time it
was used in only one place in the entire kernel (in kernel/exit.c):
void release(struct task_struct * p)
{
	int i;

	if (!p)
		return;
	if (p == current) {
		printk("task releasing itself\n");
		return;
	}
	for (i=1 ; i<NR_TASKS ; i++)
		if (task[i] == p) {
#ifdef __SMP__
			/* FIXME! Cheesy, but kills the window... -DaveM */
			while (p->processor != NO_PROC_ID)
				barrier();
			spin_unlock_wait(&scheduler_lock);
#endif
			nr_tasks--;
			task[i] = NULL;
			REMOVE_LINKS(p);
			release_thread(p);
			if (STACK_MAGIC != *(unsigned long *)p->kernel_stack_page)
				printk(KERN_ALERT "release: %s kernel stack corruption. Aiee\n", p->comm);
			free_kernel_stack(p->kernel_stack_page);
			current->cmin_flt += p->min_flt + p->cmin_flt;
			current->cmaj_flt += p->maj_flt + p->cmaj_flt;
			current->cnswap += p->nswap + p->cnswap;
			free_task_struct(p);
			return;
		}
	panic("trying to release non-existent task");
}
I'm not entirely clear on the point of this call. It looks like it
wanted to wait until p was guaranteed not to be running on any
processor ever again. (I don't see why it couldn't have just acquired
the scheduler_lock -- was release() a particularly hot path?)
Although it doesn't matter now, this would mean that the original
semantics of spin_unlock_wait were different from what we are
discussing. It apparently was meant to provide the release guarantee:
any future critical sections would see the values that were visible
before the call. Ironic.
Alan Stern