Message-ID: <20190620211019.GA3436@hirez.programming.kicks-ass.net>
Date:   Thu, 20 Jun 2019 23:10:19 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Paul E. McKenney" <paulmck@...ux.ibm.com>
Cc:     linux-kernel@...r.kernel.org, mingo@...hat.com,
        frederic@...nel.org, tglx@...utronix.de
Subject: Re: [PATCH] time/tick-broadcast: Fix tick_broadcast_offline()
 lockdep complaint

On Thu, Jun 20, 2019 at 09:01:18AM -0700, Paul E. McKenney wrote:

> > > +#define TICK_SCHED_REMOTE_OFFLINE	0
> > > +#define TICK_SCHED_REMOTE_RUNNING	1
> > > +#define TICK_SCHED_REMOTE_OFFLINING	2
> > 
> > That seems a daft set of values; consider { RUNNING, OFFLINING, OFFLINE }
> > and see below.
> 
> As in make it an enum?  I could do that.

Enum or define, I don't much care, but the 'natural' ordering of the
states is either running -> offlining -> offline, or the other way
around; the given order in the patch just didn't make sense.

The one with running=0 just seems to work out nicer.

> > > +
> > > +// State diagram for ->state:
> > > +//
> > > +//
> > > +//      +----->OFFLINE--------------------------+
> > > +//      |                                       |
> > > +//      |                                       |
> > > +//      |                                       | sched_tick_start()
> > > +//      | sched_tick_remote()                   |
> > > +//      |                                       |
> > > +//      |                                       V
> > > +//      |                        +---------->RUNNING
> > > +//      |                        |              |
> > > +//      |                        |              |
> > > +//      |                        |              |
> > > +//      |     sched_tick_start() |              | sched_tick_stop()
> > > +//      |                        |              |
> > > +//      |                        |              |
> > > +//      |                        |              |
> > > +//      +--------------------OFFLINING<---------+
> > > +//
> > > +//
> > > +// Other transitions get WARN_ON_ONCE(), except that sched_tick_remote()
> > > +// and sched_tick_start() are happy to leave the state in RUNNING.

> > Also, I find it harder to read than needed; maybe a little something
> > like so:
> > 
> > /*
> >  *              OFFLINE
> >  *               |   ^
> >  *               |   | tick_remote()
> >  *               |   |
> >  *               +--OFFLINING
> >  *               |   ^
> >  *  tick_start() |   | tick_stop()
> >  *               v   |
> >  *              RUNNING
> >  */
> 
> As in remove the leading "sched_" from the function names?  (The names
> were already there, so I left them be.)

That was just me being lazy; the main point was getting the states into
a linear order, instead of spread around a 2D grid.

> > While not wrong, it seems overly complicated; can't we do something
> > like:
> > 
> > tick:
> 
> As in sched_tick_remote(), right?
> 
> > 	state = atomic_read(->state);
> > 	if (state) {
> 
> You sure you don't want "if (state != RUNNING)"?  But I guess you need
> to understand that RUNNING==0 to understand the atomic_inc_not_zero().

Right..

> 
> > 		WARN_ON_ONCE(state != OFFLINING);
> > 		if (atomic_inc_not_zero(->state))
> 
> This assumes that there cannot be concurrent calls to sched_tick_remote(),
> otherwise, you can end up with ->state==3.  Which is a situation that
> my version does a WARN_ON_ONCE() for, so I guess the only difference is
> that mine would be guaranteed to complain and yours would complain with
> high probability.  So fair enough!

I was assuming there was only a single work per CPU and there'd not be
concurrency on this path.

> > 			return;
> > 	}
> > 	queue_delayed_work();
> > 
> > 
> > stop:
> > 	/*
> > 	 * This is hotplug; even without stop-machine, there cannot be
> > 	 * concurrency on offlining specific CPUs.
> > 	 */
> > 	state = atomic_read(->state);
> 
> There cannot be a concurrent sched_tick_stop() or sched_tick_start(),
> but there really can be a sched_tick_remote() right here in the absence
> of stop-machine,
> can't there?  Or am I missing something other than stop-machine that
> prevents this?

There can be a remote tick, indeed.

> Now, you could argue that concurrency is safe: Either sched_tick_remote()
> sees RUNNING and doesn't touch the value, or it sees OFFLINING and
> sched_tick_stop() makes no further changes,

That was indeed the thinking.

> but I am not sure that this qualifies as simpler...

There is that I suppose. I think my initial version was a little more
complicated, but after a few passes this happened :-)

> > 	WARN_ON_ONCE(state != RUNNING);
> > 	atomic_set(->state, OFFLINING);
> 
> Another option would be to use atomic_xchg() as below instead of the
> atomic_read()/atomic_set() pair.  Would that work for you?

Yes, that works, I suppose. It is more expensive, but I don't think we
particularly care about that here.

> > start:
> > 	state = atomic_xchg(->state, RUNNING);
> > 	WARN_ON_ONCE(state == RUNNING);
> > 	if (state == OFFLINE) {
> > 		// ...
> > 		queue_delayed_work();
> > 	}
> 
> This one looks to be an improvement on mine regardless of the other two.

