Message-ID: <20120411122727.GA9583@linux.vnet.ibm.com>
Date: Wed, 11 Apr 2012 05:27:27 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Lai Jiangshan <laijs@...fujitsu.com>
Cc: linux-kernel@...r.kernel.org, mingo@...e.hu, dipankar@...ibm.com,
akpm@...ux-foundation.org, mathieu.desnoyers@...ymtl.ca,
josh@...htriplett.org, niv@...ibm.com, tglx@...utronix.de,
peterz@...radead.org, rostedt@...dmis.org, Valdis.Kletnieks@...edu,
dhowells@...hat.com, eric.dumazet@...il.com, darren@...art.com,
fweisbec@...il.com, patches@...aro.org
Subject: Re: [PATCH 3/4 V2] implement per-domain single-thread state machine
call_srcu()
On Tue, Apr 10, 2012 at 04:18:58PM -0700, Paul E. McKenney wrote:
> On Mon, Mar 19, 2012 at 04:12:13PM +0800, Lai Jiangshan wrote:
> > o The state machine is lightweight and single-threaded, and it is
> > preemptible when checking.
> >
> > o The state machine is a work_struct, so SRCU occupies no thread
> > when the srcu_struct is inactive (no callbacks pending). And it
> > does not sleep (so it never ties up a workqueue thread by sleeping).
> >
> > o The state machine is the only thread that may flip/check/write(*)
> > the srcu_struct, so no mutex is needed.
> > (write(*): except for ->per_cpu_ref, ->running, and ->batch_queue)
> >
> > o synchronize_srcu() takes over processing when no other state
> > machine is running (the task calling synchronize_srcu() becomes the
> > state-machine thread). This fast path avoids scheduling. When the
> > fast path fails, it falls back to "call_srcu() + wait"; a sketch of
> > this fallback appears below, after the callback-trip list.
> >
> > o A callback is normally invoked on the same CPU on which it was
> > queued.
> >
> > The trip of a callback:
> > 1) ->batch_queue when call_srcu() enqueues it
> >
> > 2) ->batch_check0 while it awaits its first check_zero
> >
> > 3) ->batch_check1 after it finishes its first check_zero and the flip
> >
> > 4) ->batch_done after it finishes its second check_zero
> >
> > The current requirements for callbacks:
> > Callbacks are invoked in process context.
> > Callbacks must be fast and must not sleep.
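
For reference, a minimal sketch of the "call_srcu() + wait" fallback
mentioned above, reusing the rcu_synchronize/wakeme_after_rcu pattern
from kernel/rcupdate.c (the name synchronize_srcu_slowpath() is
illustrative, not from the patch):

	struct rcu_synchronize {
		struct rcu_head head;
		struct completion completion;
	};

	/* Callback: wakes the task waiting in synchronize_srcu_slowpath(). */
	static void wakeme_after_rcu(struct rcu_head *head)
	{
		struct rcu_synchronize *rcu;

		rcu = container_of(head, struct rcu_synchronize, head);
		complete(&rcu->completion);
	}

	static void synchronize_srcu_slowpath(struct srcu_struct *sp)
	{
		struct rcu_synchronize rcu;

		init_completion(&rcu.completion);
		/* The callback enters ->batch_queue here... */
		call_srcu(sp, &rcu.head, wakeme_after_rcu);
		/* ...and fires once it reaches ->batch_done. */
		wait_for_completion(&rcu.completion);
	}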
>
> The basic algorithm looks safe, but I have some questions on liveness
> and expeditedness below.
[ . . . ]
> > +static void srcu_reschedule(struct srcu_struct *sp)
> > +{
> > +	bool pending = true;
> > +
> > +	if (rcu_batch_empty(&sp->batch_done) &&
> > +	    rcu_batch_empty(&sp->batch_check1) &&
> > +	    rcu_batch_empty(&sp->batch_check0) &&
> > +	    rcu_batch_empty(&sp->batch_queue)) {
> > +		spin_lock_irq(&sp->queue_lock);
> > +		if (rcu_batch_empty(&sp->batch_queue)) {
> > +			sp->running = false;
> > +			pending = false;
> > +		}
> > +		spin_unlock_irq(&sp->queue_lock);
>
> Hmmm... What happens given the following sequence of events?
>
> o SRCU has just finished executing the last callback, so that
> all callback queues are empty.
>
> o srcu_reschedule() executes the condition of its "if" statement,
> but does not yet acquire the spinlock. (If I read the code
> correctly, preemption could legitimately occur at this point.)
>
> o call_srcu() initializes the callback, acquires the spinlock,
> queues the callback, and invokes queue_delayed_work().
Never mind! The ->running is still set, so call_srcu() is not going
to invoke queue_delayed_work().
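
For reference, call_srcu() enqueues under ->queue_lock and starts the
state machine only when it observes ->running == false; a sketch
reconstructed from the patch and this discussion, not the verbatim
patch text:

	void call_srcu(struct srcu_struct *sp, struct rcu_head *head,
		       void (*func)(struct rcu_head *head))
	{
		unsigned long flags;

		head->next = NULL;
		head->func = func;
		spin_lock_irqsave(&sp->queue_lock, flags);
		rcu_batch_queue(&sp->batch_queue, head);
		if (!sp->running) {
			/* State machine idle: this caller must restart it. */
			sp->running = true;
			queue_delayed_work(system_nrt_wq, &sp->work, 0);
		}
		spin_unlock_irqrestore(&sp->queue_lock, flags);
	}

Because ->running is still true at that point in the scenario, the
queue_delayed_work() call is skipped.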
Thanx, Paul
> o The delayed work starts executing process_srcu(), which
> calls srcu_collect_new(), which moves the callback to
> ->batch_check0.
>
> o	srcu_reschedule() continues executing, acquires the spinlock,
> sees that ->batch_queue is empty, and therefore sets
> ->running to false, thus setting the stage for two CPUs
> mucking with the queues concurrently without protection.
>
> I believe that you need to recheck all the queues under the lock,
> not just ->batch_queue (and I did update the patch in this manner).
>
> Or am I missing something subtle?
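
For concreteness, the recheck-under-lock version would look roughly
like this (a sketch; the updated patch may differ in detail):

	static void srcu_reschedule(struct srcu_struct *sp)
	{
		bool pending = true;

		if (rcu_batch_empty(&sp->batch_done) &&
		    rcu_batch_empty(&sp->batch_check1) &&
		    rcu_batch_empty(&sp->batch_check0) &&
		    rcu_batch_empty(&sp->batch_queue)) {
			spin_lock_irq(&sp->queue_lock);
			/* Recheck all queues now that the lock is held. */
			if (rcu_batch_empty(&sp->batch_done) &&
			    rcu_batch_empty(&sp->batch_check1) &&
			    rcu_batch_empty(&sp->batch_check0) &&
			    rcu_batch_empty(&sp->batch_queue)) {
				sp->running = false;
				pending = false;
			}
			spin_unlock_irq(&sp->queue_lock);
		}

		if (pending)
			queue_delayed_work(system_nrt_wq, &sp->work,
					   SRCU_INTERVAL);
	}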
>
> > +	}
> > +
> > +	if (pending)
> > +		queue_delayed_work(system_nrt_wq, &sp->work, SRCU_INTERVAL);
>
> Here, we carefully invoke queue_delayed_work() outside of the lock,
> while in call_srcu() we invoke it under the lock. Why the difference?
>
> > +}
> > +
> > +static void process_srcu(struct work_struct *work)
> > +{
> > +	struct srcu_struct *sp;
> > +
> > +	sp = container_of(work, struct srcu_struct, work.work);
> > +
> > +	srcu_collect_new(sp);
> > +	srcu_advance_batches(sp, 1);
> > +	srcu_invoke_callbacks(sp);
> > +	srcu_reschedule(sp);
> > +}
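
For completeness, process_srcu() is presumably wired to the
srcu_struct at initialization time along the following lines; a sketch
using the helper and field names from this patch series, which may
differ in detail:

	static int init_srcu_struct_fields(struct srcu_struct *sp)
	{
		sp->completed = 0;
		spin_lock_init(&sp->queue_lock);
		sp->running = false;
		rcu_batch_init(&sp->batch_queue);
		rcu_batch_init(&sp->batch_check0);
		rcu_batch_init(&sp->batch_check1);
		rcu_batch_init(&sp->batch_done);
		/* process_srcu() is the single state-machine entry point. */
		INIT_DELAYED_WORK(&sp->work, process_srcu);
		sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
		return sp->per_cpu_ref ? 0 : -ENOMEM;
	}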
> > --
> > 1.7.4.4
> >