[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150911170526.GU4029@linux.vnet.ibm.com>
Date: Fri, 11 Sep 2015 10:05:26 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Davidlohr Bueso <dave@...olabs.net>,
Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/1] rcu_sync: Cleanup the CONFIG_PROVE_RCU checks
On Fri, Sep 11, 2015 at 05:59:01PM +0200, Oleg Nesterov wrote:
> On 09/10, Paul E. McKenney wrote:
> >
> > On Thu, Sep 10, 2015 at 03:59:42PM +0200, Oleg Nesterov wrote:
> > > On 09/09, Paul E. McKenney wrote:
> > > >
> > > > This is obsolete, but its replacement is the same patch.
> > >
> > > fbe3b97183f84155d81e506b1aa7d2ce986f7a36 in linux-rcu.git#experimental
> > > I guess?
> > >
> > > > Oleg, Davidlohr, am I missing something on how percpu_rwsem or
> > > > locktorture work?
> > >
> > > No, I think the patch is fine. Thanks for doing this! I was going to
> > > send something like this change too. And in fact I am still thinking
> > > about another test which plays with rcu_sync only, but probably we
> > > need some cleanups first (and we need them anyway). I'll try to do
> > > this a bit later.
> >
> > I would welcome an rcu_sync-specific torture patch!
>
> I want it much more than you ;) I have already warned you, I'll send
> more rcu_sync patches. The current code is actually a very early draft
> which was written during the discussion with Peter a long ago. I sent
> it unchanged because a) it was already reviewed and b) I tested it a
> bit in the past.
>
> We can greatly simplify this code and at the same time make it more
> useful. Actually I already have the patches. The 1st one removes
> rcu_sync->cb_state and gp_ops->sync(). This makes the state machine
> almost self-obvious and allows other improvements. See the resulting
> (pseudo) code at the end.
>
> But again, I'll try very much to write the test before I send the patch.
That sounds very good! ;-)
> Until then, let me send this trivial cleanup. The CONFIG_PROVE_RCU
> code looks trivial but imo really annoying. And it is not complete,
> so lets document this at least. Plus rcu_lockdep_assert() looks more
> consistent.
>
>
> > > > +void torture_percpu_rwsem_init(void)
> > > > +{
> > > > + BUG_ON(percpu_init_rwsem(&pcpu_rwsem));
> > > > +}
> > > > +
> > >
> > > Aha, we don't really need this... I mean we can use the static initialiser
> > > which can also be used by uprobes and cgroups. I'll try to send the patch
> > > tomorrow.
> >
> > Very good, please do!
>
> Hmm. I am lier. I won't send this patch at least today.
>
> The change I had in mind is very simple,
>
> #define DECLARE_PERCPU_RWSEM(sem) \
> static DEFINE_PER_CPU(unsigned int, sem##_counters); \
> struct percpu_rw_semaphore sem = { \
> .fast_read_ctr = &sem##_counters, \
> ... \
> }
>
> and yes, uprobes and cgroups can use it.
>
> But somehow I missed that we can't use it to define a _static_ sem,
>
> static DECLARE_PERCPU_RWSEM(sem);
>
> obviously won't work. And damn, I am shy to admit that I spent several
> hours trying to invent something but failed. Perhaps we can add 2 helpers,
> DECLARE_PERCPU_RWSEM_GLOBAL() and DECLARE_PERCPU_RWSEM_STATIC().
That is indeed what we do for SRCU for the same reason, DEFINE_SRCU()
and DEFINE_STATIC_SRCU(), but with a common __DEFINE_SRCU() doing the
actual work.
Thanx, Paul
> Oleg.
>
> -------------------------------------------------------------------------------
> static const struct {
> void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
> void (*wait)(void); // TODO: remove this
> #ifdef CONFIG_PROVE_RCU
> int (*held)(void);
> #endif
> } gp_ops[] = {
> ...
> };
>
> // COMMENT to explain these states
> enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };
>
> #define rss_lock gp_wait.lock
>
> // !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!1!!!!!!!!
> // XXX code must be removed when we split rcu_sync_enter() into start + wait
> // !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
> static void rcu_sync_func(struct rcu_head *rcu)
> {
> struct rcu_sync *rsp = container_of(rcu, struct rcu_sync, cb_head);
> unsigned long flags;
>
> BUG_ON(rsp->gp_state == GP_IDLE);
> BUG_ON(rsp->gp_state == GP_PASSED);
>
> spin_lock_irqsave(&rsp->rss_lock, flags);
> if (rsp->gp_count) {
> /*
> * COMMENT.
> */
> rsp->gp_state = GP_PASSED;
> wake_up_locked(&rsp->gp_wait);
> } else if (rsp->gp_state == GP_REPLAY) {
> /*
> * A new rcu_sync_exit() has happened; requeue the callback
> * to catch a later GP.
> */
> rsp->gp_state = GP_EXIT;
> gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func);
> } else {
> /*
> * We're at least a GP after rcu_sync_exit(); eveybody will now
> * have observed the write side critical section. Let 'em rip!.
> */
> BUG_ON(rsp->gp_state == GP_ENTER); // XXX
> rsp->gp_state = GP_IDLE;
> }
> spin_unlock_irqrestore(&rsp->rss_lock, flags);
> }
>
> static void rcu_sync_call(struct rcu_sync *rsp)
> {
> // TODO:
> // This is called by might_sleep() code outside of ->rss_lock,
> // we can avoid ->call() in some cases (say rcu_blocking_is_gp())
> gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func);
> }
>
> void rcu_sync_enter(struct rcu_sync *rsp)
> {
> int gp_count, gp_state;
>
> spin_lock_irq(&rsp->rss_lock);
> gp_count = rsp->gp_count++;
> gp_state = rsp->gp_state;
> if (gp_state == GP_IDLE)
> rsp->gp_state = GP_ENTER;
> spin_unlock_irq(&rsp->rss_lock);
>
> BUG_ON(gp_count != 0 && gp_state == GP_IDLE);
> BUG_ON(gp_count == 0 && gp_state == GP_PASSED);
> BUG_ON(gp_count == 0 && gp_state == GP_ENTER); // XXX
>
> if (gp_state == GP_IDLE)
> rcu_sync_call(rsp);
>
> wait_event(rsp->gp_wait, rsp->gp_state != GP_ENTER);
> BUG_ON(rsp->gp_state < GP_PASSED);
> }
>
> void rcu_sync_exit(struct rcu_sync *rsp)
> {
> bool need_call;
>
> BUG_ON(rsp->gp_state == GP_IDLE);
> BUG_ON(rsp->gp_state == GP_ENTER); // XXX
>
> spin_lock_irq(&rsp->rss_lock);
> if (!--rsp->gp_count) {
> if (rsp->gp_state == GP_PASSED) {
> need_call = true;
> rsp->gp_state = GP_EXIT;
> } else if (rsp->gp_state == GP_EXIT) {
> rsp->gp_state = GP_REPLAY;
> }
> }
> spin_unlock_irq(&rsp->rss_lock);
>
> // Comment to explain why we do not care if another enter()
> // and perhaps even exit() comes after spin_unlock().
> if (need_call)
> rcu_sync_call(rsp);
> }
>
> void rcu_sync_dtor(struct rcu_sync *rsp)
> {
> int gp_state;
>
> BUG_ON(rsp->gp_count);
> BUG_ON(rsp->gp_state == GP_ENTER); // XXX
> BUG_ON(rsp->gp_state == GP_PASSED);
>
> spin_lock_irq(&rsp->rss_lock);
> if (rsp->gp_state == GP_REPLAY)
> rsp->gp_state = GP_EXIT;
> gp_state = rsp->gp_state;
> spin_unlock_irq(&rsp->rss_lock);
>
> // TODO: add another wake_up_locked() into rcu_sync_func(),
> // use wait_event + spin_lock_wait, remove gp_ops->wait().
>
> if (gp_state != GP_IDLE) {
> gp_ops[rsp->gp_type].wait();
> BUG_ON(rsp->gp_state != GP_IDLE);
> }
> }
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists