[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150414143115.GA493@gmail.com>
Date: Tue, 14 Apr 2015 16:31:15 +0200
From: Ingo Molnar <mingo@...nel.org>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Peter Zijlstra <peterz@...radead.org>,
Rusty Russell <rusty@...tcorp.com.au>,
Oleg Nesterov <oleg@...hat.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andi Kleen <andi@...stfloor.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
Lai Jiangshan <laijs@...fujitsu.com>,
George Spelvin <linux@...izon.com>,
Andrea Arcangeli <aarcange@...hat.com>,
David Woodhouse <David.Woodhouse@...el.com>,
Rik van Riel <riel@...hat.com>,
Michel Lespinasse <walken@...gle.com>
Subject: Re: [PATCH v5 05/10] seqlock: Better document
raw_write_seqcount_latch()
* Paul E. McKenney <paulmck@...ux.vnet.ibm.com> wrote:
> > I wish rcu_read_lock() had a data argument, for similar reasons -
> > even if it just pointed to a pre-existing lock or an rcu head it
> > never touches ;-)
>
> Heh! Jack Slingwine and I had that argument back in 1993. I
> advocated placing the update-side lock into the rcu_read_lock()
> equivalent, and he responded by showing me a use cases were (1)
> there were no update-side locks and (2) there were many update-side
> locks, and it was impossible to select just one on the read side.
> ;-)
So as a response I'd have attempted to hand-wave something about those
scenarios being either not that common, or not that interesting?!! :-)
> However, DYNIX/ptx did not have anything like rcu_dereference() or
> list_for_each_entry_rcu(), which perhaps can be used in your example
> below. (Hey, that was 20 years ago, when 50MB was a lot of main
> memory. So we relied on compilers being quite dumb.)
I guess compilers are still dumb in many ways ;-)
> > know that cpusets are integrated with cgroups and I search
> > kernel/cgroup.c for call_rcu(), do I find:
> >
> > call_rcu(&css->rcu_head, css_free_rcu_fn);
> >
> > aha!
> >
> > ... or if I drill down 3 levels into cpuset_for_each_child() ->
> > css_for_each_child() -> css_next_child() do I see the RCU
> > iteration.
>
> And I have felt that reviewing pain as well.
>
> But shouldn't these API members be tagged with "_rcu" to make that
> more clear? Sort of like the difference between list_for_each_entry
> and list_for_each_entry_rcu()?
Yes, agreed absolutely!
Having it as a syntactic element instead of a stylistic one forces
such self-documentation though.
At the cost of being an extra nuisance.
> > It would have been a lot clearer from the onset, if I had a hint
> > syntactically:
> >
> > rcu_read_lock(&css->rcu_head);
> > ...
> > rcu_read_unlock(&css->rcu_head);
>
> I cannot resist asking what you put there if the update side uses
> synchronize_rcu()... A NULL pointer? A pointer to
> synchronize_rcu()? Something else? [...]
So I'd either put a suitable zero-size struct ('struct
rcu_head_sync'?) into the protected data structure: i.e. still
annotate it in an active fashion, but don't change any code.
This would be zero size in the non-debug case, but it would allow
debugging data in the debug case.
Another solution would be to not require such linking in all cases,
only when there's a single update side lock.
> [...] And what do you do in the not-uncommon case where multiple
> RCU chains are being traversed in the same RCU read-side critical
> section? One approach would be to use varargs, I suppose. Though
> with a hash table, list, or tree, you could have a -lot- of
> ->rcu_head structures to reference, and concurrent additions and
> deletions mean that you wouldn't necessarily know which at
> rcu_read_lock() time.
I'd definitely keep it simple - i.e. no varargs.
I'd just try to link with the data structure in general. Or I'd just
forget about handling these cases altogether, at least initially -
first see how it works out for the simpler cases.
>
> > beyond the reviewer bonus I bet this would allow some extra debugging
> > as well (only enabled in debug kernels):
> >
> > - for example to make sure we only access a field if _that field_ is
> > RCU locked (reducing the chance of having the right locking for
> > the wrong reason)
>
> One possibility would be to mark each traversal of an RCU-protected
> pointer. Currently, if a multilinked structure is inserted in one
> shot, only the initial pointer to that structure needs to have
> rcu_dereference(). Otherwise, it is hard to tell exactly how far
> the RCU protection is to extend. (Been having too much fun with
> this sort of thing in the standards committees...)
>
> > - we could possibly also build lockdep dependencies out of such
> > annotated RCU locking patterns.
>
> Tell me more?
So to backpedal a bit, I think that in practice the use
rcu_dereference*() gives us a lot of protection and documentation
already - so I might be over-doing it.
But we could essentially split up the current monolithic
rcu_read_lock() class and turn it into classes mirroring the update
side lock classes.
This would at minimum give us access to CONFIG_LOCK_STAT statistics
about the frequency of use of the various locks. Right now I can only
see this:
/proc/lockdep:ffffffff82d27887 OPS: 1412000 FD: 1 BD: 1 ......: rcu_read_lock
/proc/lock_stat: rcu_read_lock-R: 0 0 0.00 0.00 0.00 0.00 0 1135745 0.00 969.66 1355676.33 1.19
/proc/lock_stat: rcu_read_lock_sched-R: 0 0 0.00 0.00 0.00 0.00 0 16 0.25 20.71 30.40 1.90
/proc/lock_stat: rcu_read_lock_bh-R: 0 0 0.00 0.00 0.00 0.00 0 5602 0.33 126.70 40478.88 7.23
which merges them into essentially a single counter.
If we had a more finegrained structure we could tell more about usage
patterns? Not sure how valuable that is though.
So for example on the SRCU side we could detect such deadlocks:
mutex_lock(&mutex);
synchronize_srcu(&srcu);
vs.
srcu_read_lock(&srcu);
mutex_lock(&mutex);
(I don't think we are detecting these right now.)
On the classical RCU side it's harder to construct deadlocks, because
the read side is a non-sleeping and irq-safe primitive, while
synchronize_rcu() is a sleeping method ;-) So most (all?) deadlock
scenarios are avoided by just those properties.
Another use would be to potentially split up the RCU read side into
multiple grace period domains, flushed independently, a bit like how
SRCU does it? That would be a non-debugging use for it.
> > - RCU aware list walking primitives could auto-check that this
> > particular list is properly RCU locked.
>
> For example, that a lock in the proper update class was held during
> the corresponding update?
Yes, and also on the lookup side: that a 'lock' of the proper type is
read-held during lookup. (with a few special annotations for special
cases where as a side effect of other things we may have proper RCU
read side protection.)
Thanks,
Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists