Message-Id: <1239987097.23397.4800.camel@laptop>
Date: Fri, 17 Apr 2009 18:51:37 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: paulmck@...ux.vnet.ibm.com
Cc: David Miller <davem@...emloft.net>, kaber@...sh.net,
torvalds@...ux-foundation.org, shemminger@...tta.com,
dada1@...mosbay.com, jeff.chua.linux@...il.com, paulus@...ba.org,
mingo@...e.hu, laijs@...fujitsu.com, jengelh@...ozas.de,
r000n@...0n.net, linux-kernel@...r.kernel.org,
netfilter-devel@...r.kernel.org, netdev@...r.kernel.org,
benh@...nel.crashing.org, mathieu.desnoyers@...ymtl.ca
Subject: Re: [PATCH] netfilter: use per-cpu spinlock rather than RCU (v3)
On Fri, 2009-04-17 at 09:33 -0700, Paul E. McKenney wrote:
> > One comment, its again a global thing..
> >
> > I've been playing with the idea for a while now to make all RCU
> > implementations into proper objects so that you can do things like:
> >
> > struct atomic_rcu_domain my_rcu_domain = create_atomic_rcu();
> >
> > atomic_rcu_read_lock(&my_rcu_domain);
> > ...
> >
> > atomic_rcu_read_unlock(&my_rcu_domain);
> >
> > and
> >
> > call_atomic_rcu(&my_rcu_domain, &my_obj->rcu_head, do_something);
> >
> > etc..
> >
> > We would have:
> >
> > atomic_rcu -- 'classic' non preemptible RCU (treercu these days)
> > sleep_rcu -- 'preemptible' RCU
> >
> > Then have 3 default domains:
> >
> > sched_rcu -- always atomic_rcu
>
> This is the call_rcu_sched() variant.
>
> > rcu -- depends on PREEMPT_RCU
>
> This is the call_rcu() variant.
>
> > preempt_rcu -- always sleep_rcu
>
> I guess that this one could allow sleeping on mutexes... Does anyone
> need to do that?
I almost did a few times, but never quite got the code that needed it
working well enough for it to go anywhere.
> > This would allow generic code to:
> > 1) use preemptible RCU for those cases where needed
> > 2) create smaller RCU domains where needed, such as in this case
> > 3) mostly do away with SRCU
>
> #3 would be good! But...
>
> At an API level, there are two differences between SRCU and the other
> RCU implementations:
>
> a. The return value from srcu_read_lock() is passed to
> srcu_read_unlock().
>
> b. There is a control block passed in to each SRCU primitive.
>
> Difference (a) could potentially be taken care of with a few tricks I
> am trying in the process of getting preemptrcu merged into treercu.
Right, incrementing one cpu and decrementing another doesn't change the
sum over all cpus :-)
> Your approach to (b) certainly makes it uniform, but there are >500
> occurrences of rcu_read_lock() and rcu_read_unlock() each, but only
> a very few occurrences of srcu_read_lock() and srcu_read_unlock()
> (like exactly one each!). So adding an argument to rcu_read_lock()
> does not sound at all reasonable.
static inline void rcu_read_lock(void)
{
	atomic_rcu_read_lock(&global_atomic_rcu_context);
}

static inline void rcu_read_unlock(void)
{
	atomic_rcu_read_unlock(&global_atomic_rcu_context);
}

static inline void call_rcu(struct rcu_head *rcuhead,
			    void (*func)(struct rcu_head *))
{
	call_atomic_rcu(&global_atomic_rcu_context, rcuhead, func);
}
etc.. Should take care of that, no?
> > Now I realize that the presented RCU implementation has a different
> > grace period method than the existing ones that use the timer tick to
> > drive the state machine, so 2) might not be too relevant here. But maybe
> > we can do something with different grace periods too.
> >
> > Anyway, just an idea because I always get a little offended at the hard
> > coded global variables in all these RCU implementations :-)
>
> I am thinking in terms of adding a synchronize_rcu_bh() with the desired
> properties. That way we avoid yet another RCU flavor. (What can I say?
> I got carried away!) Also, since the rcu-bh flavor is used only by
> networking, we have a fair amount of freedom to tweak it.
Right. I was thinking along the lines of providing a watermark (either
nr_queued based or time based) and, once it is exceeded, trying to drive
the state machine from read_unlock. Or similar. Unlock driven RCU
implementations have the best grace period times ever, except for all
the down sides ;-)
> It will take
> longer than introducing a new flavor, but Steve Hemminger has a good
> solution already, and RCU really isn't the thing to do quick hacks on.
Ok, back on topic :-)
I wouldn't exactly call it a good solution, it does a
for_each_possible_cpu() spin_lock();

 1) that should probably be for_each_online_cpu()
 2) that doesn't scale at all, I'm sure dave's 256-way hurts like mad
    when inserting tons of rules, and we do that for every iptables
    modification.
 3) there is no way lockdep can track all that :(
Do we _really_ _really_ __HAVE__ to serialize this? So far I've heard
Patrick say: there might be a use case. That doesn't sound like: we
should make it dead slow for everybody else.