Message-ID: <20090414101936.1d70879c@nehalam>
Date: Tue, 14 Apr 2009 10:19:36 -0700
From: Stephen Hemminger <shemminger@...tta.com>
To: Eric Dumazet <dada1@...mosbay.com>
Cc: Patrick McHardy <kaber@...sh.net>, paulmck@...ux.vnet.ibm.com,
David Miller <davem@...emloft.net>, paulus@...ba.org,
mingo@...e.hu, torvalds@...ux-foundation.org, laijs@...fujitsu.com,
jeff.chua.linux@...il.com, jengelh@...ozas.de, r000n@...0n.net,
linux-kernel@...r.kernel.org, netfilter-devel@...r.kernel.org,
netdev@...r.kernel.org, benh@...nel.crashing.org
Subject: Re: [PATCH] netfilter: use per-cpu spinlock rather than RCU
On Tue, 14 Apr 2009 17:49:57 +0200
Eric Dumazet <dada1@...mosbay.com> wrote:
> Stephen Hemminger wrote:
> > On Tue, 14 Apr 2009 16:23:33 +0200
> > Eric Dumazet <dada1@...mosbay.com> wrote:
> >
> >> Patrick McHardy wrote:
> >>> Stephen Hemminger wrote:
> >>>> This is an alternative version of ip/ip6/arp tables locking using
> >>>> per-cpu locks. This avoids the overhead of synchronize_net() during
> >>>> update but still removes the expensive rwlock in earlier versions.
> >>>>
> >>>> The idea for this came from an earlier version done by Eric Dumazet.
> >>>> Locking is done per-cpu: the fast path locks on the current cpu
> >>>> and updates counters. The slow case involves acquiring the locks on
> >>>> all cpus.
> >>>>
> >>>> The mutex that was added for 2.6.30 in xt_table is unnecessary since
> >>>> xt[af].mutex is already held.
> >>>>
> >>>> Tested basic functionality (add/remove/list), but I don't have test
> >>>> cases for stress, ip6tables or arptables.
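
[The scheme being described works out to roughly the following sketch.
It is illustrative only; ipt_lock and the helper names are invented for
this example and do not appear in the patch.]

/* Sketch only: one spinlock per cpu protects that cpu's rule counters. */
static DEFINE_PER_CPU(spinlock_t, ipt_lock);

/* Fast path: packet processing on this cpu takes only this cpu's lock. */
static void count_entry(struct xt_counters *ctr, unsigned int bytes)
{
	spin_lock(&__get_cpu_var(ipt_lock));
	ctr->pcnt++;
	ctr->bcnt += bytes;
	spin_unlock(&__get_cpu_var(ipt_lock));
}

/* Slow path: replace or read-counters excludes every cpu by acquiring
 * all the per-cpu locks.  The _bh variant keeps the softirq packet path
 * on the local cpu from deadlocking against us, and taking the locks in
 * cpu order keeps two concurrent slow-path callers from deadlocking
 * against each other.
 */
static void lock_all_cpus(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		spin_lock_bh(&per_cpu(ipt_lock, cpu));
}

static void unlock_all_cpus(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		spin_unlock_bh(&per_cpu(ipt_lock, cpu));
}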
> >>> Thanks Stephen, I'll do some testing with ip6tables.
> >> Here is the patch I cooked on top of Stephen one to get proper locking.
> >
> > I see no demonstrated problem with locking in my version.
>
> Yes, I have not crashed any machine here either, but should we wait for a bug report? :)
>
> > The reader/writer race is already handled. On replace, the race is:
> >
> > CPU 0                          CPU 1
> >                                lock (iptables(1))
> >                                refer to oldinfo
> > swap in new info
> > foreach CPU
> >    lock iptables(i)
> > (spin)                         unlock(iptables(1))
> > read oldinfo
> > unlock
> > ...
> >
> > The point is my locking works; you just seem to feel more comfortable with
> > a global "stop all CPUs" solution.
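
[In code form, the replace side of the diagram above would look roughly
like this sketch, reusing the hypothetical ipt_lock from the sketch
earlier in the thread; it is not the actual patch. Note the bare
table->private assignment: whether it needs a barrier is exactly what
the rest of this exchange is about.]

static struct xt_table_info *replace_sketch(struct xt_table *table,
					    struct xt_table_info *newinfo)
{
	struct xt_table_info *oldinfo = table->private;
	int cpu;

	table->private = newinfo;		/* swap in new info */

	/* foreach CPU: spin until each per-cpu lock is acquired (a reader
	 * such as CPU 1 above may still hold its own).  Once every lock
	 * has been taken, no cpu can still be inside a critical section
	 * that refers to oldinfo.
	 */
	for_each_possible_cpu(cpu)
		spin_lock_bh(&per_cpu(ipt_lock, cpu));

	/* safe to read oldinfo's counters here */

	for_each_possible_cpu(cpu)
		spin_unlock_bh(&per_cpu(ipt_lock, cpu));

	return oldinfo;
}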
>
> Oh right, I missed that xt_replace_table() was *followed* by a get_counters()
> call, but I am pretty sure something is needed in xt_replace_table().
>
> A memory barrier at least (smp_wmb())
>
> As soon as we do "table->private = newinfo;", other cpus might fetch incorrect
> values for newinfo->fields.
>
> In the past, we had a write_lock_bh()/write_unlock_bh() pair that did
> this for us. Then we had rcu_assign_pointer(), which also implied this
> memory barrier.
>
> Even though the vmalloc() calls we do before calling xt_replace_table()
> probably already force barriers, add one explicitly, just in case the
> callers are ever changed to use kmalloc() instead of vmalloc() or
> whatever...
>
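[The publication pattern Eric describes, as a minimal sketch:]

	/* Make the stores that initialized *newinfo visible to other
	 * cpus before the pointer publishing it is set; otherwise a cpu
	 * that loads table->private early could still see stale
	 * newinfo->fields.
	 */
	smp_wmb();
	table->private = newinfo;

[rcu_assign_pointer() bundled exactly this smp_wmb() with the store,
which is the implied barrier mentioned above.]
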
You are right, doing something with a barrier would be safer there.
How about using xchg()?
@@ -682,26 +668,19 @@ xt_replace_table(struct xt_table *table,
 	      struct xt_table_info *newinfo,
 	      int *error)
 {
-	struct xt_table_info *oldinfo, *private;
+	struct xt_table_info *private;
 
 	/* Do the substitution. */
-	mutex_lock(&table->lock);
 	private = table->private;
 	/* Check inside lock: is the old number correct? */
 	if (num_counters != private->number) {
 		duprintf("num_counters != table->private->number (%u/%u)\n",
 			 num_counters, private->number);
-		mutex_unlock(&table->lock);
 		*error = -EAGAIN;
 		return NULL;
 	}
-	oldinfo = private;
-	rcu_assign_pointer(table->private, newinfo);
-	newinfo->initial_entries = oldinfo->initial_entries;
-	mutex_unlock(&table->lock);
-
-	synchronize_net();
-	return oldinfo;
+	newinfo->initial_entries = private->initial_entries;
+	return xchg(&table->private, newinfo);
 }
 EXPORT_SYMBOL_GPL(xt_replace_table);
 
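
[For reference: xchg() implies a full memory barrier on every
architecture Linux supports, so the swap above both publishes newinfo
(the stores that initialized it cannot be reordered past the exchange)
and hands back the old pointer atomically. A userspace analogue of the
same publish-by-swap idea, sketched with C11 atomics rather than the
kernel primitive:]

#include <stdatomic.h>
#include <stdio.h>

struct info { unsigned int number; };

static _Atomic(struct info *) table_private;

int main(void)
{
	static struct info a = { .number = 1 }, b;

	atomic_store(&table_private, &a);

	/* Initialize fully *before* publishing... */
	b.number = 2;
	/* ...then swap: atomic_exchange() is seq_cst by default, so the
	 * store to b.number above cannot drift past the exchange, and
	 * the old pointer comes back in the same atomic step.
	 */
	struct info *old = atomic_exchange(&table_private, &b);

	printf("old=%u new=%u\n", old->number,
	       atomic_load(&table_private)->number);
	return 0;
}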
P.S. Did we all miss the ordering bug in the RCU version??