linux-kernel - Re: [PATCH] netfilter: use per-cpu spinlock rather than RCU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090414101936.1d70879c@nehalam>
Date:	Tue, 14 Apr 2009 10:19:36 -0700
From:	Stephen Hemminger <shemminger@...tta.com>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	Patrick McHardy <kaber@...sh.net>, paulmck@...ux.vnet.ibm.com,
	David Miller <davem@...emloft.net>, paulus@...ba.org,
	mingo@...e.hu, torvalds@...ux-foundation.org, laijs@...fujitsu.com,
	jeff.chua.linux@...il.com, jengelh@...ozas.de, r000n@...0n.net,
	linux-kernel@...r.kernel.org, netfilter-devel@...r.kernel.org,
	netdev@...r.kernel.org, benh@...nel.crashing.org
Subject: Re: [PATCH] netfilter: use per-cpu spinlock rather than RCU

On Tue, 14 Apr 2009 17:49:57 +0200
Eric Dumazet <dada1@...mosbay.com> wrote:

> Stephen Hemminger a écrit :
> > On Tue, 14 Apr 2009 16:23:33 +0200
> > Eric Dumazet <dada1@...mosbay.com> wrote:
> > 
> >> Patrick McHardy a écrit :
> >>> Stephen Hemminger wrote:
> >>>> This is an alternative version of ip/ip6/arp tables locking using
> >>>> per-cpu locks.  This avoids the overhead of synchronize_net() during
> >>>> update but still removes the expensive rwlock in earlier versions.
> >>>>
> >>>> The idea for this came from an earlier version done by Eric Duzamet.
> >>>> Locking is done per-cpu, the fast path locks on the current cpu
> >>>> and updates counters.  The slow case involves acquiring the locks on
> >>>> all cpu's.
> >>>>
> >>>> The mutex that was added for 2.6.30 in xt_table is unnecessary since
> >>>> there already is a mutex for xt[af].mutex that is held.
> >>>>
> >>>> Tested basic functionality (add/remove/list), but don't have test cases
> >>>> for stress, ip6tables or arptables.
> >>> Thanks Stephen, I'll do some testing with ip6tables.
> >> Here is the patch I cooked on top of Stephen one to get proper locking.
> > 
> > I see no demonstrated problem with locking in my version.
> 
> Yes, I did not crash any machine around there, should we wait for a bug report ? :)
> 
> > The reader/writer race is already handled. On replace the race of
> > 
> > CPU 0                          CPU 1
> >                            lock (iptables(1))
> >                            refer to oldinfo
> > swap in new info
> > foreach CPU
> >    lock iptables(i)
> >    (spin)                  unlock(iptables(1))
> >    read oldinfo
> >    unlock
> > ...
> > 
> > The point is my locking works, you just seem to feel more comfortable with
> > a global "stop all CPU's" solution.
> 
> Oh right, I missed that xt_replace_table() was *followed* by a get_counters()
> call, but I am pretty sure something is needed in xt_replace_table().
> 
> A memory barrier at least (smp_wmb())
> 
> As soon as we do "table->private = newinfo;", other cpus might fetch incorrect
> values for newinfo->fields.
> 
> In the past, we had a write_lock_bh()/write_unlock_bh() pair that was
> doing this for us.
> Then we had rcu_assign_pointer() that also had this memory barrier implied.
> 
> Even if vmalloc() calls we do before calling xt_replace_table() probably
> already force barriers, add one for reference, just in case we change callers
> logic to call kmalloc() instead of vmalloc() or whatever...
>

You are right, doing something with barrier would be safer there.
How about using xchg?

@@ -682,26 +668,19 @@ xt_replace_table(struct xt_table *table,
 	      struct xt_table_info *newinfo,
 	      int *error)
 {
-	struct xt_table_info *oldinfo, *private;
+	struct xt_table_info *private;
 
 	/* Do the substitution. */
-	mutex_lock(&table->lock);
 	private = table->private;
 	/* Check inside lock: is the old number correct? */
 	if (num_counters != private->number) {
 		duprintf("num_counters != table->private->number (%u/%u)\n",
 			 num_counters, private->number);
-		mutex_unlock(&table->lock);
 		*error = -EAGAIN;
 		return NULL;
 	}
-	oldinfo = private;
-	rcu_assign_pointer(table->private, newinfo);
-	newinfo->initial_entries = oldinfo->initial_entries;
-	mutex_unlock(&table->lock);
-
-	synchronize_net();
-	return oldinfo;
+	newinfo->initial_entries = private->initial_entries;
+	return xchg(&table->private, newinfo);
 }
 EXPORT_SYMBOL_GPL(xt_replace_table);
 
P.s: we all missed the ordering bug in the RCU version??

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/