Message-ID: <20090413162000.5f8d9a05@nehalam>
Date:	Mon, 13 Apr 2009 16:20:00 -0700
From:	Stephen Hemminger <shemminger@...tta.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	paulmck@...ux.vnet.ibm.com, davem@...emloft.net, paulus@...ba.org,
	mingo@...e.hu, torvalds@...ux-foundation.org, laijs@...fujitsu.com,
	jeff.chua.linux@...il.com, dada1@...mosbay.com, jengelh@...ozas.de,
	kaber@...sh.net, r000n@...0n.net, linux-kernel@...r.kernel.org,
	netfilter-devel@...r.kernel.org, netdev@...r.kernel.org,
	benh@...nel.crashing.org
Subject: Re: [PATCH] netfilter: use per-cpu spinlock rather than RCU

On Mon, 13 Apr 2009 15:24:37 -0700
Andrew Morton <akpm@...ux-foundation.org> wrote:

> On Mon, 13 Apr 2009 09:53:09 -0700
> Stephen Hemminger <shemminger@...tta.com> wrote:
> 
> > This is an alternative version of ip/ip6/arp tables locking using
> > per-cpu locks.  This avoids the overhead of synchronize_net() during
> > update but still removes the expensive rwlock in earlier versions.
> > 
> > The idea for this came from an earlier version done by Eric Dumazet.
> > Locking is done per-cpu: the fast path takes only the current cpu's
> > lock while updating counters, and the slow case acquires the locks
> > on all cpus.
> > 
> > The mutex that was added for 2.6.30 in xt_table is unnecessary since
> > xt[af].mutex is already held.
> > 
> > Tested basic functionality (add/remove/list), but I don't have test
> > cases for stress, ip6tables, or arptables.
> > 
> >  unsigned int
> >  ipt_do_table(struct sk_buff *skb,
> > @@ -339,9 +341,10 @@ ipt_do_table(struct sk_buff *skb,
> >  
> >  	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
> >  
> > -	rcu_read_lock_bh();
> > -	private = rcu_dereference(table->private);
> > -	table_base = rcu_dereference(private->entries[smp_processor_id()]);
> > +	local_bh_disable();
> > +	spin_lock(&__get_cpu_var(ip_tables_lock));
> 
> spin_lock_bh()?

No. get_cpu_var() implies smp_processor_id(), which is not safe
without preempt_disable() (i.e. bh disable), so the bh disable has to
happen before the per-cpu lookup.
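
A minimal sketch of the required ordering (names from the patch; the
rule-walking body is elided):

	local_bh_disable();	/* pins us: also disables preemption */
	spin_lock(&__get_cpu_var(ip_tables_lock)); /* cpu id now stable */
	/* ... walk the ruleset, bump this cpu's counters ... */
	spin_unlock(&__get_cpu_var(ip_tables_lock));
	local_bh_enable();

With spin_lock_bh(&__get_cpu_var(ip_tables_lock)) the lock address
would be computed while still preemptible, so the task could migrate
and end up taking another cpu's lock.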

> 
> > +	private = table->private;
> > +	table_base = private->entries[smp_processor_id()];
> >  
> >  	e = get_entry(table_base, private->hook_entry[hook]);
> >  
> > @@ -436,8 +439,8 @@ ipt_do_table(struct sk_buff *skb,
> >  			e = (void *)e + e->next_offset;
> >  		}
> >  	} while (!hotdrop);
> > -
> > -	rcu_read_unlock_bh();
> > +	spin_unlock(&__get_cpu_var(ip_tables_lock));
> > +	local_bh_enable();
> >  
> >  #ifdef DEBUG_ALLOW_ALL
> >  	return NF_ACCEPT;
> > @@ -891,86 +894,34 @@ get_counters(const struct xt_table_info 
> >  	     struct xt_counters counters[])
> >  {
> >  	unsigned int cpu;
> > -	unsigned int i;
> > -	unsigned int curcpu;
> > +	unsigned int i = 0;
> > +	unsigned int curcpu = raw_smp_processor_id();
> >  
> >  	/* Instead of clearing (by a previous call to memset())
> >  	 * the counters and using adds, we set the counters
> >  	 * with data used by 'current' CPU
> >  	 * We dont care about preemption here.
> >  	 */
> > -	curcpu = raw_smp_processor_id();
> > -
> > -	i = 0;
> > +	spin_lock_bh(&per_cpu(ip_tables_lock, curcpu));
> >  	IPT_ENTRY_ITERATE(t->entries[curcpu],
> >  			  t->size,
> >  			  set_entry_to_counter,
> >  			  counters,
> >  			  &i);
> > +	spin_unlock_bh(&per_cpu(ip_tables_lock, curcpu));
> >  
> >  	for_each_possible_cpu(cpu) {
> >  		if (cpu == curcpu)
> >  			continue;
> > +
> >  		i = 0;
> > +		spin_lock_bh(&per_cpu(ip_tables_lock, cpu));
> >  		IPT_ENTRY_ITERATE(t->entries[cpu],
> >  				  t->size,
> >  				  add_entry_to_counter,
> >  				  counters,
> >  				  &i);
> > -	}
> 
> This would be racy against cpu hotplug if this code was hotplug-aware.
> 
> And it should be hotplug aware, really.  num_possible_cpus() can exceed
> num_online_cpus().  The extent by which possible>online is
> controversial, but one can conceive of situations where it is "lots".

It is already doing the right thing with hotplug.
This code still needs to count packets processed by a previously
online cpu that is no longer there.
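
In other words (a sketch; pkt_count is a hypothetical
DEFINE_PER_CPU(u64, pkt_count) used only for illustration):

	u64 total = 0;
	int cpu;

	/* for_each_online_cpu() would miss the counts accumulated by
	 * a cpu that has since been unplugged; its per-cpu data still
	 * holds those numbers, so sum over the possible mask: */
	for_each_possible_cpu(cpu)
		total += per_cpu(pkt_count, cpu);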

> Is lib/percpu_counter.c no good for this application?  Unfixably no
> good?  That code automagically handles cpu hotplug.

percpu_counter can't deal with the layout/load here.
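
(As a sketch of the mismatch: percpu_counter manages one scalar with
a batched fold, roughly

	struct percpu_counter pc;
	s64 total;

	percpu_counter_init(&pc, 0);	/* 2.6.x signature */
	percpu_counter_add(&pc, 1);	/* e.g. one packet */
	total = percpu_counter_sum(&pc);

whereas iptables keeps an xt_counters pair (pcnt/bcnt) embedded in
every rule of every per-cpu copy of the ruleset, so there is no
standalone counter object to hand to the library.)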
--