netdev - Re: iptables very slow after commit 784544739a25c30637397ace5489eeb6e15d7d49

Message-ID: <20090411054206.GC6822@linux.vnet.ibm.com>
Date:	Fri, 10 Apr 2009 22:42:06 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Jan Engelhardt <jengelh@...ozas.de>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	David Miller <davem@...emloft.net>,
	Ingo Molnar <mingo@...e.hu>,
	Lai Jiangshan <laijs@...fujitsu.com>, shemminger@...tta.com,
	jeff.chua.linux@...il.com, dada1@...mosbay.com, kaber@...sh.net,
	r000n@...0n.net,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	netfilter-devel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: iptables very slow after commit
	784544739a25c30637397ace5489eeb6e15d7d49

On Sat, Apr 11, 2009 at 07:14:50AM +0200, Jan Engelhardt wrote:
> 
> On Saturday 2009-04-11 06:15, Paul E. McKenney wrote:
> >On Fri, Apr 10, 2009 at 06:39:18PM -0700, Linus Torvalds wrote:
> >>An unhappy user reported:
> >>>>> Adding 200 records in iptables took 6.0sec in 2.6.30-rc1 compared to 
> >>>>> 0.2sec in 2.6.29. I've bisected it down to this commit:
> >>>>> 784544739a25c30637397ace5489eeb6e15d7d49
> >> 
> >> I wonder if we should bring in the RCU people too, for them to tell you 
> >> that the networking people are being silly, and should not synchronize 
> >> with the very heavy-handed
> >> 
> >> 	synchronize_net()
> >> 
> >> but instead of doing synchronization (which is probably why adding a few 
> >> hundred rules then takes several seconds - each synchronizes and that 
> >> takes a timer tick or so), add the rules to be free'd on some rcu-freeing 
> >> list for later freeing.
> 
> iptables works in whole tables. Userspace submits a table, checkentry is 
> called for all rules in the new table, things are swapped, then destroy 
> is called for all rules in the old table. By that logic (which existed
> since dawn I think), only the swap operation needs to be locked.
> 
> Jeff Chua wrote:
> >So, to make it easy for testing, you can do a loop like this ...
> >        for((i = 1; i < 100; i++))
> >        do
> >                iptables -A block -s 10.0.0.$i -j ACCEPT
> >        done
> 
> The fact that `iptables -A` is called a hundred times means you are 
> doing 100 table replacements -- instead of one. And calling
> synchronize_net at least 100 times.
> 
> "Wanna use iptables-restore?"
> 
> >1.	Assuming that the synchronize_net() is intended to guarantee
> >	that the new rules will be in effect before returning to
> >	user space:
> 
> As I read the new code, it seems that synchronize_net is only
> used on copying the rules from kernel into userspace;
> not when updating them from userspace:
> 
> IPT_SO_GET_ENTRIES -> get_entries -> copy_entries_to_user -> 
> alloc_counters -> synchronize_net.

OK.

> >3.	For the alloc_counters() case, the comments indicate that we
> >	really truly do want an atomic sampling of the counters.
> >	The counters are 64-bit entities, which is a bit inconvenient.
> >	Though people using this functionality are no doubt quite happy
> >	to never have to worry about overflow, I hasten to add!
> >
> >	I will nevertheless suggest the following egregious hack to
> >	get a consistent sample of one counter for some other CPU:
> >       [...]
> 
> Would a seqlock suffice, as it does for the 64-bit jiffies?

The 64-bit jiffies counter is not updated often, so write-acquiring a
seqlock on each update is OK.  From what I understand, these counters
are updated quite often (once per packet transmission or reception?),
so write-acquiring on each update would be quite painful.

Or did you have something else in mind here?
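
To make the cost concrete, here is a minimal userspace sketch of a
seqlock-protected packet/byte counter pair, written with C11 atomics
rather than the kernel's seqlock API.  The names struct pkt_counter,
counter_add() and counter_read() are hypothetical, and the sketch
assumes a single writer per counter (as with a per-CPU counter).  The
point is that every update pays for two extra writes to the sequence
word, which is cheap for jiffies but adds up at one update per packet.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter pair guarded by a sequence word (not kernel code). */
struct pkt_counter {
	atomic_uint seq;          /* odd while an update is in progress */
	_Atomic uint64_t pcnt;    /* packets seen */
	_Atomic uint64_t bcnt;    /* bytes seen */
};

/* Writer side: called once per packet.  Assumes a single writer. */
static void counter_add(struct pkt_counter *c, uint64_t bytes)
{
	unsigned int s = atomic_load(&c->seq);

	assert((s & 1) == 0);             /* no other update in progress */
	atomic_store(&c->seq, s + 1);     /* extra write #1: mark update start */
	atomic_store(&c->pcnt, atomic_load(&c->pcnt) + 1);
	atomic_store(&c->bcnt, atomic_load(&c->bcnt) + bytes);
	atomic_store(&c->seq, s + 2);     /* extra write #2: mark update end */
}

/* Reader side: retry until both values come from the same update. */
static void counter_read(struct pkt_counter *c, uint64_t *pcnt, uint64_t *bcnt)
{
	unsigned int s1, s2;

	do {
		s1 = atomic_load(&c->seq);
		*pcnt = atomic_load(&c->pcnt);
		*bcnt = atomic_load(&c->bcnt);
		s2 = atomic_load(&c->seq);
	} while ((s1 & 1) || s1 != s2);   /* writer was active: sample again */
}
```

A reader that races with counter_add() simply retries, so the read
side stays cheap; it is the writer's per-packet overhead that is the
concern.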

							Thanx, Paul