netdev - Re: [PATCH] iptables: lock free counters

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <49A7F262.8040805@cosmosbay.com>
Date:	Fri, 27 Feb 2009 15:02:10 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Stephen Hemminger <shemminger@...tta.com>
CC:	David Miller <davem@...emloft.net>,
	Patrick McHardy <kaber@...sh.net>,
	Rick Jones <rick.jones2@...com>, netdev@...r.kernel.org,
	netfilter-devel@...r.kernel.org,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH] iptables: lock free counters

Eric Dumazet a écrit :
> Stephen Hemminger a écrit :
>> The reader/writer lock in ip_tables is acquired in the critical path of
>> processing packets and is one of the reasons just loading iptables can cause
>> a 20% performance loss. The rwlock serves two functions:
>>
>> 1) it prevents changes to table state (xt_replace) while table is in use.
>>    This is now handled by doing rcu on the xt_table. When table is
>>    replaced, the new table(s) are put in and the old one table(s) are freed
>>    after RCU period.
>>
>> 2) it provides synchronization when accesing the counter values.
>>    This is now handled by swapping in new table_info entries for each cpu
>>    then summing the old values, and putting the result back onto one
>>    cpu.  On a busy system it may cause sampling to occur at different
>>    times on each cpu, but no packet/byte counts are lost in the process.
>>
>> Signed-off-by: Stephen Hemminger <shemminger@...tta.com>
> 
> 
> Acked-by: Eric Dumazet <dada1@...mosbay.com>
> 
> Sucessfully tested on my dual quad core machine too, but iptables only (no ipv6 here)
> 
> BTW, my new "tbench 8" result is 2450 MB/s, (it was 2150 MB/s not so long ago)
> 
> Thanks Stephen, thats very cool stuff, yet another rwlock out of kernel :)
>

While testing multicast flooding stuff, I found that "iptables -nvL" can 
have a *very* slow response time on my dual quad core machine...

   LatencyTOP version 0.5       (C) 2008 Intel Corporation

Cause                                                Maximum     Percentage
synchronize_rcu synchronize_net do_ipt_get_ctl nf_1878.6 msec          3.1 %
Scheduler: waiting for cpu                        160.3 msec         13.6 %
do_get_write_access journal_get_write_access __ext 11.0 msec          0.0 %
do_get_write_access journal_get_write_access __ext  7.7 msec          0.0 %
poll_schedule_timeout do_select core_sys_select sy  4.9 msec          0.0 %
do_wait sys_wait4 sys_waitpid sysenter_do_call      3.4 msec          0.1 %
call_usermodehelper_exec request_module netlink_cr  1.6 msec          0.0 %
__skb_recv_datagram skb_recv_datagram raw_recvmsg   1.5 msec          0.0 %
do_wait sys_wait4 sysenter_do_call                  0.7 msec          0.0 %


# time iptables -nvL
Chain INPUT (policy ACCEPT 416M packets, 64G bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 401M packets, 62G bytes)
 pkts bytes target     prot opt in     out     source               destination

real    0m1.810s
user    0m0.000s
sys     0m0.001s


CONFIG_NO_HZ=y
CONFIG_HZ_1000=y
CONFIG_HZ=1000

One cpu is 100% handling softirqs, could it be the problem ?

Cpu0  :  1.0%us, 14.7%sy,  0.0%ni, 83.3%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu1  :  3.6%us, 23.2%sy,  0.0%ni, 71.6%id,  0.0%wa,  0.0%hi,  1.7%si,  0.0%st
Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,100.0%si,  0.0%st
Cpu3  :  2.7%us, 23.9%sy,  0.0%ni, 71.1%id,  0.7%wa,  0.0%hi,  1.7%si,  0.0%st
Cpu4  :  1.3%us, 14.3%sy,  0.0%ni, 83.3%id,  0.0%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu5  :  1.0%us, 14.2%sy,  0.0%ni, 83.4%id,  0.0%wa,  0.0%hi,  1.3%si,  0.0%st
Cpu6  :  0.3%us,  7.0%sy,  0.0%ni, 92.4%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu7  :  0.7%us,  8.0%sy,  0.0%ni, 90.0%id,  0.7%wa,  0.0%hi,  0.7%si,  0.0%st

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html