Date:	Tue, 21 Apr 2009 06:59:29 +0200
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Stephen Hemminger <shemminger@...tta.com>
CC:	Paul Mackerras <paulus@...ba.org>, paulmck@...ux.vnet.ibm.com,
	Evgeniy Polyakov <zbr@...emap.net>,
	David Miller <davem@...emloft.net>, kaber@...sh.net,
	torvalds@...ux-foundation.org, jeff.chua.linux@...il.com,
	mingo@...e.hu, laijs@...fujitsu.com, jengelh@...ozas.de,
	r000n@...0n.net, linux-kernel@...r.kernel.org,
	netfilter-devel@...r.kernel.org, netdev@...r.kernel.org,
	benh@...nel.crashing.org, mathieu.desnoyers@...ymtl.ca
Subject: Re: [PATCH] netfilter: use per-cpu recursive lock (v11)

Stephen Hemminger wrote:
> This version of x_tables (ip/ip6/arp) locking uses a per-cpu
> recursive lock that can be nested. It is sort of like the existing
> kernel_lock, rwlock_t and even the old 2.4 brlock.
> 
> "Reader" is ip/arp/ip6 tables rule processing which runs per-cpu.
> It needs to ensure that the rules are not being changed while a packet
> is being processed.
> 
> "Writer" is used in two cases: the first is replacing rules, in which
> case all packets in flight have to be processed before the rules are
> swapped; counters are then read from the old (stale) info. The second
> case is reading counters on the fly; here all CPUs are blocked from
> further rule processing until the values are aggregated.
> 
> The idea for this came from an earlier version done by Eric Dumazet.
> Locking is done per-cpu; the fast path locks on the current cpu
> and updates counters.  This reduces the contention of a
> single reader lock (in 2.6.29) without the delay of synchronize_net()
> (in 2.6.30-rc2). 
> 
> The mutex that was added for 2.6.30 in xt_table is unnecessary since
> xt[af].mutex is already held.
> 
> Signed-off-by: Stephen Hemminger <shemminger@...tta.com>
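
(Archive note: an illustrative sketch, not the actual patch. The scheme
described above boils down to a per-cpu spinlock plus a recursion counter
on the read side, with the write side taking every CPU's lock. All names
below (pcpu_rlock, pcpu_rdlock_bh, pcpu_wrlock_all, and so on) are invented
for illustration.)

#include <linux/percpu.h>
#include <linux/spinlock.h>
#include <linux/cpumask.h>
#include <linux/bottom_half.h>

struct pcpu_rlock {
	spinlock_t	lock;
	int		depth;	/* read-side recursion depth on this CPU */
};

static DEFINE_PER_CPU(struct pcpu_rlock, pcpu_rlocks);

/* Read side (packet path): touch only this CPU's lock; may nest. */
static inline void pcpu_rdlock_bh(void)
{
	struct pcpu_rlock *l;

	local_bh_disable();		/* stay on this CPU, block softirqs */
	l = this_cpu_ptr(&pcpu_rlocks);
	if (l->depth++ == 0)		/* outermost acquisition takes the lock */
		spin_lock(&l->lock);
}

static inline void pcpu_rdunlock_bh(void)
{
	struct pcpu_rlock *l = this_cpu_ptr(&pcpu_rlocks);

	if (--l->depth == 0)		/* outermost release drops the lock */
		spin_unlock(&l->lock);
	local_bh_enable();
}

/* Write side (rule replace / counter read): block readers on every CPU. */
static void pcpu_wrlock_all(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		spin_lock_bh(&per_cpu(pcpu_rlocks, cpu).lock);
}

static void pcpu_wrunlock_all(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		spin_unlock_bh(&per_cpu(pcpu_rlocks, cpu).lock);
}

(The recursion counter is what makes re-entry on the same CPU safe, which
is the "recursive" part of the description; a reader never touches any
other CPU's lock, so the packet path stays contention-free.)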

I reviewed this patch and believe it's in quite good shape, thanks Stephen.

I then tested it on an x86_32 machine with 8 CPUs and saw no obvious problems.

Signed-off-by: Eric Dumazet <dada1@...mosbay.com>

Hopefully, the next rcu_bh variant (or whatever name it ends up with) will
let us switch back to pure RCU in 2.6.31.
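
(For context, the "pure RCU" read side referred to here would look roughly
like the following. This is only a rough sketch; 'table', 'info' and
'newinfo' are hypothetical names, not actual netfilter code.)

/* Reader: no lock at all, just an RCU read-side critical section. */
rcu_read_lock_bh();
info = rcu_dereference(table->private);	/* hypothetical ruleset pointer */
/* ... walk the rules in 'info' and update counters ... */
rcu_read_unlock_bh();

/* Writer: publish the new ruleset, then wait for all readers to drain. */
rcu_assign_pointer(table->private, newinfo);
synchronize_rcu();	/* this wait is the latency the per-cpu lock avoids */

(It is the delay of that synchronize step, the synchronize_net() mentioned
above for 2.6.30-rc2, that the per-cpu lock is meant to avoid on counter
reads.)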
 
oprofile snapshot of a tbench session with light iptables rules
(4 rules in the INPUT chain, 3 rules in the OUTPUT chain):

xt_info_rdlock_bh() uses 0.6786 % of CPU time
xt_info_rdunlock_bh() uses 0.1743 % of CPU time


CPU: Core 2, speed 3000.77 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  cum. samples  %        cum. %     symbol name
1248350  1248350       11.3285  11.3285    copy_from_user
534049   1782399        4.8464  16.1749    copy_to_user
480898   2263297        4.3641  20.5390    __schedule
325581   2588878        2.9546  23.4936    ipt_do_table
312697   2901575        2.8377  26.3312    tcp_ack
309381   3210956        2.8076  29.1388    tcp_sendmsg
248238   3459194        2.2527  31.3915    tcp_v4_rcv
230405   3689599        2.0909  33.4824    tcp_transmit_skb
220638   3910237        2.0022  35.4847    ip_queue_xmit
217099   4127336        1.9701  37.4548    tcp_recvmsg
175885   4303221        1.5961  39.0509    tcp_rcv_established
173112   4476333        1.5710  40.6219    __switch_to
165138   4641471        1.4986  42.1205    sysenter_past_esp
149367   4790838        1.3555  43.4759    dst_release
138619   4929457        1.2579  44.7339    sched_clock_cpu
132724   5062181        1.2044  45.9383    lock_sock_nested
121353   5183534        1.1013  47.0396    nf_iterate
119205   5302739        1.0818  48.1214    netif_receive_skb
118859   5421598        1.0786  49.2000    release_sock
112597   5534195        1.0218  50.2218    __inet_lookup_established
112195   5646390        1.0181  51.2399    sys_socketcall
110018   5756408        0.9984  52.2383    tcp_write_xmit
106466   5862874        0.9662  53.2045    __alloc_skb
93386    5956260        0.8475  54.0519    dev_queue_xmit
89229    6045489        0.8097  54.8617    tcp_event_data_recv
85972    6131461        0.7802  55.6418    local_bh_enable
82882    6214343        0.7521  56.3940    skb_release_data
80898    6295241        0.7341  57.1281    ip_rcv
76380    6371621        0.6931  57.8213    skb_copy_datagram_iovec
74782    6446403        0.6786  58.4999    xt_info_rdlock_bh
73593    6519996        0.6678  59.1677    mod_timer
72884    6592880        0.6614  59.8291    sock_recvmsg
71789    6664669        0.6515  60.4806    __copy_skb_header
70560    6735229        0.6403  61.1209    fget_light
68756    6803985        0.6239  61.7449    get_page_from_freelist
68378    6872363        0.6205  62.3654    put_page
68042    6940405        0.6175  62.9829    ip_finish_output
67618    7008023        0.6136  63.5965    page_address
64894    7072917        0.5889  64.1854    tcp_cleanup_rbuf


> 
> ---
> CHANGES 
>   - optimize for UP
>   - disable bottom half in info_rdlock
>   - prevent preempt count overflow
>   - turn off lockdep in writer to avoid bogus warning
>   - optimize unlock_bh
> 
>
