lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 30 Mar 2009 14:50:31 +0400
From:	Roman Mindalev <r000n@...0n.net>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	Stephen Hemminger <shemminger@...tta.com>,
	Patrick McHardy <kaber@...sh.net>,
	LKML <linux-kernel@...r.kernel.org>, netfilter@...r.kernel.org
Subject: Re: Regression caused by commit "netfilter: iptables: lock free
 counters"

On Mon, 30 Mar 2009 09:32:51 +0200
Eric Dumazet <dada1@...mosbay.com> wrote:

> Roman Mindalev a écrit :
> > Hi.
> > Since 2.6.29-git1 I got many debug messages in kernel log.
> > One from them:
> > 
> > Mar 29 12:06:11 desktop kernel: [  917.414369] BUG: using
> > smp_processor_id() in preemptible [00000000] code: hddtemp/2820 Mar
> > 29 12:06:11 desktop kernel: [  917.414378] caller is
> > ipt_do_table+0xd0/0x3a8 [ip_tables] Mar 29 12:06:11 desktop kernel:
> > [  917.414382] Pid: 2820, comm: hddtemp Not tainted
> > 2.6.29-05093-gc31f403 #1 Mar 29 12:06:11 desktop kernel:
> > [  917.414386] Call Trace: Mar 29 12:06:11 desktop kernel:
> > [  917.414393] [<ffffffff8042d037>]
> > debug_smp_processor_id+0xe7/0xf0 Mar 29 12:06:11 desktop kernel:
> > [  917.414400]  [<ffffffffa0171f10>] ipt_do_table+0xd0/0x3a8
> > [ip_tables] Mar 29 12:06:11 desktop kernel: [  917.414414]
> > [<ffffffffa0180d0e>] ? nf_conntrack_in+0x23e/0x610 [nf_conntrack]
> > Mar 29 12:06:11 desktop kernel: [  917.414422]
> > [<ffffffff8026d47d>] ? trace_hardirqs_off+0xd/0x10 Mar 29 12:06:11
> > desktop kernel: [  917.414429]  [<ffffffffa008e0b3>]
> > ipt_local_out_hook+0x53/0x60 [iptable_filter] Mar 29 12:06:11
> > desktop kernel: [  917.414435]  [<ffffffff80546a07>]
> > nf_iterate+0x67/0xa0 Mar 29 12:06:11 desktop kernel:
> > [  917.414441]  [<ffffffff80551140>] ? dst_output+0x0/0x10 Mar 29
> > 12:06:11 desktop kernel: [  917.414446] [<ffffffff80546af5>]
> > nf_hook_slow+0xb5/0x100 Mar 29 12:06:11 desktop kernel:
> > [  917.414452]  [<ffffffff80551140>] ? dst_output+0x0/0x10 Mar 29
> > 12:06:11 desktop kernel: [  917.414458]  [<ffffffff80552f7f>]
> > __ip_local_out+0x9f/0xb0 Mar 29 12:06:11 desktop kernel:
> > [  917.414464]  [<ffffffff8058c4db>] ? bictcp_cong_avoid+0x4b/0x2d0
> > Mar 29 12:06:11 desktop kernel: [  917.414469]
> > [<ffffffff80552fa1>] ip_local_out+0x11/0x30 Mar 29 12:06:11 desktop
> > kernel: [  917.414474] [<ffffffff8055389c>]
> > ip_queue_xmit+0x21c/0x410 Mar 29 12:06:11 desktop kernel:
> > [  917.414480]  [<ffffffff80272825>] ?
> > print_lock_contention_bug+0x25/0x120 Mar 29 12:06:11 desktop
> > kernel: [  917.414486]  [<ffffffff8056810e>] ?
> > tcp_established_options+0x3e/0xe0 Mar 29 12:06:11 desktop kernel:
> > [  917.414492]  [<ffffffff80272825>] ?
> > print_lock_contention_bug+0x25/0x120 Mar 29 12:06:11 desktop
> > kernel: [  917.414498]  [<ffffffff80568ef5>]
> > tcp_transmit_skb+0x535/0x820 Mar 29 12:06:11 desktop kernel:
> > [  917.414504]  [<ffffffff8056b5fb>] tcp_write_xmit+0x20b/0xaa0 Mar
> > 29 12:06:11 desktop kernel: [  917.414510]  [<ffffffff8055d451>] ?
> > sk_stream_alloc_skb+0x41/0x110 Mar 29 12:06:11 desktop kernel:
> > [  917.414515]  [<ffffffff8052799e>] ? __alloc_skb+0x6e/0x150 Mar
> > 29 12:06:11 desktop kernel: [  917.414521] [<ffffffff8056befb>]
> > __tcp_push_pending_frames+0x2b/0x90 Mar 29 12:06:11 desktop kernel:
> > [  917.414527]  [<ffffffff8055ddc3>] tcp_sendmsg+0x8a3/0xbb0 Mar 29
> > 12:06:11 desktop kernel: [  917.414533] [<ffffffff8051d2bb>]
> > sock_aio_write+0x12b/0x140 Mar 29 12:06:11 desktop kernel:
> > [  917.414539]  [<ffffffff802df6f1>] do_sync_write+0xf1/0x140 Mar
> > 29 12:06:11 desktop kernel: [  917.414545]  [<ffffffff8025e470>] ?
> > autoremove_wake_function+0x0/0x40 Mar 29 12:06:11 desktop kernel:
> > [  917.414551]  [<ffffffff802e0259>] vfs_write+0x179/0x190 Mar 29
> > 12:06:11 desktop kernel: [  917.414557]  [<ffffffff802e0360>]
> > sys_write+0x50/0x90 Mar 29 12:06:11 desktop kernel: [  917.414563]
> > [<ffffffff8020b51b>] system_call_fastpath+0x16/0x1b
> > 
> > Result of the bisection:
> > 
> > 784544739a25c30637397ace5489eeb6e15d7d49 is first bad commit
> > commit 784544739a25c30637397ace5489eeb6e15d7d49             
> > Author: Stephen Hemminger <shemminger@...tta.com>           
> > Date:   Fri Feb 20 10:35:32 2009 +0100                      
> > 
> >     netfilter: iptables: lock free counters
> >                                            
> >     The reader/writer lock in ip_tables is acquired in the critical
> >     path of processing packets and is one of the reasons just
> > loading iptables can cause a 20% performance loss. The rwlock
> > serves two functions:
> > 
> >     1) it prevents changes to table state (xt_replace) while table
> > is in use. This is now handled by doing rcu on the xt_table. When
> >     table is replaced, the new table(s) are put in and the old one
> >     table(s) are freed after RCU period.
> > 
> >     2) it provides synchronization when accesing the counter values.
> >        This is now handled by swapping in new table_info entries for
> >     each cpu then summing the old values, and putting the result
> > back onto one cpu.  On a busy system it may cause sampling to occur
> > at different times on each cpu, but no packet/byte counts are lost
> > in the process.
> > 
> >     Signed-off-by: Stephen Hemminger <shemminger@...tta.com>
> > 
> >     Sucessfully tested on my dual quad core machine too, but
> > iptables only (no ipv6 here) BTW, my new "tbench 8" result is 2450
> > MB/s, (it was 2150 MB/s not so long ago)
> > 
> >     Acked-by: Eric Dumazet <dada1@...mosbay.com>
> >     Signed-off-by: Patrick McHardy <kaber@...sh.net>
> > 
> > :040000 040000 67ef5761d009330e8d538dd1a3aff77c86995172
> > 0d8d6398ced2117bb6995b0c414fd8013685324e M      include :040000
> > 040000 0317c745a0722ade9a6c0cbe9d6cb92abbd40efa
> > 0eb195a365164c05e7c764ab1c2a7da891276837 M      net
> > 
> > After reverting commit no more these messages.
> > 
> > This issue in bugzilla:
> > http://bugzilla.kernel.org/show_bug.cgi?id=12960
> 
> Thanks Roman for this excellent report 
> 
> Could you try following patch ?
> 
> [PATCH] netfilter: use rcu_read_bh() in ipt_do_table()
> 
> Commit 784544739a25c30637397ace5489eeb6e15d7d49
> (netfilter: iptables: lock free counters) forgot to disable BH
> in arpt_do_table(), ipt_do_table() and  ip6t_do_table()
> 
> Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.
> 
> Reported-and-bisected-by: Roman Mindalev <r000n@...0n.net>
> Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
> ---
> 
> diff --git a/net/ipv4/netfilter/arp_tables.c
> b/net/ipv4/netfilter/arp_tables.c index 35c5f6a..5ba533d 100644
> --- a/net/ipv4/netfilter/arp_tables.c
> +++ b/net/ipv4/netfilter/arp_tables.c
> @@ -253,7 +253,7 @@ unsigned int arpt_do_table(struct sk_buff *skb,
>  	indev = in ? in->name : nulldevname;
>  	outdev = out ? out->name : nulldevname;
>  
> -	rcu_read_lock();
> +	rcu_read_lock_bh();
>  	private = rcu_dereference(table->private);
>  	table_base =
> rcu_dereference(private->entries[smp_processor_id()]); 
> @@ -329,7 +329,7 @@ unsigned int arpt_do_table(struct sk_buff *skb,
>  		}
>  	} while (!hotdrop);
>  
> -	rcu_read_unlock();
> +	rcu_read_unlock_bh();
>  
>  	if (hotdrop)
>  		return NF_DROP;
> diff --git a/net/ipv4/netfilter/ip_tables.c
> b/net/ipv4/netfilter/ip_tables.c index 82ee7c9..810c0b6 100644
> --- a/net/ipv4/netfilter/ip_tables.c
> +++ b/net/ipv4/netfilter/ip_tables.c
> @@ -339,7 +339,7 @@ ipt_do_table(struct sk_buff *skb,
>  
>  	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
>  
> -	rcu_read_lock();
> +	rcu_read_lock_bh();
>  	private = rcu_dereference(table->private);
>  	table_base =
> rcu_dereference(private->entries[smp_processor_id()]); 
> @@ -437,7 +437,7 @@ ipt_do_table(struct sk_buff *skb,
>  		}
>  	} while (!hotdrop);
>  
> -	rcu_read_unlock();
> +	rcu_read_unlock_bh();
>  
>  #ifdef DEBUG_ALLOW_ALL
>  	return NF_ACCEPT;
> diff --git a/net/ipv6/netfilter/ip6_tables.c
> b/net/ipv6/netfilter/ip6_tables.c index e89cfa3..dfed176 100644
> --- a/net/ipv6/netfilter/ip6_tables.c
> +++ b/net/ipv6/netfilter/ip6_tables.c
> @@ -365,7 +365,7 @@ ip6t_do_table(struct sk_buff *skb,
>  
>  	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
>  
> -	rcu_read_lock();
> +	rcu_read_lock_bh();
>  	private = rcu_dereference(table->private);
>  	table_base =
> rcu_dereference(private->entries[smp_processor_id()]); 
> @@ -466,7 +466,7 @@ ip6t_do_table(struct sk_buff *skb,
>  #ifdef CONFIG_NETFILTER_DEBUG
>  	((struct ip6t_entry *)table_base)->comefrom =
> NETFILTER_LINK_POISON; #endif
> -	rcu_read_unlock();
> +	rcu_read_unlock_bh();
>  
>  #ifdef DEBUG_ALLOW_ALL
>  	return NF_ACCEPT;
> 

2.6.29-git4 and 2.6.29-git6 both works fine for me with this patch. 
Thanks, Eric!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ