lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 21 Aug 2012 10:07:50 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	David Miller <davem@...emloft.net>,
	networking <netdev@...r.kernel.org>
Subject: Re: [PATCH] tcp: fix possible socket refcount problem

I'm sure it fixed the problem: up to now, there are 264 boots (each
taking 1.5 hours) without a single failure. Thanks for the quick fix!

On Mon, Aug 20, 2012 at 12:22:46PM +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@...gle.com>
> 
> Commit 6f458dfb40 (tcp: improve latencies of timer triggered events)
> added bug leading to following trace :
> 
> [ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
> [ 2866.131726] 
> [ 2866.132188] =========================
> [ 2866.132281] [ BUG: held lock freed! ]
> [ 2866.132281] 3.6.0-rc1+ #622 Not tainted
> [ 2866.132281] -------------------------
> [ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there!
> [ 2866.132281]  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
> [ 2866.132281] 4 locks held by kworker/0:1/652:
> [ 2866.132281]  #0:  (rpciod){.+.+.+}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
> [ 2866.132281]  #1:  ((&task->u.tk_work)){+.+.+.}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
> [ 2866.132281]  #2:  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
> [ 2866.132281]  #3:  (&icsk->icsk_retransmit_timer){+.-...}, at: [<ffffffff81078017>] run_timer_softirq+0x1ad/0x35f
> [ 2866.132281] 
> [ 2866.132281] stack backtrace:
> [ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622
> [ 2866.132281] Call Trace:
> [ 2866.132281]  <IRQ>  [<ffffffff810bc527>] debug_check_no_locks_freed+0x112/0x159
> [ 2866.132281]  [<ffffffff818a0839>] ? __sk_free+0xfd/0x114
> [ 2866.132281]  [<ffffffff811549fa>] kmem_cache_free+0x6b/0x13a
> [ 2866.132281]  [<ffffffff818a0839>] __sk_free+0xfd/0x114
> [ 2866.132281]  [<ffffffff818a08c0>] sk_free+0x1c/0x1e
> [ 2866.132281]  [<ffffffff81911e1c>] tcp_write_timer+0x51/0x56
> [ 2866.132281]  [<ffffffff81078082>] run_timer_softirq+0x218/0x35f
> [ 2866.132281]  [<ffffffff81078017>] ? run_timer_softirq+0x1ad/0x35f
> [ 2866.132281]  [<ffffffff810f5831>] ? rb_commit+0x58/0x85
> [ 2866.132281]  [<ffffffff81911dcb>] ? tcp_write_timer_handler+0x148/0x148
> [ 2866.132281]  [<ffffffff81070bd6>] __do_softirq+0xcb/0x1f9
> [ 2866.132281]  [<ffffffff81a0a00c>] ? _raw_spin_unlock+0x29/0x2e
> [ 2866.132281]  [<ffffffff81a1227c>] call_softirq+0x1c/0x30
> [ 2866.132281]  [<ffffffff81039f38>] do_softirq+0x4a/0xa6
> [ 2866.132281]  [<ffffffff81070f2b>] irq_exit+0x51/0xad
> [ 2866.132281]  [<ffffffff81a129cd>] do_IRQ+0x9d/0xb4
> [ 2866.132281]  [<ffffffff81a0a3ef>] common_interrupt+0x6f/0x6f
> [ 2866.132281]  <EOI>  [<ffffffff8109d006>] ? sched_clock_cpu+0x58/0xd1
> [ 2866.132281]  [<ffffffff81a0a172>] ? _raw_spin_unlock_irqrestore+0x4c/0x56
> [ 2866.132281]  [<ffffffff81078692>] mod_timer+0x178/0x1a9
> [ 2866.132281]  [<ffffffff818a00aa>] sk_reset_timer+0x19/0x26
> [ 2866.132281]  [<ffffffff8190b2cc>] tcp_rearm_rto+0x99/0xa4
> [ 2866.132281]  [<ffffffff8190dfba>] tcp_event_new_data_sent+0x6e/0x70
> [ 2866.132281]  [<ffffffff8190f7ea>] tcp_write_xmit+0x7de/0x8e4
> [ 2866.132281]  [<ffffffff818a565d>] ? __alloc_skb+0xa0/0x1a1
> [ 2866.132281]  [<ffffffff8190f952>] __tcp_push_pending_frames+0x2e/0x8a
> [ 2866.132281]  [<ffffffff81904122>] tcp_sendmsg+0xb32/0xcc6
> [ 2866.132281]  [<ffffffff819229c2>] inet_sendmsg+0xaa/0xd5
> [ 2866.132281]  [<ffffffff81922918>] ? inet_autobind+0x5f/0x5f
> [ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
> [ 2866.132281]  [<ffffffff8189adab>] sock_sendmsg+0xa3/0xc4
> [ 2866.132281]  [<ffffffff810f5de6>] ? rb_reserve_next_event+0x26f/0x2d5
> [ 2866.132281]  [<ffffffff8103e6a9>] ? native_sched_clock+0x29/0x6f
> [ 2866.132281]  [<ffffffff8103e6f8>] ? sched_clock+0x9/0xd
> [ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
> [ 2866.132281]  [<ffffffff8189ae03>] kernel_sendmsg+0x37/0x43
> [ 2866.132281]  [<ffffffff8199ce49>] xs_send_kvec+0x77/0x80
> [ 2866.132281]  [<ffffffff8199cec1>] xs_sendpages+0x6f/0x1a0
> [ 2866.132281]  [<ffffffff8107826d>] ? try_to_del_timer_sync+0x55/0x61
> [ 2866.132281]  [<ffffffff8199d0d2>] xs_tcp_send_request+0x55/0xf1
> [ 2866.132281]  [<ffffffff8199bb90>] xprt_transmit+0x89/0x1db
> [ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
> [ 2866.132281]  [<ffffffff81999d92>] call_transmit+0x1c5/0x20e
> [ 2866.132281]  [<ffffffff819a0d55>] __rpc_execute+0x6f/0x225
> [ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
> [ 2866.132281]  [<ffffffff819a0f33>] rpc_async_schedule+0x28/0x34
> [ 2866.132281]  [<ffffffff810835d6>] process_one_work+0x24d/0x47f
> [ 2866.132281]  [<ffffffff81083567>] ? process_one_work+0x1de/0x47f
> [ 2866.132281]  [<ffffffff819a0f0b>] ? __rpc_execute+0x225/0x225
> [ 2866.132281]  [<ffffffff81083a6d>] worker_thread+0x236/0x317
> [ 2866.132281]  [<ffffffff81083837>] ? process_scheduled_works+0x2f/0x2f
> [ 2866.132281]  [<ffffffff8108b7b8>] kthread+0x9a/0xa2
> [ 2866.132281]  [<ffffffff81a12184>] kernel_thread_helper+0x4/0x10
> [ 2866.132281]  [<ffffffff81a0a4b0>] ? retint_restore_args+0x13/0x13
> [ 2866.132281]  [<ffffffff8108b71e>] ? __init_kthread_worker+0x5a/0x5a
> [ 2866.132281]  [<ffffffff81a12180>] ? gs_change+0x13/0x13
> [ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
> [ 2866.309689] =============================================================================
> [ 2866.310254] BUG TCP (Not tainted): Object already free
> [ 2866.310254] -----------------------------------------------------------------------------
> [ 2866.310254] 
> 
> The bug comes from the fact that timer set in sk_reset_timer() can run
> before we actually do the sock_hold(). socket refcount reaches zero and
> we free the socket too soon.
> 
> timer handler is not allowed to reduce socket refcnt if socket is owned
> by the user, or we need to change sk_reset_timer() implementation.
> 
> We should take a reference on the socket in case TCP_DELACK_TIMER_DEFERRED
> or TCP_DELACK_TIMER_DEFERRED bit are set in tsq_flags
> 
> Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED
> was used instead of TCP_DELACK_TIMER_DEFERRED.
> 
> For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED,
> even if not fired from a timer.
> 
> Reported-by: Fengguang Wu <fengguang.wu@...el.com>
> Tested-by: Fengguang Wu <fengguang.wu@...el.com>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> ---
>  net/ipv4/tcp_ipv4.c   |    8 +++++---
>  net/ipv4/tcp_output.c |   14 +++++++++-----
>  net/ipv4/tcp_timer.c  |    6 ++++--
>  3 files changed, 18 insertions(+), 10 deletions(-)
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 7678237..6278a11 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -417,10 +417,12 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info)
>  
>  		if (code == ICMP_FRAG_NEEDED) { /* PMTU discovery (RFC1191) */
>  			tp->mtu_info = info;
> -			if (!sock_owned_by_user(sk))
> +			if (!sock_owned_by_user(sk)) {
>  				tcp_v4_mtu_reduced(sk);
> -			else
> -				set_bit(TCP_MTU_REDUCED_DEFERRED, &tp->tsq_flags);
> +			} else {
> +				if (!test_and_set_bit(TCP_MTU_REDUCED_DEFERRED, &tp->tsq_flags))
> +					sock_hold(sk);
> +			}
>  			goto out;
>  		}
>  
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 20dfd89..d046326 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -910,14 +910,18 @@ void tcp_release_cb(struct sock *sk)
>  	if (flags & (1UL << TCP_TSQ_DEFERRED))
>  		tcp_tsq_handler(sk);
>  
> -	if (flags & (1UL << TCP_WRITE_TIMER_DEFERRED))
> +	if (flags & (1UL << TCP_WRITE_TIMER_DEFERRED)) {
>  		tcp_write_timer_handler(sk);
> -
> -	if (flags & (1UL << TCP_DELACK_TIMER_DEFERRED))
> +		__sock_put(sk);
> +	}
> +	if (flags & (1UL << TCP_DELACK_TIMER_DEFERRED)) {
>  		tcp_delack_timer_handler(sk);
> -
> -	if (flags & (1UL << TCP_MTU_REDUCED_DEFERRED))
> +		__sock_put(sk);
> +	}
> +	if (flags & (1UL << TCP_MTU_REDUCED_DEFERRED)) {
>  		sk->sk_prot->mtu_reduced(sk);
> +		__sock_put(sk);
> +	}
>  }
>  EXPORT_SYMBOL(tcp_release_cb);
>  
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index 6df36ad..b774a03 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -252,7 +252,8 @@ static void tcp_delack_timer(unsigned long data)
>  		inet_csk(sk)->icsk_ack.blocked = 1;
>  		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOCKED);
>  		/* deleguate our work to tcp_release_cb() */
> -		set_bit(TCP_WRITE_TIMER_DEFERRED, &tcp_sk(sk)->tsq_flags);
> +		if (!test_and_set_bit(TCP_DELACK_TIMER_DEFERRED, &tcp_sk(sk)->tsq_flags))
> +			sock_hold(sk);
>  	}
>  	bh_unlock_sock(sk);
>  	sock_put(sk);
> @@ -481,7 +482,8 @@ static void tcp_write_timer(unsigned long data)
>  		tcp_write_timer_handler(sk);
>  	} else {
>  		/* deleguate our work to tcp_release_cb() */
> -		set_bit(TCP_WRITE_TIMER_DEFERRED, &tcp_sk(sk)->tsq_flags);
> +		if (!test_and_set_bit(TCP_WRITE_TIMER_DEFERRED, &tcp_sk(sk)->tsq_flags))
> +			sock_hold(sk);
>  	}
>  	bh_unlock_sock(sk);
>  	sock_put(sk);
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ