netdev - RE: [BUG] Deadlock on sk->sk

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <771471B8871B5044A6CA7CCD9C26EEE10117E265C985@xmail2.se.axis.com>
Date:	Wed, 5 Mar 2014 16:01:38 +0100
From:	Lars Persson <lars.persson@...s.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [BUG] Deadlock on sk->sk_lock.slock


Thanks, this patch has survived a 24 hour test session. I will now start a test with the new sock_release_ownership patch.

- Lars

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@...il.com]
> Sent: den 4 mars 2014 15:16
> To: Lars Persson
> Cc: netdev@...r.kernel.org
> Subject: Re: [BUG] Deadlock on sk->sk_lock.slock
> 
> On Tue, 2014-03-04 at 09:59 +0100, Lars Persson wrote:
> > Hi
> >
> > We can trigger an rcu stall by stressing the error handling in the
> CIFS file system. The test case induces network outages of varying
> length while data is written to the file system. Judging by the stack
> dump it might be a deadlock in the tcpv4 code. So far seen on stable
> 3.10.32.
> >
> > My initial interpretation of the stack trace:
> > At frame 9 bh is enabled while holding the socket spinlock. This
> happens from __tcp_checksum_complete_user, which is inlined in
> tcp_rcv_established.
> >
> 
> 
> Thanks a lot for this report.
> 
> 
> > At frame 1 we try to grab the same lock from bh context with
> resulting deadlock.
> >
> > -000 |M:0x0:0x802B6AF8(asm) <-- arch_spin_lock
> > -001 |tcp_v4_rcv(skb = 0x8BD527A0) <-- sk = 0x8BE6B2A0
> > -002 |ip_local_deliver_finish(skb = 0x8BD527A0)
> > -003 |__netif_receive_skb_core(skb = 0x8BD527A0, ?)
> > -004 |netif_receive_skb(skb = 0x8BD527A0)
> > -005 |elk_poll(napi = 0x8C770500, budget = 64)
> > -006 |net_rx_action(?)
> > -007 |__do_softirq()
> > -008 |do_softirq()
> > -009 |local_bh_enable()
> > -010 |tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th =
> > 0x814EBE14, ?)
> > -011 |tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0)
> > -012 |tcp_delack_timer_handler(sk = 0x8BE6B2A0)
> > -013 |tcp_release_cb(sk = 0x8BE6B2A0)
> > -014 |release_sock(sk = 0x8BE6B2A0)
> > -015 |tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?)
> > -016 |sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096)
> > -017 |kernel_sendmsg(?, ?, ?, ?, size = 4096)
> > -018 |smb_send_kvec(server = 0x87C4D400, iov = 0x87D8DB10, n_vec = 1,
> > sent = 0x87D8DB1C)
> > -019 |smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0)
> > -020 |cifs_call_async(server = 0x87C4D400, rqst = 0x87D8DBA0, receive
> > = 0x0, callback = 0x80219514,
> > -021 |cifs_async_writev(wdata = 0x87FD6580)
> > -022 |cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88)
> > -023 |__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88)
> > -024 |writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work =
> > 0x87D8DD88)
> > -025 |__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88)
> > -026 |wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88)
> > -027 |wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0)
> > -028 |bdi_writeback_workfn(work = 0x87E4A9CC)
> > -029 |process_one_work(worker = 0x8B045880, work = 0x87E4A9CC)
> > -030 |worker_thread(__worker = 0x8B045880)
> > -031 |kthread(_create = 0x87CADD90)
> > -032 |ret_from_kernel_thread(asm)
> 
> So the callchain is
> 
> release_sock()
> {
>  spin_lock_bh(&sk->sk_lock.slock);
> 
>  sk->sk_prot->release_cb(sk); //
> 
>  sk->sk_lock.owned = 0;
> }
> 
> Meaning we have the socket spinlock locked by us, and we disabled BH.
> 
> Problem is tcp_delack_timer_handler() assumes it was called from timer
> handler and sock was not owned by user.
> 
> Bug was added in commit 6f458dfb409272082c9bfa412f77ff2fc21c626f
> ("tcp: improve latencies of timer triggered events")
> 
> and triggers if NIC lacks RX checksum support.
> 
> 
> Could you try following hack/patch ?
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index
> 6e4809389cbf..ed17e1cacae2 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4945,7 +4945,10 @@ static __sum16
> __tcp_checksum_complete_user(struct sock *sk,  {
>  	__sum16 result;
> 
> -	if (sock_owned_by_user(sk)) {
> +	/* If we are called from tcp_release_cb(),
> +	 * we do not want to enable BH
> +	 */
> +	if (sk->sk_lock.owned == 1) {
>  		local_bh_enable();
>  		result = __tcp_checksum_complete(skb);
>  		local_bh_disable();
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index
> aaa68f5b1055..588837d06c60 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -785,6 +785,8 @@ void tcp_release_cb(struct sock *sk)
>  		__sock_put(sk);
>  	}
>  	if (flags & (1UL << TCP_DELACK_TIMER_DEFERRED)) {
> +		/* Make sure __tcp_checksum_complete_user() wont enable BH */
> +		sk->sk_lock.owned = 2;
>  		tcp_delack_timer_handler(sk);
>  		__sock_put(sk);
>  	}
>