linux-kernel - Re: [PATCH net-next] ax25: Fix deadlock caused by skb_recv_datagram in ax25


Message-ID: <17d6464d.57350.1813e1c1082.Coremail.duoming@zju.edu.cn>
Date:   Tue, 7 Jun 2022 20:20:35 +0800 (GMT+08:00)
From:   duoming@....edu.cn
To:     "Eric Dumazet" <edumazet@...gle.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, jreuter@...na.de,
        "Ralf Baechle" <ralf@...ux-mips.org>,
        "David Miller" <davem@...emloft.net>,
        "Jakub Kicinski" <kuba@...nel.org>,
        "Paolo Abeni" <pabeni@...hat.com>, netdev <netdev@...r.kernel.org>,
        linux-hams@...r.kernel.org, thomas@...erried.de
Subject: Re: [PATCH net-next] ax25: Fix deadlock caused by skb_recv_datagram
 in ax25_recvmsg

Hello,

On Mon, 6 Jun 2022 10:31:49 -0700 Eric Dumazet wrote:

> On Mon, Jun 6, 2022 at 9:21 AM Duoming Zhou <duoming@....edu.cn> wrote:
> >
> > The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock
> > and block until it receives a packet from the remote. If the client
> > doesn't connect to server and calls read() directly, it will not
> > receive any packets forever. As a result, the deadlock will happen.
> >
> > The fail log caused by deadlock is shown below:
> >
> > [  861.122612] INFO: task ax25_deadlock:148 blocked for more than 737 seconds.
> > [  861.124543] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  861.127764] Call Trace:
> > [  861.129688]  <TASK>
> > [  861.130743]  __schedule+0x2f9/0xb20
> > [  861.131526]  schedule+0x49/0xb0
> > [  861.131640]  __lock_sock+0x92/0x100
> > [  861.131640]  ? destroy_sched_domains_rcu+0x20/0x20
> > [  861.131640]  lock_sock_nested+0x6e/0x70
> > [  861.131640]  ax25_sendmsg+0x46/0x420
> > [  861.134383]  ? ax25_recvmsg+0x1e0/0x1e0
> > [  861.135658]  sock_sendmsg+0x59/0x60
> > [  861.136791]  __sys_sendto+0xe9/0x150
> > [  861.137212]  ? __schedule+0x301/0xb20
> > [  861.137710]  ? __do_softirq+0x4a2/0x4fd
> > [  861.139153]  __x64_sys_sendto+0x20/0x30
> > [  861.140330]  do_syscall_64+0x3b/0x90
> > [  861.140731]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > [  861.141249] RIP: 0033:0x7fdf05ee4f64
> > [  861.141249] RSP: 002b:00007ffe95772fc0 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
> > [  861.141249] RAX: ffffffffffffffda RBX: 0000565303a013f0 RCX: 00007fdf05ee4f64
> > [  861.141249] RDX: 0000000000000005 RSI: 0000565303a01678 RDI: 0000000000000005
> > [  861.141249] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > [  861.141249] R10: 0000000000000000 R11: 0000000000000246 R12: 0000565303a00cf0
> > [  861.141249] R13: 00007ffe957730e0 R14: 0000000000000000 R15: 0000000000000000
> >
> > This patch moves the skb_recv_datagram() before lock_sock() in order
> > that other functions that need lock_sock could be executed.
> >
> 
> 
> Why is this targeting net-next tree ?
> 
> 1) A fix should target net tree
> 2) It should include a Fixes: tag

Thank you for your time and suggestions!
I will change the target tree to net and add a Fixes: tag.

> Also:
> - this patch bypasses tests in ax25_recvmsg()
> - This might break applications depending on blocking read() operations.
> 
> I feel a real fix is going to be slightly more difficult than that.

I think moving skb_recv_datagram() before lock_sock() is OK and safe: the blocking
wait in skb_recv_datagram() then happens without lock_sock() held, so it does not
affect other operations on the socket. Applications would not break.

The check "if (sk->sk_type == SOCK_SEQPACKET && sk->sk_state != TCP_ESTABLISHED)"
has to be protected by lock_sock(), because sk->sk_state may be changed by
ax25_disconnect() in ax25_kill_by_device().
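
To make the ordering argument concrete, here is a minimal userspace sketch using
pthreads. It is not kernel code and not the patch itself: struct fake_sock,
reader() and sender_can_lock() are hypothetical names, sock_lock only stands in
for lock_sock()/release_sock(), and the condition-variable wait only stands in
for the blocking skb_recv_datagram(). It models the lock ordering only and says
nothing about the remaining checks in ax25_recvmsg().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a socket; none of this is kernel code. */
struct fake_sock {
    pthread_mutex_t sock_lock;    /* plays the role of lock_sock()/release_sock() */
    pthread_mutex_t queue_lock;   /* protects only the three fields below */
    pthread_cond_t  queue_cond;
    int             queued;           /* pending "datagrams" */
    bool            reader_waiting;   /* reader has reached its blocking wait */
    bool            lock_before_wait; /* true: original order, false: patched order */
};

static void *reader(void *arg)
{
    struct fake_sock *sk = arg;

    if (sk->lock_before_wait)
        pthread_mutex_lock(&sk->sock_lock);

    /* Blocking wait for a datagram, standing in for skb_recv_datagram(). */
    pthread_mutex_lock(&sk->queue_lock);
    sk->reader_waiting = true;
    pthread_cond_broadcast(&sk->queue_cond);
    while (sk->queued == 0)
        pthread_cond_wait(&sk->queue_cond, &sk->queue_lock);
    sk->queued--;
    pthread_mutex_unlock(&sk->queue_lock);

    if (!sk->lock_before_wait)
        pthread_mutex_lock(&sk->sock_lock);
    /* State checks that need the socket lock would go here. */
    pthread_mutex_unlock(&sk->sock_lock);
    return NULL;
}

/* Returns 1 if a sender could take the socket lock while the reader is blocked. */
int sender_can_lock(bool lock_before_wait)
{
    struct fake_sock sk = { .lock_before_wait = lock_before_wait };
    pthread_t tid;
    int rc, got_lock;

    pthread_mutex_init(&sk.sock_lock, NULL);
    pthread_mutex_init(&sk.queue_lock, NULL);
    pthread_cond_init(&sk.queue_cond, NULL);

    rc = pthread_create(&tid, NULL, reader, &sk);
    assert(rc == 0);

    /* Wait until the reader is parked in its blocking wait. */
    pthread_mutex_lock(&sk.queue_lock);
    while (!sk.reader_waiting)
        pthread_cond_wait(&sk.queue_cond, &sk.queue_lock);
    pthread_mutex_unlock(&sk.queue_lock);

    /* A sendmsg-like path needs the socket lock; probe without blocking. */
    got_lock = (pthread_mutex_trylock(&sk.sock_lock) == 0);

    /* Deliver one datagram so the reader can finish in either mode. */
    pthread_mutex_lock(&sk.queue_lock);
    sk.queued++;
    pthread_cond_broadcast(&sk.queue_cond);
    pthread_mutex_unlock(&sk.queue_lock);

    if (got_lock)
        pthread_mutex_unlock(&sk.sock_lock);

    rc = pthread_join(tid, NULL);
    assert(rc == 0);

    pthread_cond_destroy(&sk.queue_cond);
    pthread_mutex_destroy(&sk.queue_lock);
    pthread_mutex_destroy(&sk.sock_lock);
    return got_lock;
}
```

In this model, sender_can_lock(true) returns 0: the reader sleeps with the lock
held, so the sender cannot get it, which is the situation in the hung-task log
above. sender_can_lock(false) returns 1: the reader waits first and takes the
lock only afterwards for its state check, so the sender is not blocked.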

Best regards,
Duoming Zhou