linux-kernel - Re: [PATCH v3] net: ax25: Fix deadlock caused by skb_recv_datagram in ax25

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4ccdba76.5ee33.181489cd6e4.Coremail.duoming@zju.edu.cn>
Date:   Thu, 9 Jun 2022 21:17:27 +0800 (GMT+08:00)
From:   duoming@....edu.cn
To:     "Paolo Abeni" <pabeni@...hat.com>
Cc:     linux-kernel@...r.kernel.org, jreuter@...na.de,
        ralf@...ux-mips.org, davem@...emloft.net, edumazet@...gle.com,
        kuba@...nel.org, netdev@...r.kernel.org,
        linux-hams@...r.kernel.org, thomas@...erried.de
Subject: Re: [PATCH v3] net: ax25: Fix deadlock caused by skb_recv_datagram
 in ax25_recvmsg

Hello,

On Thu, 09 Jun 2022 10:41:02 +0200 Paolo wrote:

> On Wed, 2022-06-08 at 09:29 +0800, Duoming Zhou wrote:
> > The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock
> > and block until it receives a packet from the remote. If the client
> > doesn`t connect to server and calls read() directly, it will not
> > receive any packets forever. As a result, the deadlock will happen.
> > 
> > The fail log caused by deadlock is shown below:
> > 
> > [  369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds.
> > [  369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  369.613058] Call Trace:
> > [  369.613315]  <TASK>
> > [  369.614072]  __schedule+0x2f9/0xb20
> > [  369.615029]  schedule+0x49/0xb0
> > [  369.615734]  __lock_sock+0x92/0x100
> > [  369.616763]  ? destroy_sched_domains_rcu+0x20/0x20
> > [  369.617941]  lock_sock_nested+0x6e/0x70
> > [  369.618809]  ax25_bind+0xaa/0x210
> > [  369.619736]  __sys_bind+0xca/0xf0
> > [  369.620039]  ? do_futex+0xae/0x1b0
> > [  369.620387]  ? __x64_sys_futex+0x7c/0x1c0
> > [  369.620601]  ? fpregs_assert_state_consistent+0x19/0x40
> > [  369.620613]  __x64_sys_bind+0x11/0x20
> > [  369.621791]  do_syscall_64+0x3b/0x90
> > [  369.622423]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > [  369.623319] RIP: 0033:0x7f43c8aa8af7
> > [  369.624301] RSP: 002b:00007f43c8197ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
> > [  369.625756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43c8aa8af7
> > [  369.626724] RDX: 0000000000000010 RSI: 000055768e2021d0 RDI: 0000000000000005
> > [  369.628569] RBP: 00007f43c8197f00 R08: 0000000000000011 R09: 00007f43c8198700
> > [  369.630208] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff845e6afe
> > [  369.632240] R13: 00007fff845e6aff R14: 00007f43c8197fc0 R15: 00007f43c8198700
> > 
> > This patch moves the skb_recv_datagram() before lock_sock() in order that
> > other functions that need lock_sock could be executed. What`s more, we
> > add skb_free_datagram() before goto out in order to mitigate memory leak.
> > 
> > Suggested-by: Thomas Osterried <thomas@...erried.de>
> > Signed-off-by: Duoming Zhou <duoming@....edu.cn>
> > Reported-by: Thomas Habets <thomas@@habets.se>
> > ---
> > Changes in v3:
> >   - Add skb_free_datagram() before goto out in order to mitigate memory leak.
> > 
> >  net/ax25/af_ax25.c | 12 +++++++-----
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
> > index 95393bb2760..62aa5993093 100644
> > --- a/net/ax25/af_ax25.c
> > +++ b/net/ax25/af_ax25.c
> > @@ -1665,6 +1665,11 @@ static int ax25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
> >  	int copied;
> >  	int err = 0;
> >  
> > +	/* Now we can treat all alike */
> > +	skb = skb_recv_datagram(sk, flags, &err);
> > +	if (!skb)
> > +		goto done;
> > +
> 
> Note that this causes a behavior change: before this patch, calling
> recvmsg() on unconnected seqpacket sockets returned immediatelly with
> an error (due to the the check below), now it blocks. 
> 
> The change may confuse (== break) user-space applications. I think it
> would be better replacing skb_recv_datagram with an open-coded variant
> of it releasing the socket lock before the
> __skb_wait_for_more_packets() call and re-acquiring it after such call.
> Somewhat alike __unix_dgram_recvmsg().

Thank you for your time and suggestions!
I think the following method may solve the problem.

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 95393bb2760..51b441c837c 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1675,8 +1675,10 @@ static int ax25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
                goto out;
        }

+       release_sock(sk);
        /* Now we can treat all alike */
        skb = skb_recv_datagram(sk, flags, &err);
+       lock_sock(sk);
        if (skb == NULL)
                goto out;

The skb_recv_datagram() is free of race conditions and could be re-entrant.
So calling skb_recv_datagram() without the protection of lock_sock() is ok.

What's more, releasing the lock_sock() before skb_recv_datagram() will not
cause UAF bugs. Because the sock will not be deallocated unless we call
ax25_release(), but ax25_release() and ax25_recvmsg() could not run in parallel.

Although the "sk->sk_state" may be changed due to the release of lock_sock(),
it will not influence the following operations in ax25_recvmsg().

> In any case this lacks a 'Fixes' pointing to the commit introducing the
> issue addressed here.

The commit that need to be fixed is below:
Fixes: 40d0a923f55a ("Implement locking of internal data for NET/ROM and ROSE. ")
in linux-2.6.12-rc2.

Best regards,
Duoming Zhou