lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Fri, 10 Jun 2022 00:10:48 +0800 (GMT+08:00) From: duoming@....edu.cn To: "Paolo Abeni" <pabeni@...hat.com> Cc: linux-kernel@...r.kernel.org, jreuter@...na.de, ralf@...ux-mips.org, davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, netdev@...r.kernel.org, linux-hams@...r.kernel.org, thomas@...erried.de Subject: Re: [PATCH v3] net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg Hello, On Thu, 09 Jun 2022 15:33:02 +0200 Paolo wrote: > > > > The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock > > > > and block until it receives a packet from the remote. If the client > > > > doesn`t connect to server and calls read() directly, it will not > > > > receive any packets forever. As a result, the deadlock will happen. > > > > > > > > The fail log caused by deadlock is shown below: > > > > > > > > [ 369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds. > > > > [ 369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > > > > [ 369.613058] Call Trace: > > > > [ 369.613315] <TASK> > > > > [ 369.614072] __schedule+0x2f9/0xb20 > > > > [ 369.615029] schedule+0x49/0xb0 > > > > [ 369.615734] __lock_sock+0x92/0x100 > > > > [ 369.616763] ? destroy_sched_domains_rcu+0x20/0x20 > > > > [ 369.617941] lock_sock_nested+0x6e/0x70 > > > > [ 369.618809] ax25_bind+0xaa/0x210 > > > > [ 369.619736] __sys_bind+0xca/0xf0 > > > > [ 369.620039] ? do_futex+0xae/0x1b0 > > > > [ 369.620387] ? __x64_sys_futex+0x7c/0x1c0 > > > > [ 369.620601] ? fpregs_assert_state_consistent+0x19/0x40 > > > > [ 369.620613] __x64_sys_bind+0x11/0x20 > > > > [ 369.621791] do_syscall_64+0x3b/0x90 > > > > [ 369.622423] entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > > > [ 369.623319] RIP: 0033:0x7f43c8aa8af7 > > > > [ 369.624301] RSP: 002b:00007f43c8197ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 > > > > [ 369.625756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43c8aa8af7 > > > > [ 369.626724] RDX: 0000000000000010 RSI: 000055768e2021d0 RDI: 0000000000000005 > > > > [ 369.628569] RBP: 00007f43c8197f00 R08: 0000000000000011 R09: 00007f43c8198700 > > > > [ 369.630208] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff845e6afe > > > > [ 369.632240] R13: 00007fff845e6aff R14: 00007f43c8197fc0 R15: 00007f43c8198700 > > > > > > > > This patch moves the skb_recv_datagram() before lock_sock() in order that > > > > other functions that need lock_sock could be executed. What`s more, we > > > > add skb_free_datagram() before goto out in order to mitigate memory leak. > > > > > > > > Suggested-by: Thomas Osterried <thomas@...erried.de> > > > > Signed-off-by: Duoming Zhou <duoming@....edu.cn> > > > > Reported-by: Thomas Habets <thomas@@habets.se> > > > > --- > > > > Changes in v3: > > > > - Add skb_free_datagram() before goto out in order to mitigate memory leak. > > > > > > > > net/ax25/af_ax25.c | 12 +++++++----- > > > > 1 file changed, 7 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c > > > > index 95393bb2760..62aa5993093 100644 > > > > --- a/net/ax25/af_ax25.c > > > > +++ b/net/ax25/af_ax25.c > > > > @@ -1665,6 +1665,11 @@ static int ax25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, > > > > int copied; > > > > int err = 0; > > > > > > > > + /* Now we can treat all alike */ > > > > + skb = skb_recv_datagram(sk, flags, &err); > > > > + if (!skb) > > > > + goto done; > > > > + > > > > > > Note that this causes a behavior change: before this patch, calling > > > recvmsg() on unconnected seqpacket sockets returned immediatelly with > > > an error (due to the the check below), now it blocks. > > > > > > The change may confuse (== break) user-space applications. I think it > > > would be better replacing skb_recv_datagram with an open-coded variant > > > of it releasing the socket lock before the > > > __skb_wait_for_more_packets() call and re-acquiring it after such call. > > > Somewhat alike __unix_dgram_recvmsg(). > > > > Thank you for your time and suggestions! > > I think the following method may solve the problem. > > > > diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c > > index 95393bb2760..51b441c837c 100644 > > --- a/net/ax25/af_ax25.c > > +++ b/net/ax25/af_ax25.c > > @@ -1675,8 +1675,10 @@ static int ax25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, > > goto out; > > } > > > > + release_sock(sk); > > /* Now we can treat all alike */ > > skb = skb_recv_datagram(sk, flags, &err); > > + lock_sock(sk); > > if (skb == NULL) > > goto out; > > > > The skb_recv_datagram() is free of race conditions and could be re-entrant. > > So calling skb_recv_datagram() without the protection of lock_sock() is ok. > > > > What's more, releasing the lock_sock() before skb_recv_datagram() will not > > cause UAF bugs. Because the sock will not be deallocated unless we call > > ax25_release(), but ax25_release() and ax25_recvmsg() could not run in parallel. > > > > Although the "sk->sk_state" may be changed due to the release of lock_sock(), > > it will not influence the following operations in ax25_recvmsg(). > > One of the downside of the above is that recvmsg() will unconditionally > acquire and release the socket lock twice which can have non > trivial/nasty side effects on process scheduling. > > With the suggested change the socket lock will be released only when > recvmsg will block and that should produce nicer overal behavior. I test the following method, it runs well. diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c index 95393bb2760..2888aee91a5 100644 --- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -1661,9 +1661,12 @@ static int ax25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int flags) { struct sock *sk = sock->sk; - struct sk_buff *skb; + struct sk_buff *skb, *last; + struct sk_buff_head *sk_queue; int copied; int err = 0; + bool is_block = false; + long timeo; lock_sock(sk); /* @@ -1676,9 +1679,29 @@ static int ax25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, } /* Now we can treat all alike */ - skb = skb_recv_datagram(sk, flags, &err); - if (skb == NULL) - goto out; + sk_queue = &sk->sk_receive_queue; + timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); + skb = __skb_try_recv_datagram(sk, sk_queue, flags, 0, &err, + &last); + if (!skb && (err == -EAGAIN)) { + is_block = true; + release_sock(sk); + do { + skb = __skb_try_recv_datagram(sk, sk_queue, flags, 0, &err, + &last); + if (skb) + break; + + if (err != -EAGAIN) + goto done; + } while (timeo && + !__skb_wait_for_more_packets(sk, sk_queue, &err, + &timeo, last)); + if(!skb) + goto done; + } + if (is_block) + lock_sock(sk); if (!sk_to_ax25(sk)->pidincl) skb_pull(skb, 1); /* Remove PID */ @@ -1725,6 +1748,7 @@ static int ax25_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, out: release_sock(sk); +done: return err; } I think this method could solve the problem. Welcome more advice. Best regards, Duoming Zhou
Powered by blists - more mailing lists