linux-kernel - Re: [syzbot] [net?] general protection fault in add_wait

Message-ID: <6fonjxxkozzmv7huzavck5nsfivx3nsyyicthulg5aiyrmjpql@o7pexllumdxt>
Date: Tue, 4 Feb 2025 10:59:55 +0100
From: Stefano Garzarella <sgarzare@...hat.com>
To: Michal Luczaj <mhal@...x.co>
Cc: syzbot <syzbot+9d55b199192a4be7d02c@...kaller.appspotmail.com>, 
	davem@...emloft.net, edumazet@...gle.com, eperezma@...hat.com, horms@...nel.org, 
	jasowang@...hat.com, kuba@...nel.org, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org, mst@...hat.com, netdev@...r.kernel.org, pabeni@...hat.com, 
	stefanha@...hat.com, syzkaller-bugs@...glegroups.com, virtualization@...ts.linux.dev, 
	xuanzhuo@...ux.alibaba.com
Subject: Re: [syzbot] [net?] general protection fault in add_wait_queue

On Tue, Feb 04, 2025 at 01:38:50AM +0100, Michal Luczaj wrote:
>On 2/3/25 10:57, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    c2933b2befe2 Merge tag 'net-6.14-rc1' of git://git.kernel...
>> git tree:       net-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=16f676b0580000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=d033b14aeef39158
>> dashboard link: https://syzkaller.appspot.com/bug?extid=9d55b199192a4be7d02c
>> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13300b24580000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12418518580000
>>
>> Downloadable assets:
>> disk image: https://storage.googleapis.com/syzbot-assets/c7667ae12603/disk-c2933b2b.raw.xz
>> vmlinux: https://storage.googleapis.com/syzbot-assets/944ca63002c1/vmlinux-c2933b2b.xz
>> kernel image: https://storage.googleapis.com/syzbot-assets/30748115bf0b/bzImage-c2933b2b.xz
>>
>> The issue was bisected to:
>>
>> commit fcdd2242c0231032fc84e1404315c245ae56322a
>> Author: Michal Luczaj <mhal@...x.co>
>> Date:   Tue Jan 28 13:15:27 2025 +0000
>>
>>     vsock: Keep the binding until socket destruction
>
>syzbot is correct (thanks); the bisected commit introduced a regression.
>
>sock_orphan(sk) is being called without taking into consideration that it
>does `sk->sk_wq = NULL`. Later, if SO_LINGER is set, sk->sk_wq gets
>dereferenced in virtio_transport_wait_close().
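
To make the ordering problem concrete, here is a toy Python model. It is not kernel code: `ToySock`, `WaitQueue`, `sock_orphan()`, `wait_close()` and `release_buggy()` are hypothetical stand-ins that only mimic the two facts stated above. sock_orphan() clears sk_wq, and the SO_LINGER wait later uses it. Accessing an attribute on Python's `None` plays the role of the NULL dereference.

```python
class WaitQueue:
    """Hypothetical stand-in for the socket's wait queue (sk->sk_wq)."""
    def __init__(self):
        self.waiters = []

    def add(self, who):
        # Loosely models add_wait_queue(): register a waiter on the queue.
        self.waiters.append(who)


class ToySock:
    """Hypothetical, heavily simplified socket with only the fields this bug needs."""
    def __init__(self, linger):
        self.sk_wq = WaitQueue()
        self.dead = False
        self.linger = linger  # models SO_LINGER being set


def sock_orphan(sk):
    # The two effects that matter here: mark the socket dead and
    # detach it from its wait queue (sk->sk_wq = NULL).
    sk.dead = True
    sk.sk_wq = None


def wait_close(sk):
    # Models the SO_LINGER wait: it registers on sk->sk_wq, so it needs a live queue.
    if sk.linger:
        sk.sk_wq.add("closer")  # AttributeError when sk_wq is None ("NULL deref")


def release_buggy(sk):
    # Order after the bisected commit: orphan first, then the transport
    # release, which reaches the SO_LINGER wait.
    sock_orphan(sk)
    wait_close(sk)
```

In this model `release_buggy(ToySock(linger=False))` returns normally, while `release_buggy(ToySock(linger=True))` raises `AttributeError`, mirroring the observation that the crash needs SO_LINGER.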
>
>The repro, as shown by syzbot, is simply:
>from socket import *
>lis = socket(AF_VSOCK, SOCK_STREAM)
>lis.bind((1, 1234)) # VMADDR_CID_LOCAL
>lis.listen()
>s = socket(AF_VSOCK, SOCK_STREAM)
>s.setsockopt(SOL_SOCKET, SO_LINGER, (1<<32) | 1)
>s.connect(lis.getsockname())
>s.close()
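
The `(1<<32) | 1` argument in the repro encodes a `struct linger` with `l_onoff = 1` and `l_linger = 1` as a single 64-bit integer; on a little-endian host such as x86-64 both encodings have identical bytes. This presumably relies on CPython passing integer option values for AF_VSOCK sockets as 64-bit quantities, which is an assumption about the interpreter rather than something stated in the thread. The check below uses only the `struct` module and opens no socket; `linger_from_u64()` is a hypothetical helper.

```python
import struct

# struct linger { int l_onoff; int l_linger; } is two C ints.
# "<" forces little-endian so the result does not depend on the host.
LINGER_ON_1S = struct.pack("<ii", 1, 1)  # l_onoff=1, l_linger=1 second

# The repro's value: l_onoff in the low 32 bits, l_linger in the high 32 bits.
repro_value = (1 << 32) | 1
repro_bytes = struct.pack("<Q", repro_value)  # unsigned 64-bit, little-endian


def linger_from_u64(value):
    """Split a 64-bit option value back into (l_onoff, l_linger), little-endian."""
    return struct.unpack("<ii", struct.pack("<Q", value))
```

For example, `linger_from_u64((1 << 32) | 1)` gives `(1, 1)`. On a big-endian host the two halves would land in the opposite fields, so the integer form is not portable.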
>
>One way to fix this is to put sock_orphan(sk) back where it was before the
>breaking patch and instead explicitly flip just the SOCK_DEAD bit, i.e.
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index 075695173648..06250bb9afe2 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -824,13 +824,14 @@ static void __vsock_release(struct sock *sk, int level)
> 	 */
> 	lock_sock_nested(sk, level);
>
>-	sock_orphan(sk);
>+	sock_set_flag(sk, SOCK_DEAD);
>
> 	if (vsk->transport)
> 		vsk->transport->release(vsk);
> 	else if (sock_type_connectible(sk->sk_type))
> 		vsock_remove_sock(vsk);
>
>+	sock_orphan(sk);
> 	sk->sk_shutdown = SHUTDOWN_MASK;
>
> 	skb_queue_purge(&sk->sk_receive_queue);
>
>I'm not sure this is the most elegant code (sock_orphan(sk) sets SOCK_DEAD
>on a socket that is already SOCK_DEAD), but here it goes:
>https://lore.kernel.org/netdev/20250204-vsock-linger-nullderef-v1-0-6eb1760fa93e@rbox.co/
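
The reordering in the patch can be sketched with a similar toy model (hypothetical Python stand-ins, not the kernel API). The socket is marked SOCK_DEAD up front, the transport release (including any SO_LINGER wait) runs while sk_wq is still attached, and only then does sock_orphan() detach it. Setting SOCK_DEAD a second time inside sock_orphan() is a no-op, which is the redundancy mentioned above.

```python
class ToySock:
    """Hypothetical, simplified socket: a flag set plus a wait-queue slot."""
    def __init__(self, linger):
        self.flags = set()
        self.sk_wq = []       # stands in for sk->sk_wq; a list of waiters
        self.linger = linger  # models SO_LINGER being set
        self.events = []      # records the order of operations


def sock_set_flag(sk, flag):
    sk.flags.add(flag)  # adding an already-present flag changes nothing


def sock_orphan(sk):
    sock_set_flag(sk, "SOCK_DEAD")
    sk.sk_wq = None  # sk->sk_wq = NULL
    sk.events.append("orphan")


def transport_release(sk):
    # The SO_LINGER wait uses the wait queue, so it must run before orphaning.
    if sk.linger:
        sk.sk_wq.append("closer")  # would raise AttributeError if sk_wq were None
    sk.events.append("release")


def release_fixed(sk):
    sock_set_flag(sk, "SOCK_DEAD")  # flip only the flag, as the patch does
    transport_release(sk)
    sock_orphan(sk)                 # detach from sk_wq afterwards
```

With `linger=True`, `release_fixed()` completes, records `["release", "orphan"]`, and leaves the socket dead and detached.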

What about the fix proposed here:
https://lore.kernel.org/lkml/20250203124959.114591-1-aha310510@gmail.com/

>
>One more note: man socket(7) says lingering also happens on shutdown().
>Should vsock follow that?

Good point, I think so.
IMHO we should handle both of them in af_vsock.c if possible, though
that may need a bit of refactoring.
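
As a rough illustration of handling both cases in one place, here is a toy Python model in which the core socket layer owns a single linger helper used by both close() and shutdown(). `ToyVsock` and `_linger_wait()` are hypothetical names; this is only a sketch of the shape such a refactoring could take, not a proposal for actual af_vsock.c code.

```python
class ToyVsock:
    """Hypothetical model: the core socket layer owns the linger policy."""
    def __init__(self, linger_secs=None):
        self.linger_secs = linger_secs  # None or 0: nothing to wait for
        self.waits = []                 # records which path waited, and for how long

    def _linger_wait(self, path):
        # One shared helper, so close() and shutdown() cannot drift apart.
        if self.linger_secs:
            self.waits.append((path, self.linger_secs))

    def shutdown(self):
        self._linger_wait("shutdown")

    def close(self):
        self._linger_wait("close")
```

Keeping the policy in one helper means individual transports would not each need their own copy of the wait logic.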

Anyway, net-next material, right?

Stefano