Date:   Tue, 25 Feb 2020 05:44:03 +0000
From:   Dexuan Cui <decui@...rosoft.com>
To:     Stefano Garzarella <sgarzare@...hat.com>,
        Hillf Danton <hdanton@...a.com>,
        Jorgen Hansen <jhansen@...are.com>,
        Stefan Hajnoczi <stefanha@...hat.com>
CC:     syzbot <syzbot+731710996d79d0d58fbc@...kaller.appspotmail.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>,
        "virtualization@...ts.linux-foundation.org" 
        <virtualization@...ts.linux-foundation.org>
Subject: RE: INFO: task hung in lock_sock_nested (2)

> From: Stefano Garzarella <sgarzare@...hat.com>
> Sent: Monday, February 24, 2020 2:09 AM
> ...
> > > syz-executor280 D27912  9768   9766 0x00000000
> > > Call Trace:
> > >  context_switch kernel/sched/core.c:3386 [inline]
> > >  __schedule+0x934/0x1f90 kernel/sched/core.c:4082
> > >  schedule+0xdc/0x2b0 kernel/sched/core.c:4156
> > >  __lock_sock+0x165/0x290 net/core/sock.c:2413
> > >  lock_sock_nested+0xfe/0x120 net/core/sock.c:2938
> > >  virtio_transport_release+0xc4/0xd60 net/vmw_vsock/virtio_transport_common.c:832
> > >  vsock_assign_transport+0xf3/0x3b0 net/vmw_vsock/af_vsock.c:454
> > >  vsock_stream_connect+0x2b3/0xc70 net/vmw_vsock/af_vsock.c:1288
> > >  __sys_connect_file+0x161/0x1c0 net/socket.c:1857
> > >  __sys_connect+0x174/0x1b0 net/socket.c:1874
> > >  __do_sys_connect net/socket.c:1885 [inline]
> > >  __se_sys_connect net/socket.c:1882 [inline]
> > >  __x64_sys_connect+0x73/0xb0 net/socket.c:1882
> > >  do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294

My understanding of the call trace is: in vsock_stream_connect(), after we
call lock_sock(sk), we call vsock_assign_transport(), which may call
vsk->transport->release(vsk), i.e. virtio_transport_release(); there we try
to take the same lock again and hang.
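
If it helps to see the pattern in isolation, here is a tiny user-space
program (not kernel code, just an illustration of the same self-deadlock;
the fake_* names are made up to mirror the call chain above):

  /* On Linux/glibc a default pthread mutex is non-recursive, so re-locking
   * it from the same thread blocks forever, just like __lock_sock() does
   * when the current task is already the lock owner.
   */
  #include <pthread.h>
  #include <stdio.h>

  static pthread_mutex_t sk_lock = PTHREAD_MUTEX_INITIALIZER;

  static void fake_transport_release(void)
  {
          /* mirrors virtio_transport_release() -> lock_sock_nested(sk) */
          pthread_mutex_lock(&sk_lock);   /* hangs: we already own sk_lock */
          pthread_mutex_unlock(&sk_lock);
  }

  static void fake_stream_connect(void)
  {
          pthread_mutex_lock(&sk_lock);   /* mirrors lock_sock(sk) */
          fake_transport_release();       /* mirrors vsock_assign_transport() */
          pthread_mutex_unlock(&sk_lock);
  }

  int main(void)
  {
          fake_stream_connect();          /* never returns */
          printf("unreachable\n");
          return 0;
  }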

> > Seems like vsock needs a word to track lock owner in an attempt to
> > avoid trying to lock sock while the current is the lock owner.

I'm unfamiliar with the g2h/h2g :-) 
Here I'm wondering if it's acceptable to add an 'already_locked'
parameter like this:
  bool already_locked = true;
  vsk->transport->release(vsk, already_locked) ?
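
(Just to show the shape of the change I mean -- untested, and written from
my reading of the code, so please correct me if the details are off:)

  /* vsock_transport::release would gain a flag saying whether the caller
   * already holds the sock lock:
   *
   *     void (*release)(struct vsock_sock *vsk, bool already_locked);
   *
   * vsock_assign_transport() runs under lock_sock(sk), so it would pass
   * true; __vsock_release() would keep passing false.  Then, roughly:
   */
  static void virtio_transport_release(struct vsock_sock *vsk,
                                       bool already_locked)
  {
          struct sock *sk = sk_vsock(vsk);

          if (!already_locked)
                  lock_sock_nested(sk, SINGLE_DEPTH_NESTING);

          /* ... existing close/cleanup logic unchanged ... */

          if (!already_locked)
                  release_sock(sk);
  }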
 
> Thanks for this possible solution.
> What about using sock_owned_by_user()?
 
> We should fix also hyperv_transport, because it could suffer from the same
> problem.

IIUC hyperv_transport doesn't support the h2g/g2h feature, so it should not
suffer from the deadlock issue here?

> At this point, it might be better to call vsk->transport->release(vsk)
> always with the lock taken and remove it in the transports as in the
> following patch.
> 
> What do you think?
> 
> 
> diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
> index 9c5b2a91baad..a073d8efca33 100644
> --- a/net/vmw_vsock/af_vsock.c
> +++ b/net/vmw_vsock/af_vsock.c
> @@ -753,20 +753,18 @@ static void __vsock_release(struct sock *sk, int level)
>  		vsk = vsock_sk(sk);
>  		pending = NULL;	/* Compiler warning. */
> 
> -		/* The release call is supposed to use lock_sock_nested()
> -		 * rather than lock_sock(), if a sock lock should be acquired.
> -		 */
> -		if (vsk->transport)
> -			vsk->transport->release(vsk);
> -		else if (sk->sk_type == SOCK_STREAM)
> -			vsock_remove_sock(vsk);
> -
>  		/* When "level" is SINGLE_DEPTH_NESTING, use the nested
>  		 * version to avoid the warning "possible recursive locking
>  		 * detected". When "level" is 0, lock_sock_nested(sk, level)
>  		 * is the same as lock_sock(sk).
>  		 */
>  		lock_sock_nested(sk, level);
> +
> +		if (vsk->transport)
> +			vsk->transport->release(vsk);
> +		else if (sk->sk_type == SOCK_STREAM)
> +			vsock_remove_sock(vsk);
> +
>  		sock_orphan(sk);
>  		sk->sk_shutdown = SHUTDOWN_MASK;
> 
> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> index 3492c021925f..510f25f4a856 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -529,9 +529,7 @@ static void hvs_release(struct vsock_sock *vsk)
>  	struct sock *sk = sk_vsock(vsk);
>  	bool remove_sock;
> 
> -	lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
>  	remove_sock = hvs_close_lock_held(vsk);
> -	release_sock(sk);
>  	if (remove_sock)
>  		vsock_remove_sock(vsk);
>  }

This looks good to me, but do we know why vsk->transport->release(vsk)
is called without holding the lock for 'sk' in the first place?

Thanks,
Dexuan
