Date:	Wed, 26 Mar 2008 14:43:23 -0400
From:	"Mike Snitzer" <snitzer@...il.com>
To:	"Paul Clements" <paul.clements@...eleye.com>
Cc:	nbd-general-request@...ts.sourceforge.net,
	linux-kernel@...r.kernel.org
Subject: nbd: Oops because nbd doesn't prevent NBD_CLEAR_SOCK while sock_xmit() is working on a receive

I'm seeing that nbd_device's socket is getting set to NULL in the
middle of nbd_read_stat()'s sock_xmit().

There appears to be a race where 'nbd-client -d' requests that an NBD
device first disconnect from the nbd-server (via NBD_DISCONNECT ioctl)
and then set the NBD device's socket to NULL, etc (via
NBD_CLEAR_SOCK).

Both NBD_DISCONNECT and NBD_CLEAR_SOCK take the nbd_device's tx_lock
(which protects the socket during transmits), _but_ receives are not
covered by that lock, so the socket can be set to NULL (via
NBD_CLEAR_SOCK) at any time while a receive is inside sock_xmit(). As
such, NBD_CLEAR_SOCK can cause a NULL pointer dereference in
sock_xmit().

Analyzing the crash, it is clear that the NULL pointer dereference
occurs when sock_xmit()'s do {} while() loop dereferences the
nbd_device's socket with:
sock->sk->sk_allocation = GFP_NOIO;
I also saw that the sock_xmit() caller is nbd_read_stat().
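
A minimal userspace sketch of the failing dereference and of a defensive
NULL check around it. Everything here is an assumption made for
illustration: the model_* types, MODEL_GFP_NOIO, and model_sock_xmit()
are hypothetical stand-ins, not the nbd driver's real structures or
code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the real types). */
struct model_sk { int sk_allocation; };
struct model_socket { struct model_sk *sk; };
struct model_nbd_device { struct model_socket *sock; };

#define MODEL_GFP_NOIO 1 /* placeholder value, not the kernel's GFP_NOIO */

/*
 * Read the device's socket pointer once into a local variable and
 * refuse to continue if it has already been cleared.
 */
int model_sock_xmit(struct model_nbd_device *dev)
{
    struct model_socket *sock = dev->sock; /* snapshot of the shared pointer */

    if (sock == NULL)
        return -EINVAL; /* models the socket having been cleared */

    /* The statement that faults in the report when sock is NULL. */
    sock->sk->sk_allocation = MODEL_GFP_NOIO;
    return 0;
}
```

A check like this would only turn the oops into an error return. It
does not stop the socket from being cleared or released right after the
check, so by itself it narrows the window rather than closing the race.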

The sequence looks like this:

nbd1: NBD_DISCONNECT
[NOTE: a sock_xmit() send attempt is made on behalf of NBD_DISCONNECT]
nbd1: Send control failed (result -32)
...
[NBD is still dequeueing requests]
...
Race: [NBD_CLEAR_SOCK ioctl]
[FATAL: nbd_read_stat()'s sock_xmit() receive attempt causes NULL pointer]

In practice this looks like:

nbd1: NBD_DISCONNECT
nbd1: Send control failed (result -32)
end_request: I/O error, dev nbd1, sector 0
end_request: I/O error, dev nbd1, sector 8032264
md: super_written gets error=-5, uptodate=0
raid1: Disk failure on nbd1, disabling device.
        Operation continuing on 1 devices
Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
 [<ffffffff88b1e125>] :nbd:sock_xmit+0x9d/0x301

The fact that sock_xmit() in receive mode is unprotected seems to be
WHY a NULL pointer dereference is possible; but I'm still trying to
identify HOW it happens.

But for me this raises the question: why isn't the nbd_device's socket
always protected during sock_xmit() for both transmits and receives,
rather than just for transmits (via tx_lock)?
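
To make the question concrete, here is a userspace sketch in which one
lock covers the socket pointer for sends, receives, and the clear
operation alike. The toy_* names and the sock_lock field are
hypothetical and do not exist in the nbd driver; a pthread mutex stands
in for the kernel lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct toy_socket { int io_count; };

struct toy_nbd {
    pthread_mutex_t sock_lock; /* hypothetical: one lock for both directions */
    struct toy_socket *sock;
};

/* Send (send != 0) or receive (send == 0) with the socket pointer held stable. */
int toy_xmit(struct toy_nbd *nbd, int send)
{
    int ret = 0;

    (void)send; /* both directions take the same path in this model */
    pthread_mutex_lock(&nbd->sock_lock);
    if (nbd->sock == NULL)
        ret = -EINVAL;         /* socket was cleared before we got the lock */
    else
        nbd->sock->io_count++; /* stands in for the actual socket I/O */
    pthread_mutex_unlock(&nbd->sock_lock);
    return ret;
}

/* Models NBD_CLEAR_SOCK: the pointer is only cleared under the same lock. */
void toy_clear_sock(struct toy_nbd *nbd)
{
    pthread_mutex_lock(&nbd->sock_lock);
    nbd->sock = NULL;
    pthread_mutex_unlock(&nbd->sock_lock);
}
```

One caveat with this shape: a receive can block for a long time, and a
single lock held across it would also stall transmits and the clear
operation. So this illustrates the question rather than proposing the
"right" fix.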

Any help on the "right" fix would be appreciated, thanks.
Mike