Message-ID: <47EB94AB.6090608@steeleye.com>
Date:	Thu, 27 Mar 2008 08:35:55 -0400
From:	Paul Clements <paul.clements@...eleye.com>
To:	Mike Snitzer <snitzer@...il.com>
CC:	nbd-general@...ts.sourceforge.net, linux-kernel@...r.kernel.org
Subject: Re: nbd: Oops because nbd doesn't prevent NBD_CLEAR_SOCK while sock_xmit()
 is working on a receive

Mike Snitzer wrote:

> In practice this looks like:
> 
> nbd1: NBD_DISCONNECT
> nbd1: Send control failed (result -32)
> end_request: I/O error, dev nbd1, sector 0
> end_request: I/O error, dev nbd1, sector 8032264
> md: super_written gets error=-5, uptodate=0
> raid1: Disk failure on nbd1, disabling device.
>         Operation continuing on 1 devices
> Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
>  [<ffffffff88b1e125>] :nbd:sock_xmit+0x9d/0x301

> The fact that sock_xmit() in receive mode is unprotected seems to be
> the WHY a NULL pointer is possible; but I'm still trying to identify
> the HOW.

Do you know who is setting the socket to NULL? Is it already NULL when
you get to this point? Is it the nbd-client -d, or is it the original
nbd-client/kernel that does it? Figuring that out would help narrow down
the cause.

> But for me this begs the question: why isn't the nbd_device's socket
> always protected during sock_xmit() for both transmits and receives,
> rather than just transmits (via tx_lock)!?

It would deadlock if we held the lock over both transmits and receives.
Generally we don't have to worry about receives, since they're always
done in the nbd-client process, so we have control over when and how it
exits and cleans up. The odd case, as you've discovered, is when another
process (nbd-client -d) comes along and starts mucking with the queue
and socket. Would "kill -9 <nbd-client-pid>" work for you instead? That
is what I use to break the connection, and it's safe, as it tells the
original nbd-client to exit (which it does cleanly and safely).
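
To illustrate the deadlock concern, here is a minimal userspace sketch in
Python. It is a model, not the nbd driver code. The names ModelDevice,
receive_holding_lock, try_send and demo are hypothetical, and the
timeouts exist only so the demonstration terminates. It assumes the
receiver would block inside the same lock the sender needs, and that a
reply only arrives after a request has been sent.

```python
import queue
import threading


class ModelDevice:
    """Hypothetical userspace stand-in for an nbd device (not the kernel struct)."""

    def __init__(self):
        self.tx_lock = threading.Lock()  # plays the role of tx_lock for transmits
        self.replies = queue.Queue()     # stands in for data arriving on the socket


def receive_holding_lock(dev, holding, result):
    """Receiver that (wrongly) keeps tx_lock while blocked waiting for a reply."""
    with dev.tx_lock:
        holding.set()  # signal that we are now waiting inside the lock
        try:
            result.append(dev.replies.get(timeout=0.5))
        except queue.Empty:
            result.append(None)  # no reply arrived while we held the lock


def try_send(dev, request, timeout=0.1):
    """Sender: must take tx_lock to transmit; a reply is queued only after a send."""
    if not dev.tx_lock.acquire(timeout=timeout):
        # A real blocking lock would wait here indefinitely instead of timing out.
        return False
    try:
        dev.replies.put("reply to " + request)
        return True
    finally:
        dev.tx_lock.release()


def demo():
    """Run one receiver and one sender; return (send succeeded, reply received)."""
    dev = ModelDevice()
    holding = threading.Event()
    result = []
    rx = threading.Thread(target=receive_holding_lock, args=(dev, holding, result))
    rx.start()
    holding.wait()  # make sure the receiver owns the lock before sending
    sent = try_send(dev, "read sector 0")
    rx.join()
    return sent, result[0]


if __name__ == "__main__":
    print(demo())
```

In this model the sender cannot take the lock while the receiver waits,
and the receiver never sees a reply because nothing was sent. Without
the timeouts, both sides would wait on each other forever.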

--
Paul