lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <170fa0d20803271412i6d7dd81ax41e0f77339b5def7@mail.gmail.com>
Date:	Thu, 27 Mar 2008 17:12:37 -0400
From:	"Mike Snitzer" <snitzer@...il.com>
To:	"Paul Clements" <paul.clements@...eleye.com>
Cc:	nbd-general@...ts.sourceforge.net, linux-kernel@...r.kernel.org
Subject: Re: nbd: Oops because nbd doesn't prevent NBD_CLEAR_SOCK while sock_xmit() is working on a receive

On Thu, Mar 27, 2008 at 9:21 AM, Mike Snitzer <snitzer@...il.com> wrote:
> On Thu, Mar 27, 2008 at 8:35 AM, Paul Clements
>  <paul.clements@...eleye.com> wrote:
>  > Mike Snitzer wrote:
>  >
>  >  > In practice this looks like:
>  >  >
>  >  > nbd1: NBD_DISCONNECT
>  >  > nbd1: Send control failed (result -32)
>  >  > end_request: I/O error, dev nbd1, sector 0
>  >  > end_request: I/O error, dev nbd1, sector 8032264
>  >  > md: super_written gets error=-5, uptodate=0
>  >  > raid1: Disk failure on nbd1, disabling device.
>  >  >         Operation continuing on 1 devices
>  >  > Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
>  >  >  [<ffffffff88b1e125>] :nbd:sock_xmit+0x9d/0x301
>  >
>  >  > The fact that sock_xmit() in receive mode is unprotected seems to be
>  >  > the WHY a NULL pointer is possible; but I'm still trying to identify
>  >  > the HOW.
>  >
>  >  Do you know who is setting the socket NULL? Is it already NULL when you
>  >  get to this point? Is it the nbd-client -d? Is it the original
>  >  nbd-client/kernel that does it? Figuring that out would help narrow down
>  >  the cause.
>
>  I believe that NBD_CLEAR_SOCK from 'nbd-client -d' sets it to NULL.
>  lo->sock is already NULL on entry to sock_xmit().
>
>  So simply checking if the sock_xmit's 'sock' is NULL _should_ avoid
>  any possibility of a NULL pointer Oops because sock can't be !NULL
>  after the negative check (because of the sock = lo->sock assignment).
>  That is, unless I'm missing somewhere in the rest of the kernel (not
>  nbd) that would take action to set a socket to NULL?
>
>  The attached patch seems reasonable.  I'll be testing today to verify
>  it fixes the problem.

The nbd_sock_xmit_oops.patch from my previous email does fix this issue.

I verified this by making the race window wider with the attached patch.

The sequence to hit this race (with a patched nbd) is:

# insmod ./nbd_exposed.ko
# nbd-client 192.168.114.152 9011 /dev/nbd1
Negotiation: ..size = 3148739KB
bs=1024, sz=3148739
# dd if=/dev/nbd1 of=/dev/null &
[ NOTE: wait to see "sleeping 30s before recv in sock_xmit" in the syslog ]
# nbd-client -d /dev/nbd1

In the system log you'll see something like:

nbd: registered device at major 43
nbd1:
nbd1: sleeping 30s before recv in sock_xmit
nbd1: NBD_DISCONNECT
nbd1: sleeping 30s after setting sock to NULL in NBD_CLEAR_SOCK
nbd1: sleeping 30s before recv in sock_xmit
Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
 [<ffffffff883a917c>] :nbd:sock_xmit+0xf0/0x33e
PGD 141b52067 PUD 144eaf067 PMD 0
Oops: 0000 [1] SMP
...

With the nbd_sock_xmit_oops.patch applied this Oops no longer happens.
 I think my original patch's use of EINVAL should be changed to EIO.
Any other comments on the fix are welcome.

Paul, would you like an updated patch that you push upstream?  Or
should I just submit the patch and Cc: you?

Mike

View attachment "nbd_sock_xmit_expose_race.patch" of type "text/x-patch" (1211 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ