lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1503609794-13233-9-git-send-email-philipp.reisner@linbit.com>
Date:   Thu, 24 Aug 2017 23:23:05 +0200
From:   Philipp Reisner <philipp.reisner@...bit.com>
To:     Jens Axboe <axboe@...com>, linux-kernel@...r.kernel.org
Cc:     drbd-dev@...ts.linbit.com
Subject: [PATCH 08/17] drbd: fix potential get_ldev/put_ldev refcount imbalance during attach

From: Lars Ellenberg <lars.ellenberg@...bit.com>

Race:

drbd_adm_attach()               | async drbd_md_endio()
                                |
device->ldev is still NULL.     |
                                |
drbd_md_read(                   |
 .endio = drbd_md_endio;        |
 submit;                        |
 ....                           |
 wait for done == 1;            |       done = 1;
);                              |       wake_up();
.. lot of other stuff,          |
.. includeing taking and        |
...giving up locks,             |
.. doing further IO,            |
.. stuff that takes "some time" |
                                | while in this context,
                                | this is the next statement.
                                | which means this context was scheduled
.. only then, finally,          | away for "some time".
device->ldev = nbc;             |
                                |       if (device->ldev)
                                |               put_ldev()

Unlikely, but possible. I was able to provoke it "reliably"
by adding an mdelay(500); after the wake_up().
Fixed by moving the if (!NULL) put_ldev() before done = 1;

Impact of the bug was that the resulting refcount imbalance
could lead to premature destruction of the object, potentially
causing a NULL pointer dereference during a subsequent detach.

Signed-off-by: Philipp Reisner <philipp.reisner@...bit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@...bit.com>

diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index e48012d..f0717a9 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -65,6 +65,11 @@ void drbd_md_endio(struct bio *bio)
 	device = bio->bi_private;
 	device->md_io.error = blk_status_to_errno(bio->bi_status);
 
+	/* special case: drbd_md_read() during drbd_adm_attach() */
+	if (device->ldev)
+		put_ldev(device);
+	bio_put(bio);
+
 	/* We grabbed an extra reference in _drbd_md_sync_page_io() to be able
 	 * to timeout on the lower level device, and eventually detach from it.
 	 * If this io completion runs after that timeout expired, this
@@ -79,9 +84,6 @@ void drbd_md_endio(struct bio *bio)
 	drbd_md_put_buffer(device);
 	device->md_io.done = 1;
 	wake_up(&device->misc_wait);
-	bio_put(bio);
-	if (device->ldev) /* special case: drbd_md_read() during drbd_adm_attach() */
-		put_ldev(device);
 }
 
 /* reads on behalf of the partner,
-- 
2.7.4

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ