Message-ID: <x494mxroqzp.fsf@segfault.boston.devel.redhat.com>
Date:	Mon, 04 Aug 2014 17:59:54 -0400
From:	Jeff Moyer <jmoyer@...hat.com>
To:	axboe@...nel.dk
Cc:	linux-kernel@...r.kernel.org
Subject: [patch|rfc] block: fix BLK_MQ_F_SG_MERGE support

Hi Jens,

With device mapper targets on top of a device that does not have
BLK_MQ_F_SG_MERGE (the default), we can end up sending down more segments
than a driver is prepared to handle.  I ran into this with virtio_blk,
triggering this BUG_ON:

        BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);

The queue's max is set here:
        blk_queue_max_segments(q, vblk->sg_elems-2);
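
For context: of the sg_elems entries the host advertises, two are
reserved for the virtio request header and the status byte, which is
where both the -2 and the +2 above come from.  Roughly (a condensed
sketch of the per-request scatterlist layout, not verbatim driver
code):

        /*
         * Per-request scatterlist usage in virtio_blk:
         *
         *   sg[0]                     <- struct virtio_blk_outhdr
         *   sg[1 .. nr_phys_segments] <- data pages
         *   sg[last]                  <- 1-byte status
         *
         * hence nr_phys_segments + 2 must fit within sg_elems.
         */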

What happens is that a bio is built up for the dm device (which does
not have the BLK_MQ_F_SG_MERGE flag set) using bio_add_page.  That
path calls into __blk_recalc_rq_segments, so you end up with
bi_phys_segments much smaller than bi_vcnt.
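
(To see why the two counts diverge: __blk_recalc_rq_segments collapses
physically contiguous bvecs into a single segment.  A simplified
sketch of that accounting -- the real loop in block/blk-merge.c also
checks blk_queue_cluster() and the queue's segment boundary mask:)

        struct bio_vec bv, bvprv;
        struct bvec_iter iter;
        unsigned int seg_size = 0, nr_phys_segs = 0;
        int prev = 0;

        bio_for_each_segment(bv, bio, iter) {
                if (prev && BIOVEC_PHYS_MERGEABLE(&bvprv, &bv) &&
                    seg_size + bv.bv_len <= queue_max_segment_size(q)) {
                        /* physically contiguous: extend current segment */
                        seg_size += bv.bv_len;
                } else {
                        /* discontiguity: start a new physical segment */
                        nr_phys_segs++;
                        seg_size = bv.bv_len;
                }
                bvprv = bv;
                prev = 1;
        }
        /* nr_phys_segs can end up far smaller than bio->bi_vcnt */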

And then you clone the bio.  When the cloned bio is submitted, it will
end up in blk_recount_segments, here:

        if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
                bio->bi_phys_segments = bio->bi_vcnt;

and now we've set bio->bi_phys_segments to a number that is beyond what
was registered as queue_max_segments by the driver.
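
For example (numbers picked for illustration): a 1MB bio built from
256 physically contiguous 4k pages can have bi_vcnt == 256 but only a
handful of physical segments after merging.  Recount the clone via the
NO_SG_MERGE shortcut and bi_phys_segments jumps to 256, while
virtio_blk may only have advertised something like 126 (sg_elems - 2),
at which point the BUG_ON above fires.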

I'm not sure how you want to address this.  The attached patch works,
but it doesn't feel right to me.  The right thing to do is probably
not to do the merging for the upper-level device either, but we don't
have a mechanism for propagating queue flags up the stack yet.

Note that I ran into this when backporting the virtio_blk driver.  I
could not reproduce this problem with an upstream kernel, but from the
looks of the code, it's possible to hit it.

Also note that setting BLK_MQ_F_SG_MERGE for the virtio blk device works
around the problem, as you'd expect.
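
(For reference, that's a one-line change to the tag_set setup in
virtblk_probe -- a sketch, assuming the current field names:

        vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;

which keeps the NO_SG_MERGE shortcut from ever being taken for the
virtio_blk queue.)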

Cheers,
Jeff

p.s. no Signed-off-by, as I don't think this is the right fix

diff --git a/block/blk-merge.c b/block/blk-merge.c
index b3bf0df..af2f472 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -93,7 +93,8 @@ void blk_recalc_rq_segments(struct request *rq)
 
 void blk_recount_segments(struct request_queue *q, struct bio *bio)
 {
-	if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
+	if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags) &&
+	    !bio_flagged(bio, BIO_CLONED))
 		bio->bi_phys_segments = bio->bi_vcnt;
 	else {
 		struct bio *nxt = bio->bi_next;
--
