Message-ID: <1438581502.26596.24.camel@hasee>
Date: Sun, 02 Aug 2015 22:58:22 -0700
From: Ming Lin <mlin@...nel.org>
To: Mike Snitzer <snitzer@...hat.com>
Cc: lkml <linux-kernel@...r.kernel.org>,
Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
Kent Overstreet <kent.overstreet@...il.com>,
Dongsu Park <dpark@...teo.net>,
Christoph Hellwig <hch@...radead.org>,
Al Viro <viro@...iv.linux.org.uk>,
Ming Lei <ming.lei@...onical.com>, Neil Brown <neilb@...e.de>,
Alasdair Kergon <agk@...hat.com>, dm-devel@...hat.com,
Lars Ellenberg <drbd-dev@...ts.linbit.com>,
drbd-user@...ts.linbit.com, Jiri Kosina <jkosina@...e.cz>,
Geoff Levand <geoff@...radead.org>, Jim Paris <jim@...n.com>,
Joshua Morris <josh.h.morris@...ibm.com>,
Philip Kelleher <pjk1939@...ux.vnet.ibm.com>,
Minchan Kim <minchan@...nel.org>,
Nitin Gupta <ngupta@...are.org>,
Oleg Drokin <oleg.drokin@...el.com>,
Andreas Dilger <andreas.dilger@...el.com>,
Ming Lin <ming.l@....samsung.com>
Subject: Re: [PATCH v5 01/11] block: make generic_make_request handle arbitrarily sized bios

On Sat, 2015-08-01 at 12:33 -0400, Mike Snitzer wrote:
> On Sat, Aug 01 2015 at 2:58am -0400,
> Ming Lin <mlin@...nel.org> wrote:
>
> > On Fri, 2015-07-31 at 17:38 -0400, Mike Snitzer wrote:
> > >
> > > OK, once setup, to run the 2 tests in question directly you'd do
> > > something like:
> > >
> > > dmtest run --suite thin-provisioning -n discard_a_fragmented_device
> > >
> > > dmtest run --suite thin-provisioning -n discard_fully_provisioned_device_benchmark
> > >
> > > Again, these tests pass without this patchset.
> >
> > It's caused by patch 4.

Typo: I meant patch 5.

> > When discard size >=4G, the bio->bi_iter.bi_size overflows.
>
> Thanks for tracking this down!

blkdev_issue_write_same() has the same problem.

>
> > Below is the new patch.
> >
> > Christoph,
> > Could you also help to review it?
> >
> > Now we still do "misaligned" check in blkdev_issue_discard().
> > So the same code in blk_bio_discard_split() was removed.
>
> But I don't agree with this approach. One of the most meaningful
> benefits of late bio splitting is the upper layers shouldn't _need_ to
> depend on the intermediate devices' queue_limits being stacked properly.
> Your solution to mix discard granularity/alignment checks at the upper
> layer(s) but then split based on max_discard_sectors at the lower layer
> defeats that benefit for discards.
>
> This will translate to all intermediate layers that might split
> discards needing to worry about granularity/alignment
> too (e.g. how dm-thinp will have to care because it must generate
> discard mappings with associated bios based on how blocks were mapped to
> thinp).

I think the important thing is late splitting for regular bios.
For discard/write_same bios, how about we just don't do late splitting?

That is:
1. Remove "PATCH 5: block: remove split code in blkdev_issue_discard"
2. Add the changes below to PATCH 1

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 1f5dfa0..90b085e 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -9,59 +9,6 @@
 
 #include "blk.h"
 
-static struct bio *blk_bio_discard_split(struct request_queue *q,
-					 struct bio *bio,
-					 struct bio_set *bs)
-{
-	unsigned int max_discard_sectors, granularity;
-	int alignment;
-	sector_t tmp;
-	unsigned split_sectors;
-
-	/* Zero-sector (unknown) and one-sector granularities are the same.  */
-	granularity = max(q->limits.discard_granularity >> 9, 1U);
-
-	max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
-	max_discard_sectors -= max_discard_sectors % granularity;
-
-	if (unlikely(!max_discard_sectors)) {
-		/* XXX: warn */
-		return NULL;
-	}
-
-	if (bio_sectors(bio) <= max_discard_sectors)
-		return NULL;
-
-	split_sectors = max_discard_sectors;
-
-	/*
-	 * If the next starting sector would be misaligned, stop the discard at
-	 * the previous aligned sector.
-	 */
-	alignment = (q->limits.discard_alignment >> 9) % granularity;
-
-	tmp = bio->bi_iter.bi_sector + split_sectors - alignment;
-	tmp = sector_div(tmp, granularity);
-
-	if (split_sectors > tmp)
-		split_sectors -= tmp;
-
-	return bio_split(bio, split_sectors, GFP_NOIO, bs);
-}
-
-static struct bio *blk_bio_write_same_split(struct request_queue *q,
-					    struct bio *bio,
-					    struct bio_set *bs)
-{
-	if (!q->limits.max_write_same_sectors)
-		return NULL;
-
-	if (bio_sectors(bio) <= q->limits.max_write_same_sectors)
-		return NULL;
-
-	return bio_split(bio, q->limits.max_write_same_sectors, GFP_NOIO, bs);
-}
-
 static struct bio *blk_bio_segment_split(struct request_queue *q,
 					 struct bio *bio,
 					 struct bio_set *bs)
@@ -129,10 +76,8 @@ void blk_queue_split(struct request_queue *q, struct bio **bio,
 {
 	struct bio *split;
 
-	if ((*bio)->bi_rw & REQ_DISCARD)
-		split = blk_bio_discard_split(q, *bio, bs);
-	else if ((*bio)->bi_rw & REQ_WRITE_SAME)
-		split = blk_bio_write_same_split(q, *bio, bs);
+	if ((*bio)->bi_rw & REQ_DISCARD || (*bio)->bi_rw & REQ_WRITE_SAME)
+		split = NULL;
 	else
 		split = blk_bio_segment_split(q, *bio, q->bio_split);
 
>
> Also, it is unfortunate that IO that doesn't have a payload is being
> artificially split simply because bio->bi_iter.bi_size is 32bits.

Indeed.
Would it be possible to make it 64 bits? I guess not.

>
> Mike