Message-ID: <2a47d336-c19b-6bf4-c247-d7382871eeea@grimberg.me>
Date: Tue, 20 Nov 2018 20:25:46 -0800
From: Sagi Grimberg <sagi@...mberg.me>
To: Ming Lei <ming.lei@...hat.com>
Cc: Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, Dave Chinner <dchinner@...hat.com>,
Kent Overstreet <kent.overstreet@...il.com>,
Mike Snitzer <snitzer@...hat.com>, dm-devel@...hat.com,
Alexander Viro <viro@...iv.linux.org.uk>,
linux-fsdevel@...r.kernel.org, Shaohua Li <shli@...nel.org>,
linux-raid@...r.kernel.org, linux-erofs@...ts.ozlabs.org,
David Sterba <dsterba@...e.com>, linux-btrfs@...r.kernel.org,
"Darrick J . Wong" <darrick.wong@...cle.com>,
linux-xfs@...r.kernel.org, Gao Xiang <gaoxiang25@...wei.com>,
Theodore Ts'o <tytso@....edu>, linux-ext4@...r.kernel.org,
Coly Li <colyli@...e.de>, linux-bcache@...r.kernel.org,
Boaz Harrosh <ooo@...ctrozaur.com>,
Bob Peterson <rpeterso@...hat.com>, cluster-devel@...hat.com
Subject: Re: [PATCH V10 09/19] block: introduce bio_bvecs()
>> I would like to avoid growing bvec tables and keep everything
>> preallocated. Plus, a bvec_iter operates on a bvec which means
>> we'll need a table there as well... Not liking it so far...
>
> In the case of multiple bios in one request, we can't know how many
> bvecs there are without calling rq_bvecs(), so it may not be suitable
> to preallocate the table. If you have to send the IO request in one
> send(), runtime allocation may be inevitable.
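Runtime allocation here would mean something like the sketch below on
every request (just a sketch, assuming the proposed rq_bvecs() helper):

	struct bio_vec *tbl;

	/* size a temporary table by the bvec count of the whole request */
	tbl = kmalloc_array(rq_bvecs(rq), sizeof(*tbl), GFP_NOIO);
	if (!tbl)
		return -ENOMEM;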
I don't want to do that; I want to work on a single bvec at a time,
like the current implementation does.
> If you don't need to send the IO request in one send(), you may send
> one bio at a time and just use the bio's bvec table directly, as in
> the single bio case in lo_rw_aio().
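(For reference, my reading of that single-bio path is roughly the sketch
below, untested, reusing the iov_iter_bvec() call we already have:)

	struct iov_iter iter;
	struct bio *bio = rq->bio;
	struct bio_vec *bvec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);

	/* point the iter straight at the bio's own bvec table, no copy */
	iov_iter_bvec(&iter, WRITE, bvec, bio_segments(bio), blk_rq_bytes(rq));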
We'd need some indication that we need to reinit the iter with the
new bvec; today we do:
static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
		int len)
{
	req->snd.data_sent += len;
	req->pdu_sent += len;
	iov_iter_advance(&req->snd.iter, len);
	if (!iov_iter_count(&req->snd.iter) &&
	    req->snd.data_sent < req->data_len) {
		req->snd.curr_bio = req->snd.curr_bio->bi_next;
		nvme_tcp_init_send_iter(req);
	}
}
and initialize the send iter. I imagine that now I will need to
switch to the next bvec, and only when I'm on the last one will I
need to move to the next bio...
Do you offer an API for that?
>>> Can this way avoid your blocking issue? You may see this
>>> example in the 'rq->bio != rq->biotail' branch of lo_rw_aio().
>>
>> This is exactly an example of not ignoring the bios...
>
> Yeah, that is the most common example, given that merging is enabled
> in most cases. If the driver or device doesn't care about merging,
> you can disable it and always get single-bio requests; then the
> bio's bvec table can be reused for send().
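(I assume disabling merging would boil down to something like the sketch
below on each request queue, though that's just my reading:)

	/* ask the block layer not to merge bios into multi-bio requests */
	blk_queue_flag_set(QUEUE_FLAG_NOMERGES, q);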
Does bvec_iter span bvecs with your patches? I didn't see that change.
>> I'm not sure how this helps me either. Unless we can set a bvec_iter to
>> span bvecs or have an abstract bio crossing when we re-initialize the
>> bvec_iter I don't see how I can ignore bios completely...
>
> rq_for_each_bvec() will iterate over all bvecs from all bios, so you
> needn't look at any bio in this req.
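(If I read the series correctly, that usage would look roughly like this
sketch:)

	struct req_iterator iter;
	struct bio_vec bv;

	/* walk every (possibly multi-page) bvec in the request, across bios */
	rq_for_each_bvec(bv, rq, iter) {
		/* consume bv.bv_page / bv.bv_offset / bv.bv_len here */
	}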
But I don't need this iteration; I need a transparent API like:

	bvec2 = rq_bvec_next(rq, bvec)

This way I can simply always reinit my iter without thinking about how
the request/bios/bvecs are constructed...
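Purely as a sketch of what I mean, assuming such an rq_bvec_next() helper
existed (the curr_bvec field is made up for illustration), the advance
could become:

static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
		int len)
{
	req->snd.data_sent += len;
	req->pdu_sent += len;
	iov_iter_advance(&req->snd.iter, len);
	if (!iov_iter_count(&req->snd.iter) &&
	    req->snd.data_sent < req->data_len) {
		/* hypothetical: next bvec, whichever bio it lives in */
		req->snd.curr_bvec =
			rq_bvec_next(blk_mq_rq_from_pdu(req),
				     req->snd.curr_bvec);
		nvme_tcp_init_send_iter(req);
	}
}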
> rq_bvecs() will return how many bvecs there are in this request
> (covering all bios in this req).
Still not very useful given that I don't want to use a table...
>>> So it looks like the nvme-tcp host driver might be the 2nd driver
>>> which benefits from multi-page bvec directly.
>>>
>>> The multi-page bvec V11 has passed my tests and addressed almost
>>> all the comments during review on V10. I removed bio_vecs() in V11,
>>> but it won't be a big deal; we can introduce it anytime there is
>>> a requirement.
>>
>> multipage-bvecs and nvme-tcp are going to conflict, so it would be good
>> to coordinate on this. I think that the nvme-tcp host needs some
>> adjustments, such as setting a bvec_iter. I'm under the impression that
>> the change is rather small and self-contained, but I'm not sure I have
>> the full picture here.
>
> I guess I may not fully get your exact requirement for a block IO
> iterator from nvme-tcp, :-(
They are pretty much listed above. Today nvme-tcp sets an iterator with:
	vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
	nsegs = bio_segments(bio);
	size = bio->bi_iter.bi_size;
	offset = bio->bi_iter.bi_bvec_done;
	iov_iter_bvec(&req->snd.iter, WRITE, vec, nsegs, size);
and when done, iterate to the next bio and do the same.

With multi-page bvecs it would be great if we could simply have
something like rq_bvec_next(), which would pretty much satisfy
the requirements from the nvme-tcp side...
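On the init side it would then be something like this (again hypothetical,
with the same made-up curr_bvec field as above):

static void nvme_tcp_init_send_iter(struct nvme_tcp_request *req)
{
	struct bio_vec *vec = req->snd.curr_bvec;

	/* one (possibly multi-page) bvec at a time, no bio walking */
	iov_iter_bvec(&req->snd.iter, WRITE, vec, 1, vec->bv_len);
}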