lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2a47d336-c19b-6bf4-c247-d7382871eeea@grimberg.me>
Date:   Tue, 20 Nov 2018 20:25:46 -0800
From:   Sagi Grimberg <sagi@...mberg.me>
To:     Ming Lei <ming.lei@...hat.com>
Cc:     Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, Dave Chinner <dchinner@...hat.com>,
        Kent Overstreet <kent.overstreet@...il.com>,
        Mike Snitzer <snitzer@...hat.com>, dm-devel@...hat.com,
        Alexander Viro <viro@...iv.linux.org.uk>,
        linux-fsdevel@...r.kernel.org, Shaohua Li <shli@...nel.org>,
        linux-raid@...r.kernel.org, linux-erofs@...ts.ozlabs.org,
        David Sterba <dsterba@...e.com>, linux-btrfs@...r.kernel.org,
        "Darrick J . Wong" <darrick.wong@...cle.com>,
        linux-xfs@...r.kernel.org, Gao Xiang <gaoxiang25@...wei.com>,
        Theodore Ts'o <tytso@....edu>, linux-ext4@...r.kernel.org,
        Coly Li <colyli@...e.de>, linux-bcache@...r.kernel.org,
        Boaz Harrosh <ooo@...ctrozaur.com>,
        Bob Peterson <rpeterso@...hat.com>, cluster-devel@...hat.com
Subject: Re: [PATCH V10 09/19] block: introduce bio_bvecs()


>> I would like to avoid growing bvec tables and keep everything
>> preallocated. Plus, a bvec_iter operates on a bvec which means
>> we'll need a table there as well... Not liking it so far...
> 
> In case of bios in one request, we can't know how many bvecs there
> are except for calling rq_bvecs(), so it may not be suitable to
> preallocate the table. If you have to send the IO request in one send(),
> runtime allocation may be inevitable.

I don't want to do that, I want to work on a single bvec at a time like
the current implementation does.

> If you don't require to send the IO request in one send(), you may send
> one bio in one time, and just uses the bio's bvec table directly,
> such as the single bio case in lo_rw_aio().

we'd need some indication that we need to reinit my iter with the
new bvec, today we do:

static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
                 int len)
{
         req->snd.data_sent += len;
         req->pdu_sent += len;
         iov_iter_advance(&req->snd.iter, len);
         if (!iov_iter_count(&req->snd.iter) &&
             req->snd.data_sent < req->data_len) {
                 req->snd.curr_bio = req->snd.curr_bio->bi_next;
                 nvme_tcp_init_send_iter(req);
         }
}

and initialize the send iter. I imagine that now I will need to
switch to the next bvec and only if I'm on the last I need to
use the next bio...

Do you offer an API for that?


>>> can this way avoid your blocking issue? You may see this
>>> example in branch 'rq->bio != rq->biotail' of lo_rw_aio().
>>
>> This is exactly an example of not ignoring the bios...
> 
> Yeah, that is the most common example, given merge is enabled
> in most of cases. If the driver or device doesn't care merge,
> you can disable it and always get single bio request, then the
> bio's bvec table can be reused for send().

Does bvec_iter span bvecs with your patches? I didn't see that change?

>> I'm not sure how this helps me either. Unless we can set a bvec_iter to
>> span bvecs or have an abstract bio crossing when we re-initialize the
>> bvec_iter I don't see how I can ignore bios completely...
> 
> rq_for_each_bvec() will iterate over all bvecs from all bios, so you
> needn't to see any bio in this req.

But I don't need this iteration, I need a transparent API like;
bvec2 = rq_bvec_next(rq, bvec)

This way I can simply always reinit my iter without thinking about how
the request/bios/bvecs are constructed...

> rq_bvecs() will return how many bvecs there are in this request(cover
> all bios in this req)

Still not very useful given that I don't want to use a table...

>>> So looks nvme-tcp host driver might be the 2nd driver which benefits
>>> from multi-page bvec directly.
>>>
>>> The multi-page bvec V11 has passed my tests and addressed almost
>>> all the comments during review on V10. I removed bio_vecs() in V11,
>>> but it won't be big deal, we can introduce them anytime when there
>>> is the requirement.
>>
>> multipage-bvecs and nvme-tcp are going to conflict, so it would be good
>> to coordinate on this. I think that nvme-tcp host needs some adjustments
>> as setting a bvec_iter. I'm under the impression that the change is rather
>> small and self-contained, but I'm not sure I have the full
>> picture here.
> 
> I guess I may not get your exact requirement on block io iterator from nvme-tcp
> too, :-(

They are pretty much listed above. Today nvme-tcp sets an iterator with:

vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
nsegs = bio_segments(bio);
size = bio->bi_iter.bi_size;
offset = bio->bi_iter.bi_bvec_done;
iov_iter_bvec(&req->snd.iter, WRITE, vec, nsegs, size);

and when done, iterate to the next bio and do the same.

With multipage bvec it would be great if we can simply have
something like rq_bvec_next() that would pretty much satisfy
the requirements from the nvme-tcp side...

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ