Message-ID: <c52b6a8b-d1d4-67ff-f81c-371d09cc6d5b@kernel.dk>
Date: Fri, 15 Feb 2019 08:49:31 -0700
From: Jens Axboe <axboe@...nel.dk>
To: Ming Lei <ming.lei@...hat.com>
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, Theodore Ts'o <tytso@....edu>,
Omar Sandoval <osandov@...com>,
Sagi Grimberg <sagi@...mberg.me>,
Dave Chinner <dchinner@...hat.com>,
Kent Overstreet <kent.overstreet@...il.com>,
Mike Snitzer <snitzer@...hat.com>, dm-devel@...hat.com,
Alexander Viro <viro@...iv.linux.org.uk>,
linux-fsdevel@...r.kernel.org, linux-raid@...r.kernel.org,
David Sterba <dsterba@...e.com>, linux-btrfs@...r.kernel.org,
"Darrick J . Wong" <darrick.wong@...cle.com>,
linux-xfs@...r.kernel.org, Gao Xiang <gaoxiang25@...wei.com>,
Christoph Hellwig <hch@....de>, linux-ext4@...r.kernel.org,
Coly Li <colyli@...e.de>, linux-bcache@...r.kernel.org,
Boaz Harrosh <ooo@...ctrozaur.com>,
Bob Peterson <rpeterso@...hat.com>, cluster-devel@...hat.com
Subject: Re: [PATCH V15 00/18] block: support multi-page bvec
On 2/15/19 4:13 AM, Ming Lei wrote:
> Hi,
>
> This patchset brings multi-page bvec into block layer:
>
> 1) what is multi-page bvec?
>
> Multipage bvecs means that one 'struct bio_vec' can hold multiple pages
> which are physically contiguous, instead of the one single page used in
> the Linux kernel for a long time.
>
> 2) why is multi-page bvec introduced?
>
> Kent proposed the idea[1] first.
>
> As systems' RAM becomes much bigger than before, and huge pages, transparent
> huge pages and memory compaction are widely used, it is now fairly easy
> to see physically contiguous pages from the fs in I/O. On the other hand,
> from the block layer's view, it isn't necessary to store intermediate pages
> in the bvec; it is enough to just store the physically contiguous 'segment'
> in each io vector.
>
> Huge pages are also being brought to filesystems and swap [2][6], so we can
> do IO on a huge page each time[3], which requires that one bio can transfer
> at least one huge page at a time. It turns out that simply changing
> BIO_MAX_PAGES isn't flexible[3][5]. Multipage bvec fits this case very well.
> As we saw, if CONFIG_THP_SWAP is enabled, BIO_MAX_PAGES can be configured
> much bigger, such as 512, which requires at least two 4K pages for holding
> the bvec table.
>
> With multi-page bvec:
>
> - Inside the block layer, both bio splitting and sg mapping can become more
> efficient than before by just traversing the physically contiguous
> 'segment' instead of each page.
>
> - segment handling in the block layer can be improved a lot in the future,
> since it should be quite easy to convert a multipage bvec into a segment.
> For example, we might store a segment directly in each bvec in the future.
>
> - bio size can be increased, which should in theory improve some
> high-bandwidth IO cases[4].
>
> - there is an opportunity in the future to reduce the memory footprint of bvecs.
>
> 3) how is multi-page bvec implemented in this patchset?
>
> Patches 1 ~ 3 prepare for supporting multi-page bvec.
>
> Patches 4 ~ 14 implement multipage bvec in the block layer:
>
> - put all tricks into the bvec/bio/rq iterators; as long as drivers and
> fs use these standard iterators, they are happy with multipage bvec
>
> - introduce bio_for_each_bvec() to iterate over multipage bvecs for
> splitting bios and mapping sg
>
> - keep the current bio_for_each_segment*() to iterate over singlepage bvecs
> and make sure current users won't be broken; especially, convert to this
> new helper prototype in single patch 21, given it is basically a mechanical
> conversion
>
> - deal with iomap & xfs's sub-pagesize io vecs in patch 13
>
> - enable multipage bvec in patch 14
>
> Patch 15 redefines BIO_MAX_PAGES as 256.
>
> Patch 16 documents usage of the bio iterator helpers.
>
> Patches 17 ~ 18 kill NO_SG_MERGE.
>
> These patches can be found in the following git tree:
>
> git: https://github.com/ming1/linux.git v5.0-blk_mp_bvec_v14
^^^
v15?
> Lots of tests (blktests, xfstests, ltp io, ...) have been run with this
> patchset, and no regression was seen.
>
> Thanks to Christoph for reviewing the early version and providing very good
> suggestions, such as: introduce bio_init_with_vec_table(), remove other
> unnecessary helpers for cleanup, and so on.
>
> Thanks to Christoph and Omar for reviewing V10/V11/V12 and providing lots of
> helpful comments.
Applied, thanks Ming. Let's hope it sticks!
--
Jens Axboe