lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5272D1BB.3030205@infradead.org>
Date:	Thu, 31 Oct 2013 14:55:07 -0700
From:	Randy Dunlap <rdunlap@...radead.org>
To:	Kent Overstreet <kmo@...erainc.com>, axboe@...nel.dk
CC:	tj@...nel.org, linux-kernel@...r.kernel.org,
	NeilBrown <neilb@...e.de>, Christoph Hellwig <hch@....de>
Subject: Re: [PATCH] block: Document immutable biovecs

On 10/31/13 14:27, Kent Overstreet wrote:
> Signed-off-by: Kent Overstreet <kmo@...erainc.com>
> Cc: Jens Axboe <axboe@...nel.dk>
> Cc: NeilBrown <neilb@...e.de>
> Cc: Christoph Hellwig <hch@....de>
> ---
>  Documentation/block/biovecs.txt | 111 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/blk_types.h       |   3 +-
>  2 files changed, 113 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/block/biovecs.txt
> 
> diff --git a/Documentation/block/biovecs.txt b/Documentation/block/biovecs.txt
> new file mode 100644
> index 0000000..34711fa
> --- /dev/null
> +++ b/Documentation/block/biovecs.txt
> @@ -0,0 +1,111 @@
> +
> +Immutable biovecs and biovec iterators:
> +=======================================
> +
> +Kent Overstreet <kmo@...erainc.com>
> +
> +As of 3.13, biovecs should never be modified after a bio has been submitted.
> +Instead, we have a new struct bvec_iter which represents a range of a biovec -
> +the iterator will be modified as the bio is completed, not the biovec.
> +
> +More specifically, old code that needed to partially complete a bio would
> +update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
> +ended up partway through a biovec, it would increment bv_offset and decrement
> +bv_len by the number of bytes completed in that biovec.
> +
> +In the new scheme of things, everything that must be mutated in order to
> +partially complete a bio is segregated into struct bvec_iter: bi_sector,
> +bi_size and bi_idx have been moved there; and instead of modifying bv_offset
> +and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
> +bytes completed in the current bvec.
> +
> +There are a bunch of new helper macros for hiding the gory details - in
> +particular, presenting the illusion of partially completed biovecs so that
> +normal code doesn't have to deal with bi_bvec_done.
> +
> + * Driver code should no longer refer to biovecs directly; we now have
> +   bio_iovec() and bio_iovec_iter() macros that return literal struct biovecs,
> +   constructed from the raw biovecs but taking into account bi_bvec_done and
> +   bi_size.
> +
> +   bio_for_each_segment() has been updated to take a bvec_iter argument
> +   instead of an integer (that corresponded to bi_idx); for a lot of code the
> +   conversion just required changing the types of the arguments to
> +   bio_for_each_segment().
> +
> + * Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
> +   wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
> +   advances the bio integrity's iter if present.
> +
> +   There is a lower level advance function - bvec_iter_advance() - which takes
> +   a pointer to a biovec, not a bio; this is used by the bio integrity code.
> +
> +What's all this get us?
> +=======================
> +
> +Having a real iterator, and making biovecs immutable, has a number of
> +advantages:
> +
> + * Before, iterating over bios was very awkward when you weren't processing
> +   exactly one bvec at a time - for example, bio_copy_data() in fs/bio.c,
> +   which copies the contents of one bio into another. Because the biovecs
> +   wouldn't necessarily be the same size, the old code was tricky convoluted -
> +   it had to walk two different bios at the same time, keeping both bi_idx and
> +   and offset into the current biovec for each.
> +
> +   The new code is much more straightforward - have a look. This sort of
> +   pattern comes up in a lot of places; a lot of drivers were essentially open
> +   coding bvec iterators before, and having common implementation considerably
> +   simplifies a lot of code.
> +
> + * Before, any code that might need to use the biovec after the bio had been
> +   completed (perhaps to copy the data somewhere else, or perhaps to resubmit
> +   it somewhere else if there was an error) had to save the entire bvec array
> +   - again, this was being done in a fair number of places.
> +
> + * Biovecs can be shared between multiple bios - a bvec iter can represent an
> +   arbitrary range of an existing biovec, both starting an ending midway

                                                           and

> +   through biovecs. This is what enables efficient splitting of arbitrary
> +   bios. Note that this means we _only_ use bi_size to determine when we're

                                                                         we've

> +   reached the end of a bio, not bi_vcnt - and the bio_iovec() macro takes
> +   bi_size into account when constructing biovecs.
> +
> + * Splitting bios is now much simpler. The old bio_split() didn't even work on
> +   bios with more than a single bvec! Now, we can efficiently split arbitrary
> +   size bios - because the new bio can share the old bio's biovec.
> +
> +   Care must be taken to ensure the biovec isn't freed while the split bio is
> +   still using it, in case the original bio completes first, though. Using
> +   bio_chain() when splitting bios helps with this.
> +
> + * Submitting partially completed bios is now perfectly fine - this comes up
> +   occasionally in stacking block drivers and various code (e.g. md and
> +   bcache) had some ugly workarounds for this.
> +
> +   It used to be the case that submitting a partially completed bio would work
> +   fine to _most_ devices, but since accessing the raw bvec array was the
> +   norm, not all drivers would respect bi_idx and those would break. Now,
> +   since all drivers _must_ go through the bvec iterator - and have been
> +   audited to make sure they are - submitting partially completed bios is
> +   perfectly fine.
> +
> +Other implications:
> +===================
> +
> + * Almost all usage of bi_idx is now incorrect and has been removed; instead,
> +   where previously you would have used bi_idx you'd now use a bvec_iter,
> +   probably passing it to one of the helper macros.
> +
> +   i.e. instead of using bio_iovec_idx() (or bio->bi_iovec[bio->bi_idx]), you

      I.e.,

> +   would now use bio_iter_iovec(), which takes a bvec_iter and returns a

drop "would".

> +   literal struct bio_vec - constructed on the fly from the raw biovec but
> +   taking into account bi_bvec_done (and bi_size).
> +
> + * bi_vcnt can't be trusted or relied upon by driver code - i.e. anything that
> +   doesn't actually own the bio. The reason is twofold: firstly, it's not
> +   actually needed for iterating over the bio anymore - we only go bi_size.

                                                                   use

> +   Secondly, when cloning a bio and reusing (a portion of) the original bio's
> +   biovec, in order to calculate bi_vcnt for the new bio we'd have to iterate
> +   over all the biovecs in the new bio - which is silly as it's not needed.
> +
> +   So, don't use bi_vcnt anymore.

Nice job.
Thanks.

-- 
~Randy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ