linux-kernel - Re: [PATCH] block: optimise bvec_iter

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20191130185634.GA1848835@rani.riverdale.lan>
Date:   Sat, 30 Nov 2019 13:56:35 -0500
From:   Arvind Sankar <nivedita@...m.mit.edu>
To:     Pavel Begunkov <asml.silence@...il.com>
Cc:     Arvind Sankar <nivedita@...m.mit.edu>,
        Jens Axboe <axboe@...nel.dk>, Ming Lei <ming.lei@...hat.com>,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] block: optimise bvec_iter_advance()

On Sat, Nov 30, 2019 at 12:22:27PM +0300, Pavel Begunkov wrote:
> On 30/11/2019 02:24, Arvind Sankar wrote:
> > On Sat, Nov 30, 2019 at 01:47:16AM +0300, Pavel Begunkov wrote:
> >> On 30/11/2019 01:17, Arvind Sankar wrote:
> >>>
> >>> The loop can be simplified a bit further, as done has to be 0 once we go
> >>> beyond the current bio_vec. See below for the simplified version.
> >>>
> >>
> >> Thanks for the suggestion! I thought about it, and decided to not
> >> for several reasons. I prefer to not fine-tune and give compilers
> >> more opportunity to do their job. And it's already fast enough with
> >> modern architectures (MOVcc, complex addressing, etc).
> >>
> >> Also need to consider code clarity and the fact, that this is inline,
> >> so should be brief and register-friendly.
> >>
> > 
> > It should be more register-friendly, as it uses fewer variables, and I
> > think it's easier to see what the loop is doing, i.e. that we advance
> > one bio_vec per iteration: in the existing code, it takes a bit of
> > thinking to see that we won't spend more than one iteration within the
> > same bio_vec.
> 
> Yeah, may be. It's more the matter of preference then. I don't think
> it's simpler, and performance is entirely depends on a compiler and 
> input. But, that's rather subjective and IMHO not worth of time.
> 
> Anyway, thanks for thinking this through!
> 

You don't find listing 1 simpler than listing 2? It does save one
register, as it doesn't have to keep track of done independently from
bytes. This is always going to be the case unless the compiler can
eliminate done by transforming Listing 2 into Listing 1. Unfortunately,
even if it gets much smarter, it's unlikely to be able to do that,
because they're equivalent only if there is no overflow, so it would
need to know that bytes + iter->bi_bvec_done cannot overflow, and that
iter->bi_bvec_done must be smaller than cur->bv_len initially.

Listing 1:

	bytes += iter->bi_bvec_done;
	while (bytes) {
		const struct bio_vec *cur = bv + idx;

		if (bytes < cur->bv_len)
			break;
		bytes -= cur->bv_len;
		idx++;
	}

	iter->bi_idx = idx;
	iter->bi_bvec_done = bytes;

Listing 2:

	while (bytes) {
		const struct bio_vec *cur = bv + idx;
		unsigned int len = min(bytes, cur->bv_len - done);

		bytes -= len;
		done += len;
		if (done == cur->bv_len) {
			idx++;
			done = 0;
		}
	}

	iter->bi_idx = idx;
	iter->bi_bvec_done = done;