lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 30 Nov 2019 13:56:35 -0500
From:   Arvind Sankar <nivedita@...m.mit.edu>
To:     Pavel Begunkov <asml.silence@...il.com>
Cc:     Arvind Sankar <nivedita@...m.mit.edu>,
        Jens Axboe <axboe@...nel.dk>, Ming Lei <ming.lei@...hat.com>,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] block: optimise bvec_iter_advance()

On Sat, Nov 30, 2019 at 12:22:27PM +0300, Pavel Begunkov wrote:
> On 30/11/2019 02:24, Arvind Sankar wrote:
> > On Sat, Nov 30, 2019 at 01:47:16AM +0300, Pavel Begunkov wrote:
> >> On 30/11/2019 01:17, Arvind Sankar wrote:
> >>>
> >>> The loop can be simplified a bit further, as done has to be 0 once we go
> >>> beyond the current bio_vec. See below for the simplified version.
> >>>
> >>
> >> Thanks for the suggestion! I thought about it, and decided to not
> >> for several reasons. I prefer to not fine-tune and give compilers
> >> more opportunity to do their job. And it's already fast enough with
> >> modern architectures (MOVcc, complex addressing, etc).
> >>
> >> Also need to consider code clarity and the fact, that this is inline,
> >> so should be brief and register-friendly.
> >>
> > 
> > It should be more register-friendly, as it uses fewer variables, and I
> > think it's easier to see what the loop is doing, i.e. that we advance
> > one bio_vec per iteration: in the existing code, it takes a bit of
> > thinking to see that we won't spend more than one iteration within the
> > same bio_vec.
> 
> Yeah, may be. It's more the matter of preference then. I don't think
> it's simpler, and performance is entirely depends on a compiler and 
> input. But, that's rather subjective and IMHO not worth of time.
> 
> Anyway, thanks for thinking this through!
> 

You don't find listing 1 simpler than listing 2? It does save one
register, as it doesn't have to keep track of done independently from
bytes. This is always going to be the case unless the compiler can
eliminate done by transforming Listing 2 into Listing 1. Unfortunately,
even if it gets much smarter, it's unlikely to be able to do that,
because they're equivalent only if there is no overflow, so it would
need to know that bytes + iter->bi_bvec_done cannot overflow, and that
iter->bi_bvec_done must be smaller than cur->bv_len initially.

Listing 1:

	bytes += iter->bi_bvec_done;
	while (bytes) {
		const struct bio_vec *cur = bv + idx;

		if (bytes < cur->bv_len)
			break;
		bytes -= cur->bv_len;
		idx++;
	}

	iter->bi_idx = idx;
	iter->bi_bvec_done = bytes;

Listing 2:

	while (bytes) {
		const struct bio_vec *cur = bv + idx;
		unsigned int len = min(bytes, cur->bv_len - done);

		bytes -= len;
		done += len;
		if (done == cur->bv_len) {
			idx++;
			done = 0;
		}
	}

	iter->bi_idx = idx;
	iter->bi_bvec_done = done;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ