Message-ID: <20200723152101.GI2786714@ZenIV.linux.org.uk>
Date: Thu, 23 Jul 2020 16:21:01 +0100
From: Al Viro <viro@...iv.linux.org.uk>
To: David Laight <David.Laight@...lab.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>
Subject: Re: [PATCH 04/18] csum_and_copy_..._user(): pass 0xffffffff instead
of 0 as initial sum
On Thu, Jul 23, 2020 at 03:53:42PM +0100, Al Viro wrote:
> Said that, what you've printed for 1-byte segments (and that's going to be
> seriously affected by the setup costs in csum-copy.S, sensitive to calling
> convention changes) is time to run the 16-iteration loop divided by 1 * 16 / 8;
> IOW, your difference for 16 iterations here is 37*2 = 74 cycles. With
> per-iteration diff being a bit under 5 cycles. Which is not implausible,
> but
> 1) extrapolating to other compiler versions, flags, etc. is not obvious
> 2) the effects of calling convention changes need to be taken into account
> 3) for copying to/from userland the effects of calling convention changes
> are even larger, and the kernel is certainly not going to issue kvec iters of _that_
> sort, TYVM.
To clarify it a bit: the effects of the calling convention change are mostly due
to not passing (and saving) those error pointers, and that could be had with
"pass the initial sum in" - just start these iov_iter.c loops with sum = ~0U
and we get the same guarantees re not getting 0 in the absence of faults.
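A minimal sketch of why starting from ~0U gives that guarantee, assuming the
usual 32-bit add with end-around carry. The helper names csum_add32() and
sum_words() are made up for illustration; they are not the functions in
iov_iter.c or csum-copy.S. Once the running sum is non-zero, folding the
carry back in can never bring it to 0, so 0 stays free to mean "fault".

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: 32-bit addition with end-around carry.
 * If sum != 0, the result is never 0: without a carry it is >= sum,
 * with a carry the folded-in 1 makes it >= 1.
 */
static uint32_t csum_add32(uint32_t sum, uint32_t addend)
{
	uint64_t t = (uint64_t)sum + addend;

	return (uint32_t)(t + (t >> 32));
}

/*
 * Hypothetical stand-in for a checksumming loop over segments;
 * the caller passes the initial sum in.
 */
static uint32_t sum_words(const uint32_t *words, size_t n, uint32_t initial)
{
	uint32_t sum = initial;
	size_t i;

	for (i = 0; i < n; i++)
		sum = csum_add32(sum, words[i]);
	return sum;
}
```

With initial = 0, all-zero data sums to 0 and is indistinguishable from a
fault indication; with initial = ~0U the same data sums to 0xffffffff.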
The point is, your "~4.5 cycles per vector" is pretty much noise and the
difference between the 3-argument and 4-argument variants could easily be
in the same range. It might be a valid microoptimization, it might not be.
The 3-argument variant is simpler and IMO in the absence of strong data we
ought to go with that.
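For reference, the arithmetic behind the figures quoted above, as a rough
sketch. It assumes the printed number is the loop time divided by
seg_len * iterations / 8, as described in the quoted text; the function
names are made up for illustration.

```c
/*
 * Undo the normalization: printed value = loop time / (seg_len * iterations / 8),
 * so a printed difference maps back to a difference in total loop cycles.
 */
static double total_cycle_diff(double printed_diff, int seg_len, int iterations)
{
	return printed_diff * (seg_len * iterations / 8.0);
}

/* Spread the total difference over the iterations of the loop. */
static double per_iteration_diff(double total_diff, int iterations)
{
	return total_diff / iterations;
}
```

For 1-byte segments and 16 iterations, a printed difference of 37 gives
37 * 2 = 74 cycles in total, i.e. 74 / 16 = 4.625 cycles per iteration.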