linux-kernel - Re: [PATCH 04/18] csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200723152101.GI2786714@ZenIV.linux.org.uk>
Date:   Thu, 23 Jul 2020 16:21:01 +0100
From:   Al Viro <viro@...iv.linux.org.uk>
To:     David Laight <David.Laight@...lab.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>
Subject: Re: [PATCH 04/18] csum_and_copy_..._user(): pass 0xffffffff instead
 of 0 as initial sum

On Thu, Jul 23, 2020 at 03:53:42PM +0100, Al Viro wrote:

> Said that, what you've printed for 1-byte segments (and that's going to be
> seriously affected by the setup costs in csum-copy.S, sensitive to calling
> convention changes) is time to run the 16-iteration loop divided by 1 * 16 / 8;
> IOW, your difference for 16 iterations here is 37*2 = 74 cycles.  With
> per-iteration diff being a bit under 5 cycles.  Which is not implausible,
> but
> 	1) extrapolating to other compiler versions, flags, etc. is not obvious
> 	2) the effects of calling convention changes need to be taken into account
> 	3) for copying to/from userland the effects of calling convention changes
> are be even larger, and kernel is certainly not going to issue kvec iters of _that_
> sort, TYVM.

To clarify it a bit: the effects of calling conventions change are mostly due
to not passing (and saving) those error pointers, and that could be had with
"pass the initial sum in" - just start these iov_iter.c loops with sum = ~0U
and we get the same warranties re not getting 0 in absence of faults.

The point is, your "~4.5 cycles per vector" is pretty much noise and the
difference between the 3-argument and 4-argument variants could easily be
in the same range.  It might be a valid microoptimization, it might be not.
3-argument variant is simpler and IMO in absence of strong data we ought
to go with that.