Open Source and information security mailing list archives
Date:   Tue, 7 Jul 2020 12:35:16 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Al Viro' <viro@...iv.linux.org.uk>,
        Linus Torvalds <torvalds@...ux-foundation.org>
CC:     Michael Ellerman <mpe@...erman.id.au>,
        Christophe Leroy <christophe.leroy@....fr>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        "Peter Zijlstra" <peterz@...radead.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: RE: objtool clac/stac handling change..

From: Al Viro
> Sent: 04 July 2020 03:12
...
> BTW, looking at csum_and_copy_{to,from}_user() callers (all 3 of them,
> all in lib/iov_iter.c) we have this:
> 	1) len is never 0
> 	2) sum (initial value of csum) is always 0
> 	3) failure (reported via *err_ptr) is always treated as "discard
> the entire iovec segment (and possibly the entire iovec)".  Exact value
> put into *err_ptr doesn't matter (it's only compared to 0) and in case of
> error the return value is ignored.
> 
> Now, using ~0U instead of 0 for initial sum would yield an equivalent csum
> (comparable modulo 2^16-1) *AND* never yield 0 (recall how csum addition works).
> 
> IOW, we could simply return 0 to indicate an error.  Which gives much saner
> calling conventions:
> __wsum csum_and_copy_from_user(const void __user *src, void *dst, int len)
> copying the damn thing and returning 0 on error or a non-zero value comparable
> to csum of the data modulo 2^16-1 on success.  Same for csum_and_copy_to_user()
> (modulo const and __user being on the other argument).
> 
> For x86 it simplifies the instances (both the inline wrappers and asm parts);
> I hadn't checked the other architectures yet, but it looks like that should
> be doable for all architectures.  And it does simplify the callers...
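
The ~0U argument in the quoted text can be checked with a small
user-space sketch. This is not the kernel code: csum_add32(),
csum_fold16() and csum_words() are hypothetical helpers that assume
16-bit words summed into a 32-bit accumulator with end-around carry.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit ones-complement add: wrap the carry back into bit 0. */
static uint32_t csum_add32(uint32_t a, uint32_t b)
{
	uint32_t s = a + b;

	return s + (s < a);	/* (s < a) is 1 exactly when a + b overflowed */
}

/* Fold a 32-bit partial sum down to 16 bits, again with end-around carry. */
static uint16_t csum_fold16(uint32_t s)
{
	s = (s & 0xffff) + (s >> 16);
	s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/* Sum n 16-bit words starting from 'init' (hypothetical helper). */
static uint32_t csum_words(const uint16_t *p, size_t n, uint32_t init)
{
	uint32_t s = init;
	size_t i;

	for (i = 0; i < n; i++)
		s = csum_add32(s, p[i]);
	return s;
}
```

With an initial value of 0 an all-zero buffer sums to 0; with ~0U the
accumulator can never become 0, because an end-around-carry add of a
non-zero value never produces 0. Both results agree modulo 2^16-1 after
folding, so 0 is free to use as the error return.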

All the csum functions should let the caller pass in a small value
to be added in, since that is normally 'free' in the algorithm.
It is certainly better than adding it at the end, which is what the
current x86 code does.
(On 64-bit systems the 'small' value can exceed 2^32.)
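
A rough sketch of what "passing the value in" means, assuming 32-bit
words added into a 64-bit accumulator. csum32_seeded() and fold64() are
made-up names, not existing kernel functions, and the sketch assumes the
seed and length are small enough that the 64-bit sum cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 64-bit partial sum to 16 bits with end-around carry. */
static uint16_t fold64(uint64_t s)
{
	s = (s & 0xffffffffULL) + (s >> 32);
	s = (s & 0xffffffffULL) + (s >> 32);
	s = (s & 0xffff) + (s >> 16);
	s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/*
 * Add n 32-bit words into a 64-bit accumulator that starts at 'seed'.
 * Assumption: seed + n * 0xffffffff < 2^64, so no carry handling is
 * needed inside the loop and the seed costs nothing extra.
 */
static uint64_t csum32_seeded(const uint32_t *p, size_t n, uint64_t seed)
{
	uint64_t s = seed;
	size_t i;

	for (i = 0; i < n; i++)
		s += p[i];
	return s;
}
```

Seeding the accumulator and adding the seed to the folded result
afterwards give the same checksum modulo 2^16-1; the seeded form just
avoids the extra add-and-fold at the end.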

I also wonder if the csum_and_copy() functions are actually worthwhile on x86.
The csum code can run at 8 bytes/clock on all Intel CPUs since Ivy Bridge.
(The current code doesn't; it only does 4 bytes/clock until (IIRC) Haswell [1].)
On CPUs that support ADCX/ADOX you may do better - probably 12 bytes/clock,
I think 16 bytes/clock is wishful thinking.
But there is no leeway for any extra uops in either case.

However trying to get a memory read, memory write, adc and bits of loop
control scheduled in one clock is probably impossible - even though
it might not exceed the number of uops the execution pipelines can process.
ISTR that just separating the memory read from the adc slows
things down too much - probably issues with retiring instructions.
So I don't think it can get near 8 bytes/clock.

OTOH a copy operation trivially does 8 bytes/clock.
I even think 'rep movsq' is faster - never mind the fast 'rep movsb'.

So separate copy and checksum passes should easily exceed 4 bytes/clock,
but I suspect that doing them together never does.
(Unless the buffer is too big for the L1 cache.)
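
To make the comparison concrete, here is a portable sketch of the two
shapes being discussed. Both functions are hypothetical and work on
16-bit words; they only show the structure and say nothing about the
actual throughput of either approach, which would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit sum of 16-bit words down to 16 bits. */
static uint16_t fold_sum(uint64_t s)
{
	while (s >> 16)
		s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}

/* Fused: every iteration does a load, a store and an add. */
static uint16_t copy_and_csum_fused(uint16_t *dst, const uint16_t *src, size_t n)
{
	uint64_t s = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint16_t w = src[i];

		dst[i] = w;
		s += w;
	}
	return fold_sum(s);
}

/* Two passes: a plain copy, then a checksum over the (cached) copy. */
static uint16_t copy_then_csum(uint16_t *dst, const uint16_t *src, size_t n)
{
	uint64_t s = 0;
	size_t i;

	memcpy(dst, src, n * sizeof(*src));
	for (i = 0; i < n; i++)
		s += dst[i];
	return fold_sum(s);
}
```

The two functions return the same value; the question raised above is
only which one the hardware can execute faster while the buffer still
fits in the L1 cache.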

[1] The underlying problem is that a uop can only have 2 inputs.
ADC needs three (two values and the carry flag).
So the ADC instruction takes two clocks.
From Ivy Bridge (Sandy?) the carry flag is available early,
so adding to alternate registers lets you do 1 per clock.
So the existing csum function is rather slower than adding
32bit values to a 64bit register on most older CPUs.
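
The "alternate registers" idea can be sketched in C as two independent
accumulator chains. This is only an illustration: add64_carry(),
csum64_one_chain() and csum64_two_chains() are hypothetical names, and
whether a compiler turns this into back-to-back ADC instructions is not
guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit ones-complement add (end-around carry). */
static uint64_t add64_carry(uint64_t a, uint64_t b)
{
	uint64_t s = a + b;

	return s + (s < a);
}

/* Single dependency chain: each add waits for the previous one. */
static uint64_t csum64_one_chain(const uint64_t *p, size_t n)
{
	uint64_t s = 0;
	size_t i;

	for (i = 0; i < n; i++)
		s = add64_carry(s, p[i]);
	return s;
}

/* Two chains: even words go to s0, odd words to s1, merged at the end. */
static uint64_t csum64_two_chains(const uint64_t *p, size_t n)
{
	uint64_t s0 = 0, s1 = 0;
	size_t i;

	for (i = 0; i + 1 < n; i += 2) {
		s0 = add64_carry(s0, p[i]);
		s1 = add64_carry(s1, p[i + 1]);
	}
	if (i < n)
		s0 = add64_carry(s0, p[i]);	/* odd trailing word */
	return add64_carry(s0, s1);
}

/* Fold a 64-bit partial sum to 16 bits for comparison. */
static uint16_t fold64_to16(uint64_t s)
{
	s = (s & 0xffffffffULL) + (s >> 32);
	s = (s & 0xffffffffULL) + (s >> 32);
	s = (s & 0xffff) + (s >> 16);
	s = (s & 0xffff) + (s >> 16);
	return (uint16_t)s;
}
```

Because ones-complement addition is associative and commutative, the
two-chain result is equivalent to the single-chain one after folding.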

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
