Date:   Wed, 5 Aug 2020 09:03:07 -0500
From:   Segher Boessenkool <segher@...nel.crashing.org>
To:     Christophe Leroy <christophe.leroy@...roup.eu>
Cc:     Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Paul Mackerras <paulus@...ba.org>,
        Michael Ellerman <mpe@...erman.id.au>, nathanl@...ux.ibm.com,
        anton@...abs.org, linux-arch@...r.kernel.org, arnd@...db.de,
        linux-kernel@...r.kernel.org, luto@...nel.org, tglx@...utronix.de,
        vincenzo.frascino@....com, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

Hi!

On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
> Provide vdso_shift_ns(), as the generic x >> s gives the following
> bad result:
> 
>   18:	35 25 ff e0 	addic.  r9,r5,-32
>   1c:	41 80 00 10 	blt     2c <shift+0x14>
>   20:	7c 64 4c 30 	srw     r4,r3,r9
>   24:	38 60 00 00 	li      r3,0
> ...
>   2c:	54 69 08 3c 	rlwinm  r9,r3,1,0,30
>   30:	21 45 00 1f 	subfic  r10,r5,31
>   34:	7c 84 2c 30 	srw     r4,r4,r5
>   38:	7d 29 50 30 	slw     r9,r9,r10
>   3c:	7c 63 2c 30 	srw     r3,r3,r5
>   40:	7d 24 23 78 	or      r4,r9,r4
> 
> In our case the shift is always <= 32. In addition, the upper 32 bits
> of the result are likely zero. Let GCC know this; it then also
> optimises the following calculations.
> 
> With the patch, we get:
>    0:	21 25 00 20 	subfic  r9,r5,32
>    4:	7c 69 48 30 	slw     r9,r3,r9
>    8:	7c 84 2c 30 	srw     r4,r4,r5
>    c:	7d 24 23 78 	or      r4,r9,r4
>   10:	7c 63 2c 30 	srw     r3,r3,r5

See below.  Such code is valid on PowerPC for all shift < 64, and a
future version of GCC will generate it (it is on various TODO lists, so
it is bound to happen *some* day ;-), but it won't help you yet, of
course).
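
(Purely as an illustration, not from the patch; the helper name below is
made up.  This is the plain 64-bit shift that such a future GCC could
compile straight to the five-instruction subfic/slw/srw/or/srw sequence
shown above, instead of the branchy addic./blt sequence it emits today.)

static __always_inline u64 shift_right_64(u64 x, unsigned long s)
{
	/*
	 * Plain 64-bit right shift.  On 32-bit powerpc current GCC has to
	 * cover s >= 32 with a branch; a GCC that exploits the srw/slw
	 * shift semantics could emit the straight-line sequence instead.
	 */
	return x >> s;
}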


> +/*
> + * The macro sets up two stack frames, one for the caller and one for the
> + * callee, because there is no requirement for the caller to set up a stack
> + * frame when calling the VDSO, so it may have omitted to do so, especially
> + * on PPC64.
> + */

If the caller follows the ABI, there always is a stack frame.  So what
is going on?


> +/*
> + * powerpc specific delta calculation.
> + *
> + * This variant removes the masking of the subtraction because the
> + * clocksource mask of all VDSO-capable clocksources on powerpc is U64_MAX,
> + * which would make the masking a pointless operation. The compiler cannot
> + * optimize it away, as the mask comes from the vdso data and is not a
> + * compile-time constant.
> + */

The compiler cannot optimise it because it does not know shift < 32.  The
code below is incorrect for shift equal to 32, fwiw.
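
(For contrast, a sketch of the generic delta calculation, written from
memory of lib/vdso/gettimeofday.c; treat the exact form as an assumption
rather than a quote.  The point is that the generic code keeps the mask
that the powerpc variant below drops.)

static __always_inline u64 generic_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
{
	/*
	 * Mask the subtraction before scaling.  On powerpc the mask is
	 * always U64_MAX, which is why the variant below omits "& mask".
	 */
	return ((cycles - last) & mask) * mult;
}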

> +static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
> +{
> +	return (cycles - last) * mult;
> +}
> +#define vdso_calc_delta vdso_calc_delta
> +
> +#ifndef __powerpc64__
> +static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> +{
> +	u32 hi = ns >> 32;
> +	u32 lo = ns;
> +
> +	lo >>= shift;
> +	lo |= hi << (32 - shift);
> +	hi >>= shift;


> +	if (likely(hi == 0))
> +		return lo;

Removing these two lines shouldn't change generated object code?  Or not
make it worse, at least.

> +	return ((u64)hi << 32) | lo;
> +}
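
(To spell out the "incorrect for shift equal to 32" remark above: a small
sketch, not part of the original mail, with a made-up function name.  With
shift == 32, both 32-bit right shifts in the body shift a u32 by its full
width, which is undefined in C, so the compiler is free to, e.g., reduce
the shift amount modulo 32 and return a wrong value.)

static __always_inline u64 vdso_shift_ns_by_32(u64 ns)
{
	u32 hi = ns >> 32;
	u32 lo = ns;

	lo >>= 32;		/* undefined: u32 shifted by its full width */
	lo |= hi << (32 - 32);	/* hi << 0 itself is fine */
	hi >>= 32;		/* undefined as well */

	return ((u64)hi << 32) | lo;
}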


What does the compiler do for just

static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
{
	return ns >> (shift & 31);
}

?


Segher
