Date:   Fri, 18 Mar 2022 23:52:01 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Segher Boessenkool' <segher@...nel.crashing.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
CC:     Andrew Cooper <Andrew.Cooper3@...rix.com>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        "H. Peter Anvin" <hpa@...or.com>, Bill Wendling <morbo@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
        Nathan Chancellor <nathan@...nel.org>,
        "Juergen Gross" <jgross@...e.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Andy Lutomirski" <luto@...nel.org>,
        "llvm@...ts.linux.dev" <llvm@...ts.linux.dev>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-toolchains <linux-toolchains@...r.kernel.org>
Subject: RE: [PATCH v5] x86: use builtins to read eflags

From: Segher Boessenkool
> Sent: 18 March 2022 23:04
...
> The vast majority of compiler builtins are for simple transformations
> that the machine can do, for example with vector instructions.  Using
> such builtins does *not* instruct the compiler to use those machine
> insns, even if the builtin name would suggest that; instead, it asks to
> have code generated that has such semantics.  So it can be optimised by
> the compiler, much more than what can be done with inline asm.

Bah.
I wrote some small functions to convert blocks of 80 audio
samples between C 'float' and the 8-bit u-law and A-law floating
point formats - one set uses the F16C conversions for denormalised
values.
I really want the instructions I've asked for in the order
I've asked for them.
I don't want the compiler doing stupid things.
(Like deciding to try to vectorise the bit of code at the end
that handled non 80 byte blocks.)

> It also can be optimised better by the compiler than if you would
> open-code the transforms (if you ask to frobnicate something, the
> compiler will know you want to frobnicate that thing, and it will not
> always recognise that is what you want if you just write it out in more
> general code).

Yep.
If I write 'for (i = 0; i < n; i++) foo[i] = bar[i]'
I want a loop - not a call to memcpy().
If I want a memcpy() I'll call memcpy().
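
To illustrate the point about copy loops, here is a minimal sketch that is not part of the original message. The function names copy_loop() and copy_call() are hypothetical. At higher optimisation levels GCC may recognise the first function as a copy idiom and emit a library call (memcpy() or memmove()) in place of the loop; GCC's -fno-tree-loop-distribute-patterns option is the documented way to turn that transformation off. Whether it actually fires depends on compiler version and flags, so check the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: an explicit element-by-element copy.
 * The author of the message wants this to stay a loop, but an
 * optimising compiler is allowed to replace it with a library call. */
void copy_loop(int *foo, const int *bar, size_t n)
{
	size_t i;

	assert(foo != NULL && bar != NULL);
	for (i = 0; i < n; i++)
		foo[i] = bar[i];
}

/* Hypothetical helper: the same copy requested explicitly.
 * Unlike the loop above, this requires non-overlapping buffers. */
void copy_call(int *foo, const int *bar, size_t n)
{
	assert(foo != NULL && bar != NULL);
	memcpy(foo, bar, n * sizeof(*foo));
}
```

Both functions produce the same result for non-overlapping buffers; the difference the message cares about is only in the generated code.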

And if I write:
	do {
		sum64a += buff32[0];
		sum64b += buff32[1];
		sum64a += buff32[2];
		sum64b += buff32[3];
		buff32 += 4;
	} while (buff32 != lim);
I don't want to see 'buff32[1] + buff32[2]' anywhere!
That loop has half a chance of running at 8 bytes/clock.
But not how gcc compiles it.
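
For reference, the loop fragment above can be completed into a self-contained function along the following lines. This is a sketch under assumptions, not the author's code: the name sum_words(), the parameter n_words, the way lim is derived, and the final fold of the two accumulators are all made up for illustration. The idea it shows is the one in the message: two independent 64-bit accumulators form two separate dependency chains, so the additions do not have to wait on each other. The 8 bytes/clock figure is the author's estimate and is not something this sketch demonstrates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum 32-bit words into a 64-bit total using
 * two independent accumulators.  n_words must be a non-zero multiple
 * of 4 because the do/while body always consumes four words. */
uint64_t sum_words(const uint32_t *buff32, size_t n_words)
{
	const uint32_t *lim = buff32 + n_words;
	uint64_t sum64a = 0, sum64b = 0;

	assert(n_words != 0 && n_words % 4 == 0);
	do {
		sum64a += buff32[0];	/* chain A */
		sum64b += buff32[1];	/* chain B */
		sum64a += buff32[2];	/* chain A */
		sum64b += buff32[3];	/* chain B */
		buff32 += 4;
	} while (buff32 != lim);

	/* Fold the two chains once, outside the loop. */
	return sum64a + sum64b;
}
```

A compiler that rewrites the body as 'buff32[1] + buff32[2]' followed by a single accumulate still returns the same value, but it changes the shape of the dependency chains the code was written to get.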

> Well-chosen builtin names are also much more readable than the best
> inline asm can ever be, and it can express much more in a much smaller
> space, without so much opportunity to make mistakes, either.

Hmmm...
Trying to write that SSE2/AVX code was a nightmare.
Chase through the cpu instruction set trying to sort out
the name of the required instruction.
Then search through the 'intrinsic' header to find the
name of the builtin.
Then disassemble the code to check that I'd got the right one.
I'm pretty sure the asm would have been shorter
and needed just as many comments.
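
As a small illustration of the intrinsic-versus-asm comparison, here is a sketch that is not taken from the original message. It adds four 32-bit integers once with the SSE2 intrinsic _mm_add_epi32() from <emmintrin.h> and once with the underlying PADDD instruction written as GCC-style inline asm. The function names add4_intrinsic() and add4_asm() are hypothetical, the asm assumes the default AT&T operand order, and a scalar fallback is included so the file still builds where SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for i = 0..3, via the intrinsic.
 * The compiler is free to schedule, combine or replace the instruction. */
void add4_intrinsic(uint32_t *dst, const uint32_t *a, const uint32_t *b)
{
	assert(dst != NULL && a != NULL && b != NULL);
#if defined(__SSE2__)
	__m128i va = _mm_loadu_si128((const __m128i *)a);
	__m128i vb = _mm_loadu_si128((const __m128i *)b);
	_mm_storeu_si128((__m128i *)dst, _mm_add_epi32(va, vb));
#else
	for (int i = 0; i < 4; i++)
		dst[i] = a[i] + b[i];
#endif
}

/* Hypothetical helper: the same addition with the instruction named directly. */
void add4_asm(uint32_t *dst, const uint32_t *a, const uint32_t *b)
{
	assert(dst != NULL && a != NULL && b != NULL);
#if defined(__SSE2__) && defined(__GNUC__)
	__m128i va = _mm_loadu_si128((const __m128i *)a);
	__m128i vb = _mm_loadu_si128((const __m128i *)b);
	/* AT&T syntax: paddd src, dst  =>  va += vb (per 32-bit lane) */
	__asm__("paddd %1, %0" : "+x"(va) : "x"(vb));
	_mm_storeu_si128((__m128i *)dst, va);
#else
	for (int i = 0; i < 4; i++)
		dst[i] = a[i] + b[i];
#endif
}
```

The two versions compute the same result; the asm version pins the instruction, while the intrinsic version leaves the final instruction choice to the compiler.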

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
