Date:   Mon, 10 Oct 2022 15:34:54 -0700
From:   Nick Desaulniers <ndesaulniers@...gle.com>
To:     Arnd Bergmann <arnd@...nel.org>
Cc:     clang-built-linux <llvm@...ts.linux.dev>,
        Nathan Chancellor <nathan@...nel.org>,
        Miguel Ojeda <ojeda@...nel.org>,
        Ard Biesheuvel <ardb@...nel.org>, Gary Guo <gary@...yguo.net>,
        Russell King <linux@...linux.org.uk>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Craig Topper <craig.topper@...ive.com>,
        Philip Reames <listmail@...lipreames.com>, jh@...user.us
Subject: Re: [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem

On Mon, Oct 10, 2022 at 3:14 PM Arnd Bergmann <arnd@...nel.org> wrote:
>
> On Mon, Oct 10, 2022, at 11:23 PM, Nick Desaulniers wrote:
> > On Sat, Jul 16, 2022 at 2:47 AM Arnd Bergmann <arnd@...nel.org> wrote:
> >> On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers <ndesaulniers@...gle.com> wrote:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/nwfpe/softfloat.c#n2312
> > Any creative ideas on how to avoid this? Perhaps putting the `aSig -=
> > bSig;` in inline asm? Inserting a `barrier()` or empty asm statement
> > into the loops also seems to work.
>
> I was going to suggest adding a barrier() as well, should have
> read on first ;-)

barrier() forces reloads and spills in the loop.  The output with `-mllvm
-replexitval=never` is optimal, assuming the loop is faster than
__aeabi_uldivmod() (which I think is unprovable).
https://godbolt.org/z/7dMabYYcM
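
For readers without the godbolt link handy, here is a minimal sketch of the loop shape under discussion. It is a simplified stand-in, not the actual float64_rem() code: the function names are hypothetical, and the only part taken from the kernel is the barrier() definition (an empty asm statement with a "memory" clobber). Whether a given compiler version rewrites the first loop into a 64-bit division or remainder (and thus a call to __aeabi_uldivmod() on 32-bit ARM) depends on the optimizer and is not guaranteed either way.

```c
#include <stdint.h>

/* Same shape as the kernel's barrier(): an empty asm with a "memory" clobber. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/*
 * Hypothetical reduction of a repeated-subtraction loop. The final value of
 * `a` equals a % b, so an optimizer that computes loop exit values may
 * replace the loop with a 64-bit remainder operation.
 * The caller must pass b != 0, otherwise the loop never terminates.
 */
uint64_t rem_by_subtraction(uint64_t a, uint64_t b)
{
	while (a >= b)
		a -= b;
	return a;
}

/*
 * Same loop with a compiler barrier in the body. The volatile asm must be
 * executed once per iteration, so the loop has to stay, at the cost of the
 * extra reloads and spills mentioned above.
 */
uint64_t rem_by_subtraction_barrier(uint64_t a, uint64_t b)
{
	while (a >= b) {
		a -= b;
		barrier();
	}
	return a;
}
```

Both functions return the same result; the difference only shows up in the generated code, for example when comparing the 32-bit ARM output of clang with and without the barrier.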

As much as I hate relying on compiler-internal flags, I think this is optimal:
```
diff --git a/arch/arm/nwfpe/Makefile b/arch/arm/nwfpe/Makefile
index 303400fa2cdf..2aec85ab1e8b 100644
--- a/arch/arm/nwfpe/Makefile
+++ b/arch/arm/nwfpe/Makefile
@@ -11,3 +11,9 @@ nwfpe-y                               += fpa11.o fpa11_cpdo.o fpa11_cpdt.o \
                                   entry.o

 nwfpe-$(CONFIG_FPE_NWFPE_XP)   += extended_cpdo.o
+
+# Try really hard to avoid generating calls to __aeabi_uldivmod() from
+# float64_rem() due to loop elision.
+ifdef CONFIG_CC_IS_CLANG
+CFLAGS_softfloat.o     += -mllvm -replexitval=never
+endif
```

Part of me is tempted to move float64_rem() to its own file for that
flag, but indvars+loop-utils isn't eliding any other loops in that file
(I compared the full disassembly before and after the above diff).

Long term, it might be nice for us to have `--rtlib` recognize
`--rtlib=linux-kernel@...sion` or something so that we could better
describe the effective compiler runtime to the compiler.  There are
already differences in compiler-rt and libgcc where we could make
better codegen decisions if we were to consider the target rtlib.
These libraries also change over time though...
-- 
Thanks,
~Nick Desaulniers
