lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221009175920.GA28685@1wt.eu>
Date:   Sun, 9 Oct 2022 19:59:20 +0200
From:   Willy Tarreau <w@....eu>
To:     Alexey Dobriyan <adobriyan@...il.com>
Cc:     lkp@...el.com, linux-kselftest@...r.kernel.org,
        linux-kernel@...r.kernel.org,
        "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: tools/nolibc: fix missing strlen() definition and infinite loop
 with gcc-12

Hi Alexey,

On Sun, Oct 09, 2022 at 06:45:49PM +0300, Alexey Dobriyan wrote:
> Willy Tarreau wrote:
> > +#if defined(__GNUC__) && (__GNUC__ >= 12)
> > +__attribute__((optimize("no-tree-loop-distribute-patterns")))
> > +#endif
> >  static __attribute__((unused))
> > -size_t nolibc_strlen(const char *str
> 
> I'd suggest to use asm("") in the loop body. It worked in the past
> to prevent folding division loop back into division instruction.

Ah excellent idea! I initially thought about using asm() to hide a
variable provenance but didn't like it much because it undermines
code optimization. But you're right, with an empty asm() statement
alone, the loop will not look like an strlen() anymore. Just tried
and it works like a charm, I'll resend a patch so that we can get
rid of the ugly ifdef.

> Or switch to 
> 
> 	size_t f(const char *s)
> 	{
> 		const char *s0 = s;
> 		while (*s++)
> 			;
> 		return s - s0 - 1;
> 	}
> 
> which compiles to 1 branch, not 2.

In fact it depends. In the original code that approach was part of
the ones I had considered, but it doesn't always in better code due
to the prologue and epilogue being larger. It's only better at -O1,
and -O2, but not -Os, and once you add asm() into it, only -O1
remains better:

  $ nm --size len.o|grep O|rev|sort|rev
  000000000000001a T len_while_O1
  0000000000000022 T len_while_asm_O1
  0000000000000026 T len_for_O1
  000000000000001a T len_while_O2
  000000000000002b T len_while_asm_O2
  0000000000000021 T len_for_O2
  0000000000000013 T len_while_Os
  0000000000000015 T len_while_asm_Os
  000000000000000e T len_for_Os

This observation seems consistent for me on x86_64, i386, arm and arm64.

> But of course they could recognise this pattern too.

Yes definitely, hence the need for asm() there as well to complete the
comparison.

Thanks for the suggestion, I'll send a v2 shortly.
Willy

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ