[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20200819190619.GA4135577@rani.riverdale.lan>
Date: Wed, 19 Aug 2020 15:06:19 -0400
From: Arvind Sankar <nivedita@...m.mit.edu>
To: Nick Desaulniers <ndesaulniers@...gle.com>
Cc: Arvind Sankar <nivedita@...m.mit.edu>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
clang-built-linux <clang-built-linux@...glegroups.com>
Subject: Re: [PATCH v2] lib/string.c: Use freestanding environment
On Wed, Aug 19, 2020 at 11:35:11AM -0700, Nick Desaulniers wrote:
> On Wed, Aug 19, 2020 at 7:08 AM Arvind Sankar <nivedita@...m.mit.edu> wrote:
> >
> > gcc can transform the loop in a naive implementation of memset/memcpy
> > etc into a call to the function itself. This optimization is enabled by
> > -ftree-loop-distribute-patterns.
> >
> > This has been the case for a while (see eg [0]), but gcc-10.x enables
> > this option at -O2 rather than -O3 as in previous versions.
> >
> > Add -ffreestanding, which implicitly disables this optimization with
> > gcc. It is unclear whether clang performs such optimizations, but
> > hopefully it will also not do so in a freestanding environment.
> >
> > [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
> >
> > Signed-off-by: Arvind Sankar <nivedita@...m.mit.edu>
>
> For Clang:
> For x86_64 defconfig:
> This results in no change for the code generated.
>
> For aarch64 defconfig:
> This results in calls to bcmp being replaced with calls to memcmp in
> strstr and strnstr. I plan on adding -fno-built-bcmp then removing
> bcmp anyways. Not a bug either way, just noting the difference is
> disassembly.
>
> For arm defconfig:
> This results in no change for the code generated.
>
> I should check the other architectures we support, but my local build
> doesn't have all backends enabled currently; we'll catch it once it's
> being testing in -next if it's an issue, but I don't foresee it
> (knocks on wood, famous last words, ...)
>
> If it helps GCC not optimize these core functions into infinite
> recursion, I'm for that, especially since I'd bet these get called
> frequently and early on in boot, which is my least favorite time to
> debug.
>
> Reviewed-by: Nick Desaulniers <ndesaulniers@...gle.com>
> Tested-by: Nick Desaulniers <ndesaulniers@...gle.com>
>
I verified that arch/c6x with gcc-10 downloaded from kernel.org has the
broken memset with CC_OPTIMIZE_FOR_PERFORMANCE and gets fixed with this
patch. The default is optimize for size though, which doesn't seem to be
busted.
Thanks.
Powered by blists - more mailing lists