[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20200325075924.109330-1-courbet@google.com>
Date: Wed, 25 Mar 2020 08:59:22 +0100
From: Clement Courbet <courbet@...gle.com>
To: unlisted-recipients:; (no To-header on input)
Cc: Nathan Chancellor <natechancellor@...il.com>,
Kees Cook <keescook@...omium.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Joe Perches <joe@...ches.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
Clement Courbet <courbet@...gle.com>,
linux-kernel@...r.kernel.org, clang-built-linux@...glegroups.com
Subject: [PATCH] x86: Alias memset to __builtin_memset.
> Not sure we should modify this for someone's downstream tree to work
> around an issue they introduced. If you want to file a bug
> internally, I'd be happy to comment on it.
I don't have a strong opinion on that. I don't know the policy of the
linux kernel w.r.t. this. There is an internal bug for this, where
kernel maintainers suggested I upstream this patch.
> Does __builtin_memset detect support for `rep stosb`, then patch the
> kernel to always use it or not?
__builtin_memset just allows the compiler to recognize that this has the
semantics of a memset, exactly like it happens when -freestanding is not
specified.
In terms of what compilers do when expanding the memset, it depends on
the size.
gcc or clang obviously do not generate vector code when -no-sse is
specified, as this would break promises.
clang inlines stores for small sizes and switches to memset as the size
increases: https://godbolt.org/z/_X16xt
gcc inlines stores for tiny sizes, then switches to repstos for larger
sizes: https://godbolt.org/z/m-G233 This behaviour is additionally
configurable through command line flags: https://godbolt.org/z/wsoJpQ
> Did you verify what this change does for other compilers?
Are there other compilers are used to build the kernel on x86 ?
icc does the same as gcc and clang: https://godbolt.org/z/yCju_D
> yet it doesn't use vector operations for the general case
I'm not sure how vector operations relate to freestanding, or this change.
> Adding -ffreestanding to optimize one hot memset in
> one function is using a really big hammer to solve the wrong
> problem.
-ffreestanding was not added to optimize anything. It was already there.
If anything -ffreestanding actually pessimizes a lot of the code
generation, as it prevents the compiler from recognizing the semantics
of common primitives. This is what this change is trying to fix.
Removing -ffreestanding from the options is another (easier?) way to fix
the problem. This change is a stab at accomodating both those who want
freestanding and those who want performance.
The hot memset I mentionned was just the hottest one. But as you can imagine
there are many constant-size memsets spread across many functions, some of
which are also very hot, the others constituting a long tail which is not
negligible.
Powered by blists - more mailing lists