Message-ID: <29a8ceb4-a699-433b-8a11-be6b3c9fd045@rasmusvillemoes.dk>
Date: Mon, 10 Jun 2024 11:50:31 +0200
From: Rasmus Villemoes <linux@...musvillemoes.dk>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Anvin <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>,
Borislav Petkov <bp@...en8.de>, Thomas Gleixner <tglx@...utronix.de>,
Josh Poimboeuf <jpoimboe@...nel.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>,
linux-arch <linux-arch@...r.kernel.org>
Subject: Re: [PATCH] x86: add 'runtime constant' infrastructure
On 08/06/2024 21.35, Linus Torvalds wrote:
> Needs more comments and testing, but it works, and has a generic
> fallback for architectures that don't support it.
>
> Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
> ---
>
> Notes from the first hack: I renamed the infrastructure from "static
> const" to "runtime const". We end up having a number of uses of "static
> const" that are related to the C language notion of "static const"
> variables or functions, and "runtime constant" is a bit more descriptive
> anyway.
>
> And this now is properly abstracted out, so that any architecture can
> choose to implement their own version, but it all falls back on "just
> use the variable".
Well, I think it falls a little short. I made it so that an arch didn't
need to implement support for all the runtime_const_*() helpers (I'll use
that name, because yes, it's much better than rai_), but could implement
just runtime_const_load() and not the _shift_right thing. In that case
the base pointer would be loaded as an immediate, and the shift count
would also be loaded as an immediate - but into a register, with the
actual shift then taking its count from that register, so still no D$
access.
So I think it should be something like
#ifndef runtime_const_load
#define runtime_const_load(x) (x)
#endif

#ifndef runtime_const_shift_right_32
#define runtime_const_shift_right_32(val, sym) \
	((u32)(val) >> runtime_const_load(sym))
#endif
etc. (This assumes we add some type-generic runtime_const_load that can
be used both for pointers and ints; supporting sizeof() < 4 is probably
not relevant, but could also be done).
Otherwise, if we want to add some runtime_const_mask32() thing to
support hash tables that use that model, one would have to hook up
support on all architectures at once. Similarly, architectures that are
late to the party wouldn't need to do the full monty at once; just
implementing runtime_const_load() would be enough to get some of the win.
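To make that concrete, here's a rough, untested sketch of how such a
mask helper could slot into the fallback scheme above (the names are
made up, not part of the patch). Generic fallback, which every
architecture gets for free:

#ifndef runtime_const_mask32
#define runtime_const_mask32(val, sym) \
	((u32)(val) & runtime_const_load(sym))
#endif

and an x86 override patching the 32-bit immediate of an 'andl' (this
would also need a 4-byte sibling of __runtime_fixup_shift and a
matching linker script entry for the new section):

#define runtime_const_mask32(val, sym) ({				\
	typeof(0u+(val)) __ret = (val);					\
	asm("andl $0x81234567,%k0\n1:\n"				\
	    ".pushsection runtime_const_mask_" #sym ",\"a\"\n\t"	\
	    ".long 1b - 4 - .\n\t"					\
	    ".popsection"						\
	    :"+r" (__ret));						\
	__ret; })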
> Rasmus - I've cleaned up my patch a lot, and it now compiles fine on
> other architectures too, although obviously with the fallback of "no
> constant fixup". As a result, my patch is actually smaller and much
> cleaner, and I ended up liking my approach more than your RAI thing
> after all.
Yeah, it's nice and small. I was probably over-ambitious: I also wanted
to eventually use runtime_const_load() for lots of the *_cachep
variables and similar. With my approach I didn't have to modify the
linker script every time a new use was introduced, nor the place that
does the initialization - but that came at the cost of only patching
everything at once at the end of init, which required the trampoline to
do the loads from memory in the meantime. That was partly in service of
another overly ambitious goal: eventually allowing this to be used for
__read_mostly and not just __ro_after_init stuff (various run-time
limits etc.).
>
> arch/x86/include/asm/runtime-const.h | 61 ++++++++++++++++++++++++++++
> arch/x86/kernel/vmlinux.lds.S | 3 ++
> fs/dcache.c | 24 +++++++----
> include/asm-generic/Kbuild | 1 +
> include/asm-generic/runtime-const.h | 15 +++++++
> include/asm-generic/vmlinux.lds.h | 8 ++++
> 6 files changed, 104 insertions(+), 8 deletions(-)
> create mode 100644 arch/x86/include/asm/runtime-const.h
> create mode 100644 include/asm-generic/runtime-const.h
>
> diff --git a/arch/x86/include/asm/runtime-const.h b/arch/x86/include/asm/runtime-const.h
> new file mode 100644
> index 000000000000..b4f7efc0a554
> --- /dev/null
> +++ b/arch/x86/include/asm/runtime-const.h
> @@ -0,0 +1,61 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_RUNTIME_CONST_H
> +#define _ASM_RUNTIME_CONST_H
> +
> +#define runtime_const_ptr(sym) ({ \
> + typeof(sym) __ret; \
> + asm("mov %1,%0\n1:\n" \
> + ".pushsection runtime_ptr_" #sym ",\"a\"\n\t" \
> + ".long 1b - %c2 - .\n\t" \
> + ".popsection" \
> + :"=r" (__ret) \
> + :"i" ((unsigned long)0x0123456789abcdefull), \
> + "i" (sizeof(long))); \
> + __ret; })
> +
> +// The 'typeof' will create at _least_ a 32-bit type, but
> +// will happily also take a bigger type and the 'shrl' will
> +// clear the upper bits
> +#define runtime_const_shift_right_32(val, sym) ({ \
> + typeof(0u+(val)) __ret = (val); \
> + asm("shrl $12,%k0\n1:\n" \
> + ".pushsection runtime_shift_" #sym ",\"a\"\n\t" \
> + ".long 1b - 1 - .\n\t" \
> + ".popsection" \
> + :"+r" (__ret)); \
> + __ret; })
Don't you need a cc clobber here? And for both, I think asm_inline is
appropriate, as you really want to tell gcc that this just consists of a
single instruction, the .pushsection games notwithstanding.
I know it's a bit more typing, but the section names should be
"runtime_const_shift" etc., not merely "runtime_shift".
> +
> +#define runtime_const_init(type, sym, value) do { \
> + extern s32 __start_runtime_##type##_##sym[]; \
> + extern s32 __stop_runtime_##type##_##sym[]; \
> + runtime_const_fixup(__runtime_fixup_##type, \
> + (unsigned long)(value), \
> + __start_runtime_##type##_##sym, \
> + __stop_runtime_##type##_##sym); \
> +} while (0)
> +
> +/*
> + * The text patching is trivial - you can only do this at init time,
> + * when the text section hasn't been marked RO, and before the text
> + * has ever been executed.
> + */
> +static inline void __runtime_fixup_ptr(void *where, unsigned long val)
> +{
> + *(unsigned long *)where = val;
> +}
> +
> +static inline void __runtime_fixup_shift(void *where, unsigned long val)
> +{
> + *(unsigned char *)where = val;
> +}
> +
> +static inline void runtime_const_fixup(void (*fn)(void *, unsigned long),
> + unsigned long val, s32 *start, s32 *end)
> +{
> + while (start < end) {
> + fn(*start + (void *)start, val);
> + start++;
> + }
> +}
> +
> +#endif
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 3509afc6a672..6e73403e874f 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -357,6 +357,9 @@ SECTIONS
> PERCPU_SECTION(INTERNODE_CACHE_BYTES)
> #endif
>
> + RUNTIME_CONST(shift, d_hash_shift)
> + RUNTIME_CONST(ptr, dentry_hashtable)
> +
Hm, doesn't this belong in the common linker script? I mean, if arm64
were to implement support for this, they'd also have to add this
boilerplate to their vmlinux.lds.S? And whenever another
RUNTIME_CONST(ptr, ...) use case is added, it would have to go into the
vmlinux.lds.S of every architecture supporting this at that point in
time. That doesn't seem to scale.
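E.g. (hypothetical names, and assuming RUNTIME_CONST() itself stays as
in your asm-generic/vmlinux.lds.h hunk) the users could be collected
once in the common header:

/* include/asm-generic/vmlinux.lds.h */
#define RUNTIME_CONST_VARIABLES			\
	RUNTIME_CONST(shift, d_hash_shift)	\
	RUNTIME_CONST(ptr, dentry_hashtable)

so that a supporting architecture only ever adds the single line

	RUNTIME_CONST_VARIABLES

to its vmlinux.lds.S, and any new runtime-const user only touches the
one common macro.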
> . = ALIGN(PAGE_SIZE);
>
> /* freed after init ends here */
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 407095188f83..4511e557bf84 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -97,12 +97,14 @@ EXPORT_SYMBOL(dotdot_name);
> */
>
> static unsigned int d_hash_shift __ro_after_init;
> -
> static struct hlist_bl_head *dentry_hashtable __ro_after_init;
>
> -static inline struct hlist_bl_head *d_hash(unsigned int hash)
> +#include <asm/runtime-const.h>
> +
Please keep the #includes together at the top of the file.
> +static inline struct hlist_bl_head *d_hash(unsigned long hashlen)
> {
> - return dentry_hashtable + (hash >> d_hash_shift);
> + return runtime_const_ptr(dentry_hashtable) +
> + runtime_const_shift_right_32(hashlen, d_hash_shift);
> }
Could you spend some words on this signature change? Why should this now
(on 64BIT) take the full hash_len as argument, only to let the compiler
(with the (u32) cast) or cpu (in the x86_64 case, via the %k modifier if
I'm reading things right) only use the lower 32 bits?
The patch would be even smaller without this, and perhaps it could be
done as a separate patch if it does lead to (even) better code generation.
>
> static void __d_rehash(struct dentry *entry)
> {
> - struct hlist_bl_head *b = d_hash(entry->d_name.hash);
> + struct hlist_bl_head *b = d_hash(entry->d_name.hash_len);
>
> hlist_bl_lock(b);
> hlist_bl_add_head_rcu(&entry->d_hash, b);
> @@ -3129,6 +3131,9 @@ static void __init dcache_init_early(void)
> 0,
> 0);
> d_hash_shift = 32 - d_hash_shift;
> +
> + runtime_const_init(shift, d_hash_shift, d_hash_shift);
> + runtime_const_init(ptr, dentry_hashtable, dentry_hashtable);
> }
>
> static void __init dcache_init(void)
> @@ -3157,6 +3162,9 @@ static void __init dcache_init(void)
> 0,
> 0);
> d_hash_shift = 32 - d_hash_shift;
> +
> + runtime_const_init(shift, d_hash_shift, d_hash_shift);
> + runtime_const_init(ptr, dentry_hashtable, dentry_hashtable);
> }
>
Aside: that duplication makes me want to have dcache_init() grow an 'int
early' arg, change the body accordingly and change the call sites to
dcache_init(1) / dcache_init(0). But since these functions probably only
change once per decade or so, it's likely not worth it.
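Roughly along these lines (a very rough, untested sketch; the
kmem_cache creation and the hashdist checks are elided):

static void __init dcache_init(int early)
{
	dentry_hashtable =
		alloc_large_system_hash("Dentry cache",
					sizeof(struct hlist_bl_head),
					dhash_entries,
					13,
					HASH_ZERO | (early ? HASH_EARLY : 0),
					&d_hash_shift,
					NULL,
					0,
					0);
	d_hash_shift = 32 - d_hash_shift;

	runtime_const_init(shift, d_hash_shift, d_hash_shift);
	runtime_const_init(ptr, dentry_hashtable, dentry_hashtable);
}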
> +/*
> + * This is the fallback for when the architecture doesn't
> + * support the runtime const operations.
> + *
> + * We just use the actual symbols as-is.
> + */
> +#define runtime_const_ptr(sym) (sym)
> +#define runtime_const_shift_right_32(val, sym) ((u32)(val)>>(sym))
> +#define runtime_const_init(type,sym,value) do { } while (0)
Hm, I wonder if there's ever a case where you want to patch using any
other value than what you've just assigned to sym. IOW, why doesn't
runtime_const_init just take type,sym args?
It can't be to support some 'struct global { void *this; } g' where you
then want to use the identifier g_this for the tables and g.this for the
value, 'cause that wouldn't work with the fallbacks vs the x86
implementation - one would need runtime_const_ptr(g.this) and the other
runtime_const_ptr(g_this) at the use sites.
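IOW, the callers could then just do runtime_const_init(shift,
d_hash_shift), and the x86 macro would shrink to something like this
(untested sketch):

#define runtime_const_init(type, sym) do {				\
	extern s32 __start_runtime_##type##_##sym[];			\
	extern s32 __stop_runtime_##type##_##sym[];			\
	runtime_const_fixup(__runtime_fixup_##type,			\
			    (unsigned long)(sym),			\
			    __start_runtime_##type##_##sym,		\
			    __stop_runtime_##type##_##sym);		\
} while (0)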
Rasmus