Message-ID: <ZmiN_7LMp2fbKhIw@J2N7QTR9R3>
Date: Tue, 11 Jun 2024 18:48:47 +0100
From: Mark Rutland <mark.rutland@....com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Anvin <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Thomas Gleixner <tglx@...utronix.de>,
Rasmus Villemoes <linux@...musvillemoes.dk>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>,
linux-arm-kernel@...ts.infradead.org,
linux-arch <linux-arch@...r.kernel.org>
Subject: Re: [PATCH 4/7] arm64: add 'runtime constant' support
On Tue, Jun 11, 2024 at 09:56:17AM -0700, Linus Torvalds wrote:
> On Tue, 11 Jun 2024 at 07:29, Mark Rutland <mark.rutland@....com> wrote:
> >
> > Do we expect to use this more widely? If this only really matters for
> > d_hash() it might be better to handle this via the alternatives
> > framework with callbacks and avoid the need for new infrastructure.
>
> Hmm. The notion of a callback for alternatives is intriguing and would
> be very generic, but we don't have anything like that right now.
>
> Is anybody willing to implement something like that? Because while I
> like the idea, it sounds like a much bigger change.
Fair enough if that's a pain on x86, but we already have callbacks for
alternatives on arm64, and hence using them is a smaller change there. We
already have a couple of cases which use a MOVZ;MOVK;MOVK;MOVK sequence, e.g.
	// in __invalidate_icache_max_range()
	asm volatile(ALTERNATIVE_CB("movz %0, #0\n"
				    "movk %0, #0, lsl #16\n"
				    "movk %0, #0, lsl #32\n"
				    "movk %0, #0, lsl #48\n",
				    ARM64_ALWAYS_SYSTEM,
				    kvm_compute_final_ctr_el0)
		     : "=r" (ctr));
... which is patched via the callback:
	void kvm_compute_final_ctr_el0(struct alt_instr *alt,
				       __le32 *origptr, __le32 *updptr, int nr_inst)
	{
		generate_mov_q(read_sanitised_ftr_reg(SYS_CTR_EL0),
			       origptr, updptr, nr_inst);
	}
... where the generate_mov_q() helper does the actual instruction generation.
So if we only care about a few specific constants, we could give them their own
callbacks, like kvm_compute_final_ctr_el0() above.
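As a rough userspace illustration of what patching such a sequence involves:
each of the four instructions carries a 16-bit immediate in bits 20:5, so
loading a 64-bit value means writing one 16-bit chunk into each instruction.
This is not the kernel's generate_mov_q(); the sketch_* names are made up for
the example, and it assumes the instruction words are already in host byte
order and ignores cache maintenance entirely.

```c
#include <stdint.h>

/* imm16 occupies bits 20:5 of an A64 MOVZ/MOVK encoding. */
#define SKETCH_IMM16_SHIFT	5
#define SKETCH_IMM16_MASK	(UINT32_C(0xffff) << SKETCH_IMM16_SHIFT)

/* Hypothetical helper: replace only the imm16 field, keep opcode/hw/Rd. */
static inline uint32_t sketch_set_imm16(uint32_t insn, uint16_t imm)
{
	return (insn & ~SKETCH_IMM16_MASK) |
	       ((uint32_t)imm << SKETCH_IMM16_SHIFT);
}

/*
 * Hypothetical helper: patch a MOVZ;MOVK;MOVK;MOVK sequence so that it
 * loads 'val'. insns[i] is assumed to be the instruction using a shift
 * of 16 * i, as in the asm above.
 */
static inline void sketch_patch_mov_q(uint32_t insns[4], uint64_t val)
{
	for (int i = 0; i < 4; i++)
		insns[i] = sketch_set_imm16(insns[i],
					    (uint16_t)(val >> (16 * i)));
}
```

For instance, starting from the x0 encodings 0xd2800000, 0xf2a00000,
0xf2c00000 and 0xf2e00000, the sketch fills in the four immediates and leaves
the shift and destination register fields untouched.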
[...]
> > We have some helpers for instruction manipulation, and we can use
> > aarch64_insn_encode_immediate() here, e.g.
> >
> > #include <asm/insn.h>
> >
> > static inline void __runtime_fixup_16(__le32 *p, unsigned int val)
> > {
> > u32 insn = le32_to_cpu(*p);
> > insn = aarch64_insn_encode_immediate(AARCH64_INSN_IMM_16, insn, val);
> > *p = cpu_to_le32(insn);
> > }
>
> Ugh. I did that, and then noticed that it makes the generated code
> about ten times bigger.
>
> That interface looks positively broken.
>
> There is absolutely nobody who actually wants a dynamic argument, so
> it would have made both the callers and the implementation *much*
> simpler had the "AARCH64_INSN_IMM_16" been encoded in the function
> name the way I did it for my instruction rewriting.
>
> It would have made the use of it simpler, it would have avoided all
> the "switch (type)" garbage, and it would have made it all generate
> much better code.
Oh, completely agreed. FWIW, I have better versions sat in my
arm64/insn/rework branch, but I haven't had the time to get all the rest
of the insn framework cleanup sorted:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=arm64/insn/rework&id=9cf0ec088c9d5324c60933bf3924176fea0a4d0b
I can go prioritise getting that bit out if it'd help, or I can clean
this up later.
Those allow the compiler to do much better, including compile-time (or
runtime) checks that immediates fit. For example:
	void encode_imm16(__le32 *p, u16 imm)
	{
		u32 insn = le32_to_cpu(*p);

		// Would warn if 'imm' were u32.
		// As u16 always fits, no warning
		BUILD_BUG_ON(!aarch64_insn_try_encode_unsigned_imm16(&insn, imm));

		*p = cpu_to_le32(insn);
	}
... compiles to:
	<encode_imm16>:
		ldr     w2, [x0]
		bfi     w2, w1, #5, #16
		str     w2, [x0]
		ret
... which I think is what you want?
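To make the bfi concrete, here is a standalone sketch of an imm16-specific
encoder with a fit check. The name sketch_try_encode_unsigned_imm16() is
hypothetical and this is not the helper from the branch linked above; it only
shows the general shape, assuming a host-order instruction word and an
unsigned immediate living in bits 20:5.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: insert 'imm' into bits 20:5 of '*insn' if it fits
 * in 16 bits. Returns false and leaves '*insn' untouched otherwise.
 */
static inline bool sketch_try_encode_unsigned_imm16(uint32_t *insn,
						    uint64_t imm)
{
	const uint32_t mask = UINT32_C(0xffff) << 5;

	if (imm > UINT16_MAX)
		return false;

	/* Equivalent in effect to the single bfi (width 16 at bit 5). */
	*insn = (*insn & ~mask) | ((uint32_t)imm << 5);
	return true;
}
```

Because the field position and width are fixed in the function rather than
passed as an argument, a caller handing in a 16-bit type lets the compiler
drop the range check and reduce the body to one bitfield insert.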
> So I did that change you suggested, and then undid it again.
>
> Because that whole aarch64_insn_encode_immediate() thing is an
> abomination, and should be burned at the stake. It's misdesigned in
> the *worst* possible way.
>
> And no, this code isn't performance-critical, but I have some taste,
> and the code I write will not be using that garbage.
Fair enough.
Mark.