[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMj1kXGLXGqA4JvuWNLk_-cu5_soKpbyDaZxwjOo5LVTdZt71w@mail.gmail.com>
Date: Mon, 29 Jul 2024 09:12:01 +0200
From: Ard Biesheuvel <ardb@...nel.org>
To: Calvin Owens <calvin@...nvd.org>
Cc: Arnd Bergmann <arnd@...db.de>, Nathan Chancellor <nathan@...nel.org>,
clang-built-linux <clang-built-linux@...glegroups.com>, zhuqiuer <zhuqiuer1@...wei.com>,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
Russell King <linux@...linux.org.uk>, Linus Walleij <linus.walleij@...aro.org>
Subject: Re: [PATCH] ARM: vfp: Use asm volatile in fmrx/fmxr macros
On Sun, 28 Jul 2024 at 23:29, Calvin Owens <calvin@...nvd.org> wrote:
>
> On Sunday 07/28 at 19:09 +0200, Ard Biesheuvel wrote:
> > (cc Arnd, Nathan, zhuqiuer)
> >
> > On Sun, 28 Jul 2024 at 10:21, Calvin Owens <calvin@...nvd.org> wrote:
> > > <snip>
> > >
> > > Crudely grepping for vmsr/vmrs instructions in the otherwise nearly
> > > idential text for vfp_support_entry() makes the problem obvious:
> > >
> > > vmlinux.llvm.good [0xc0101cb8] <+48>: vmrs r7, fpexc
> > > vmlinux.llvm.good [0xc0101cd8] <+80>: vmsr fpexc, r0
> > > vmlinux.llvm.good [0xc0101d20] <+152>: vmsr fpexc, r7
> > > vmlinux.llvm.good [0xc0101d38] <+176>: vmrs r4, fpexc
> > > vmlinux.llvm.good [0xc0101d6c] <+228>: vmrs r0, fpscr
> > > vmlinux.llvm.good [0xc0101dc4] <+316>: vmsr fpexc, r0
> > > vmlinux.llvm.good [0xc0101dc8] <+320>: vmrs r0, fpsid
> > > vmlinux.llvm.good [0xc0101dcc] <+324>: vmrs r6, fpscr
> > > vmlinux.llvm.good [0xc0101e10] <+392>: vmrs r10, fpinst
> > > vmlinux.llvm.good [0xc0101eb8] <+560>: vmrs r10, fpinst2
> > >
> > > vmlinux.llvm.bad [0xc0101cb8] <+48>: vmrs r7, fpexc
> > > vmlinux.llvm.bad [0xc0101cd8] <+80>: vmsr fpexc, r0
> > > vmlinux.llvm.bad [0xc0101d20] <+152>: vmsr fpexc, r7
> > > vmlinux.llvm.bad [0xc0101d30] <+168>: vmrs r0, fpscr
> > > vmlinux.llvm.bad [0xc0101d50] <+200>: vmrs r6, fpscr <== BOOM!
> > > vmlinux.llvm.bad [0xc0101d6c] <+228>: vmsr fpexc, r0
> > > vmlinux.llvm.bad [0xc0101d70] <+232>: vmrs r0, fpsid
> > > vmlinux.llvm.bad [0xc0101da4] <+284>: vmrs r10, fpinst
> > > vmlinux.llvm.bad [0xc0101df8] <+368>: vmrs r4, fpexc
> > > vmlinux.llvm.bad [0xc0101e5c] <+468>: vmrs r10, fpinst2
> > >
> > > I think LLVM's reordering is valid as the code is currently written: the
> > > compiler doesn't know the instructions have side effects in hardware.
> > >
> > > Fix by using "asm volatile" in fmxr() and fmrx(), so they cannot be
> > > reordered with respect to each other. The original compiler now produces
> > > working kernels on my hardware with DYNAMIC_DEBUG=n.
> > >
> > > This is the relevant piece of the diff of the vfp_support_entry() text,
> > > from the original oopsing kernel to a working kernel with this patch:
> > >
> > > vmrs r0, fpscr
> > > tst r0, #4096
> > > bne 0xc0101d48
> > > tst r0, #458752
> > > beq 0xc0101ecc
> > > orr r7, r7, #536870912
> > > ldr r0, [r4, #0x3c]
> > > mov r9, #16
> > > -vmrs r6, fpscr
> > > orr r9, r9, #251658240
> > > add r0, r0, #4
> > > str r0, [r4, #0x3c]
> > > mvn r0, #159
> > > sub r0, r0, #-1207959552
> > > and r0, r7, r0
> > > vmsr fpexc, r0
> > > vmrs r0, fpsid
> > > +vmrs r6, fpscr
> > > and r0, r0, #983040
> > > cmp r0, #65536
> > > bne 0xc0101d88
> > >
> > > Fixes: 4708fb041346 ("ARM: vfp: Reimplement VFP exception entry in C code")
> > > Signed-off-by: Calvin Owens <calvin@...nvd.org>
> > > ---
> > > arch/arm/vfp/vfpinstr.h | 48 ++++++++++++++++++++++-------------------
> > > 1 file changed, 26 insertions(+), 22 deletions(-)
> > >
> >
> > Thanks for the patch, and for the excellent analysis.
> >
> > Note that this fix has been proposed in the past, as well as another
> > one addressing the same issue, but we've been incredibly sloppy
> > getting it merged.
> >
> > https://lore.kernel.org/linux-arm-kernel/20240410024126.21589-2-zhuqiuer1@huawei.com/
> > https://lore.kernel.org/linux-arm-kernel/20240318093004.117153-2-ardb+git@google.com/
>
> Ah sorry for missing that, I searched for the symptom not the cure.
>
> > What both of us appear to have missed is that there are two versions
> > of these routines, which should either be dropped (as they are
> > obsolete now that the minimum binutils version is 2.25) or fixed up as
> > well, as you do below.
> >
> > Anyone have any thoughts on using a memory clobber as opposed to
> > volatile? I think volatile means that the access cannot be elided, but
> > it is unclear to me whether that implies any ordering. A 'memory'
> > clobber implies that globally visible state is updated, which seems
> > like a stronger guarantee to me.
>
> My thinking was that if 'asm volatile' is sufficient, it will suppress
> less optimization than the clobber, since the clobber might force the
> compiler to assume unrelated memory could have been modified when it
> really never is. But I'm not sure about that.
>
> Out of curiousity, I tried it both ways with the same compiler just now,
> the only tiny difference in the emitted vfp_support_entry() is here:
>
> --- /volatile 2024-07-28 13:28:59.890505404 -0700
> +++ /memclobber 2024-07-28 13:28:59.890505404 -0700
> str r0, [r5, #0x4]
> vmrs r7, fpexc
> tst r7, #1073741824
> bne 0xc0101d28
> orr r7, r7, #1073741824
> bic r0, r7, #-2147483648
> vmsr fpexc, r0
> +ldr r8, [pc, #0x26c]
> ldr r0, [r5, #0x8]
> -ldr r8, [pc, #0x268]
> add r6, r5, #224
> ldr r0, [r8, r0, lsl #2]
> cmp r0, r6
> beq 0xc0101d18
>
Right. So it doesn't matter in practice - good to know.
So in the 'volatile' case, I guess we are relying on the compiler not
reordering those with respect to each other, which should be
sufficient to ensure that we do not access VFP status register before
enabling the VFP unit via the control register, as all are accessed
using inline asm.
In any case, this should go into the patch system.
https://www.armlinux.org.uk/developer/patches/section.php?section=0
Note to self and other: follow-up with a patch that removes
CONFIG_AS_VFP_VMRS_FPINST, which is no longer needed.
Powered by blists - more mailing lists