[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Yj3OEI+WHV/A5uf8@hirez.programming.kicks-ass.net>
Date: Fri, 25 Mar 2022 15:13:36 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Mark Rutland <mark.rutland@....com>
Cc: Nick Desaulniers <ndesaulniers@...gle.com>,
Borislav Petkov <bp@...en8.de>,
Nathan Chancellor <nathan@...nel.org>, x86-ml <x86@...nel.org>,
lkml <linux-kernel@...r.kernel.org>, llvm@...ts.linux.dev,
Josh Poimboeuf <jpoimboe@...hat.com>,
linux-toolchains@...r.kernel.org
Subject: Re: clang memcpy calls
+linux-toolchains
On Fri, Mar 25, 2022 at 12:15:28PM +0000, Mark Rutland wrote:
> On Thu, Mar 24, 2022 at 11:43:46AM -0700, Nick Desaulniers wrote:
> > On Thu, Mar 24, 2022 at 4:19 AM Borislav Petkov <bp@...en8.de> wrote:
> > >
> > > Hi folks,
> > >
> > > so I've been looking at a recent objtool noinstr warning from clang
> > > builds:
> > >
> > > vmlinux.o: warning: objtool: sync_regs()+0x20: call to memcpy() leaves .noinstr.text section
> > >
> > > The issue is that clang generates a memcpy() call when a struct copy
> > > happens:
> > >
> > > if (regs != eregs)
> > > *regs = *eregs;
> >
> > Specifically, this is copying one struct pt_regs to another. It looks
> > like the sizeof struct pt_regs is just large enough to have clang emit
> > the libcall.
> > https://godbolt.org/z/scx6aa8jq
> > Otherwise clang will also use rep; movsq; when -mno-sse -O2 is set and
> > the structs are below ARBITRARY_THRESHOLD. Should ARBITRARY_THRESHOLD
> > be raised so that we continue to inline the memcpy? *shrug*
> >
> > Though, looking at the compiled memcpy (`llvm-objdump -D
> > --disassemble-symbols=memcpy vmlinux`), maybe we *should* try harder.
> > Filed
> > https://github.com/llvm/llvm-project/issues/54535.
> >
> > > see below for asm output.
> > >
> > > While gcc does simply generate an actual "rep; movsq".
> > >
> > > So, how hard would it be to make clang do that too pls?
> >
> > As Mark said in the sibling reply; I don't know of general ways to
> > inhibit libcall optimizations on the level you're looking for, short
> > of heavy handy methods of disabling optimizations entirely. There's
> > games that can be played with -fno-builtin-*, but they're not super
> > portable, and I think there's a handful of *blessed* functions that
> > must exist in any env, freestanding or not: memcpy, memmove, memset,
> > and memcmp for which you cannot yet express "these do not exist."
>
> Talking with Peter on IRC, I think there's an oversight on the compiler
> side here w.r.t. the expectations around these blessed functions, since
> either:
>
> a) The compiler expects the out-of-line implementations of functions
> ARE NOT instrumented by address-sanitizer.
>
> If this is the case, then it's legitimate for the compiler to call
> these functions anywhere, and we should NOT instrument the kernel
> implementations of these. If the compiler wants those instrumented it
> needs to add the instrumentation in the caller.
>
> b) The compiler expects the out-of-line implementations of functions
> ARE instrumented by address-sanitizer.
>
> If this is the case, the compiler MUST NOT generate implicit calls to
> these "blessed" functions from functions marked with:
>
> __attribute__((no_sanitize_address)).
>
> ... or the compiler is violating the premise of that attribute.
>
> AFAICT The two options for the compiler here are:
>
> 1) Always inline an uninstrumented form of the function in this case
>
> 2) Have distinct instrumented/uninstrumented out-of-line
> implementations, and call the uninstrumented form in this case.
>
> To see what clang and GCC do today, I hacked the following in:
>
> | diff --git a/init/main.c b/init/main.c
> | index 65fa2e41a9c0..30406c472b5d 100644
> | --- a/init/main.c
> | +++ b/init/main.c
> | @@ -1637,3 +1637,31 @@ static noinline void __init kernel_init_freeable(void)
> |
> | integrity_load_keys();
> | }
> | +
> | +void
> | +test_implicit_memcpy(struct task_struct *dest,
> | + const struct task_struct *src)
> | +{
> | + *dest = *src;
> | +}
> | +
> | +void
> | +test_explicit_memcpy(struct task_struct *dest,
> | + const struct task_struct *src)
> | +{
> | + memcpy(dest, src, sizeof(*dest));
> | +}
> | +
> | +void __no_sanitize_address
> | +test_implicit_memcpy_nokasan(struct task_struct *dest,
> | + const struct task_struct *src)
> | +{
> | + *dest = *src;
> | +}
> | +
> | +void __no_sanitize_address
> | +test_explicit_memcpy_nokasan(struct task_struct *dest,
> | + const struct task_struct *src)
> | +{
> | + memcpy(dest, src, sizeof(*dest));
> | +}
>
>
> For arm64, GCC 11.1.0, KASAN_OUTLINE I see:
>
> | <test_implicit_memcpy>:
> | d503245f bti c
> | d503233f paciasp
> | a9be7bfd stp x29, x30, [sp, #-32]!
> | 910003fd mov x29, sp
> | a90153f3 stp x19, x20, [sp, #16]
> | aa0103f3 mov x19, x1
> | aa0003f4 mov x20, x0
> | d281c001 mov x1, #0xe00 // #3584
> | 940b9534 bl ffff8000082f9d90 <__asan_storeN>
> | aa1303e0 mov x0, x19
> | d281c001 mov x1, #0xe00 // #3584
> | 940b951e bl ffff8000082f9d44 <__asan_loadN>
> | aa1303e1 mov x1, x19
> | aa1403e0 mov x0, x20
> | d281c002 mov x2, #0xe00 // #3584
> | 940b98c5 bl ffff8000082fabf0 <memcpy>
> | a94153f3 ldp x19, x20, [sp, #16]
> | a8c27bfd ldp x29, x30, [sp], #32
> | d50323bf autiasp
> | d65f03c0 ret
> |
> | <test_explicit_memcpy>:
> | d503245f bti c
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | d281c002 mov x2, #0xe00 // #3584
> | 910003fd mov x29, sp
> | 940b98bb bl ffff8000082fabf0 <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
> |
> | <test_implicit_memcpy_nokasan>:
> | d503245f bti c
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | d281c002 mov x2, #0xe00 // #3584
> | 910003fd mov x29, sp
> | 940b98b2 bl ffff8000082fabf0 <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
> |
> | <test_explicit_memcpy_nokasan>:
> | d503245f bti c
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | d281c002 mov x2, #0xe00 // #3584
> | 910003fd mov x29, sp
> | 940b98a7 bl ffff8000082fabf0 <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
>
> For x86_64, GCC 11.1.0, KASAN_OUTLINE I see:
>
> | <test_implicit_memcpy>:
> | 41 54 push %r12
> | 49 89 fc mov %rdi,%r12
> | 55 push %rbp
> | 48 89 f5 mov %rsi,%rbp
> | be 40 1c 00 00 mov $0x1c40,%esi
> | e8 0d 9a 32 00 call ffffffff8132b0f0 <__asan_storeN>
> | 48 89 ef mov %rbp,%rdi
> | be 40 1c 00 00 mov $0x1c40,%esi
> | e8 f0 99 32 00 call ffffffff8132b0e0 <__asan_loadN>
> | 4c 89 e7 mov %r12,%rdi
> | 48 89 ee mov %rbp,%rsi
> | b9 88 03 00 00 mov $0x388,%ecx
> | f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
> | 5d pop %rbp
> | 41 5c pop %r12
> | c3 ret
> |
> | <test_explicit_memcpy>:
> | ba 40 1c 00 00 mov $0x1c40,%edx
> | e9 b6 9f 32 00 jmp ffffffff8132b6d0 <memcpy>
> |
> |
> | <test_implicit_memcpy_nokasan>:
> | b9 88 03 00 00 mov $0x388,%ecx
> | f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
> | c3 ret
> |
> | <test_explicit_memcpy_nokasan>:
> | ba 40 1c 00 00 mov $0x1c40,%edx
> | e9 96 9f 32 00 jmp ffffffff8132b6d0 <memcpy>
>
> So from those examples it seems GCC falls into bucket (a), and assumes the
> blessed functions ARE NOT instrumented.
>
> We can make this noinstr-safe AND get instrumentation for the first two cases
> by removing the instrumentation from the out-of-line copies (always using
> noinstr asm implementations) and using ifdeffery to make the explicit calls
> target as distinct kasan_instrumented_memcpy() or similar...
>
>
> For arm64, clang 13.0.0, KASAN_OUTLINE I see:
>
> | <test_implicit_memcpy>:
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | 910003fd mov x29, sp
> | 5281c002 mov w2, #0xe00 // #3584
> | 940c0f66 bl ffff8000083185fc <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
> |
> | <test_explicit_memcpy>:
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | 910003fd mov x29, sp
> | 5281c002 mov w2, #0xe00 // #3584
> | 940c0f5e bl ffff8000083185fc <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
> |
> | <test_implicit_memcpy_nokasan>:
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | 910003fd mov x29, sp
> | 5281c002 mov w2, #0xe00 // #3584
> | 940c0f56 bl ffff8000083185fc <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
> |
> | <test_explicit_memcpy_nokasan>:
> | d503233f paciasp
> | a9bf7bfd stp x29, x30, [sp, #-16]!
> | 910003fd mov x29, sp
> | 5281c002 mov w2, #0xe00 // #3584
> | 940c0f4e bl ffff8000083185fc <memcpy>
> | a8c17bfd ldp x29, x30, [sp], #16
> | d50323bf autiasp
> | d65f03c0 ret
>
> For x86_64, clang 13.0.0, KASAN_OUTLINE I see:
>
> | <test_implicit_memcpy>:
> | ba 40 1c 00 00 mov $0x1c40,%edx
> | e8 d6 94 36 00 call ffffffff8136a830 <memcpy>
> | c3 ret
> |
> | <test_explicit_memcpy>:
> | ba 40 1c 00 00 mov $0x1c40,%edx
> | e8 c6 94 36 00 call ffffffff8136a830 <memcpy>
> | c3 ret
> |
> |
> | <test_implicit_memcpy_nokasan>:
> | ba 40 1c 00 00 mov $0x1c40,%edx
> | e9 b6 94 36 00 jmp ffffffff8136a830 <memcpy>
> |
> |
> | <test_explicit_memcpy_nokasan>:
> | ba 40 1c 00 00 mov $0x1c40,%edx
> | e9 a6 94 36 00 jmp ffffffff8136a830 <memcpy>
>
> ... for which the first two suggests clang thinks the blessed functions *are*
> instrumented, which means that generating calls to those in the latter two
> cases is a bug.
>
> We can make this noinstr-safe as with the GCC case, but we'll lose the
> desirable instrumentation for the test_implicit_memcpy() case.
>
> I think something has to change on the compiler side here (e.g. as per
> options above), and we should align GCC and clang on the same
> approach...
>
> Thanks,
> Mark.
Powered by blists - more mailing lists