lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 26 Jun 2019 11:24:32 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Nathan Chancellor <natechancellor@...il.com>,
        Kees Cook <keescook@...omium.org>,
        Miguel Ojeda <miguel.ojeda.sandonis@...il.com>,
        "Gustavo A. R. Silva" <gustavo@...eddedor.com>,
        Joe Perches <joe@...ches.com>, Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
        Kan Liang <kan.liang@...ux.intel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Shawn Landden <shawn@....icu>,
        clang-built-linux@...glegroups.com,
        Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH] perf/x86/intel: Mark expected switch fall-throughs

On Tue, Jun 25, 2019 at 11:47:06PM +0200, Thomas Gleixner wrote:
> > On Tue, Jun 25, 2019 at 09:53:09PM +0200, Thomas Gleixner wrote:

> > > but it also makes objtool unhappy:
> > > 
> > >  arch/x86/events/intel/core.o: warning: objtool: intel_pmu_nhm_workaround()+0xb3: unreachable instruction
> > >  kernel/fork.o: warning: objtool: free_thread_stack()+0x126: unreachable instruction
> > >  mm/workingset.o: warning: objtool: count_shadow_nodes()+0x11f: unreachable instruction
> > >  arch/x86/kernel/cpu/mtrr/generic.o: warning: objtool: get_fixed_ranges()+0x9b: unreachable instruction
> > >  arch/x86/kernel/platform-quirks.o: warning: objtool: x86_early_init_platform_quirks()+0x84: unreachable instruction
> > >  drivers/iommu/irq_remapping.o: warning: objtool: irq_remap_enable_fault_handling()+0x1d: unreachable instruction

> I just checked two of them in the disassembly. In both cases it's jump
> label related. Here is one:
> 
>       asm volatile("1: rdmsr\n"
>  410:   b9 59 02 00 00          mov    $0x259,%ecx
>  415:   0f 32                   rdmsr
>  417:   49 89 c6                mov    %rax,%r14
>  41a:   48 89 d3                mov    %rdx,%rbx
>       return EAX_EDX_VAL(val, low, high);
>  41d:   48 c1 e3 20             shl    $0x20,%rbx
>  421:   48 09 c3                or     %rax,%rbx
>  424:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>  429:   eb 0f                   jmp    43a <get_fixed_ranges+0xaa>
>       do_trace_read_msr(msr, val, 0);
>  42b:   bf 59 02 00 00          mov    $0x259,%edi   <------- "unreachable"
>  430:   48 89 de                mov    %rbx,%rsi
>  433:   31 d2                   xor    %edx,%edx
>  435:   e8 00 00 00 00          callq  43a <get_fixed_ranges+0xaa>
>  43a:   44 89 35 00 00 00 00    mov    %r14d,0x0(%rip)        # 441 <get_fixed_ranges+0xb1>
> 
> Interestingly enough there are some more hunks of the same pattern in that
> function which look all the same. Those are not upsetting objtool. Josh
> might give an hint where to stare at.

That's pretty atrocious code-gen :/ Does LLVM support things like label
attributes? Back when we did jump labels GCC didn't, or rather, it
ignored it completely when combined with asm goto (and it might still).

That is, would something like this:

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 06c3cc22a058..1761b1e76ddc 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -32,7 +32,7 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
 		: :  "i" (key), "i" (branch) : : l_yes);
 
 	return false;
-l_yes:
+l_yes: __attribute__((cold));
 	return true;
 }
 
@@ -49,7 +49,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key, bool
 		: :  "i" (key), "i" (branch) : : l_yes);
 
 	return false;
-l_yes:
+l_yes: __attribute__((hot));
 	return true;
 }
 
Help LLVM?

Still, objtool should be able to deal with that code.

> Just for the fun of it I looked at the GCC output of the same file. It
> takes a different apporach:
> 
>       asm volatile("1: rdmsr\n"
>  c70:   b9 59 02 00 00          mov    $0x259,%ecx
>  c75:   0f 32                   rdmsr
>       return EAX_EDX_VAL(val, low, high);
>  c77:   48 c1 e2 20             shl    $0x20,%rdx
>  c7b:   48 89 d3                mov    %rdx,%rbx
>  c7e:   48 09 c3                or     %rax,%rbx
>  c81:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>  c86:   48 89 1d 00 00 00 00    mov    %rbx,0x0(%rip)        # c8d <get_fixed_ranges.constprop.5+0x7d>
> 
> and the tracing code is completely out of line:
> 
>       do_trace_read_msr(msr, val, 0);
>  ce2:   31 d2                   xor    %edx,%edx
>  ce4:   48 89 de                mov    %rbx,%rsi
>  ce7:   bf 59 02 00 00          mov    $0x259,%edi
>  cec:   e8 00 00 00 00          callq  cf1 <get_fixed_ranges.constprop.5+0xe1>
>  cf1:   eb 93                   jmp    c86 <get_fixed_ranges.constprop.5+0x76>
> 
> which makes a lot of sense as the normal path (tracepoint disabled) just
> runs through linearly while in the clang version it has to jump around the
> tracepoint code.
> 
> The jump itself is not a problem, but what matters is the $I cache
> footprint. The GCC version hotpath fits in 3 cache lines while the Clang
> version unconditionally eats 4.2 of them. That's a huge difference.

Yeah, this is the right and expected code-gen.

Powered by blists - more mailing lists