lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.21.1906252255440.32342@nanos.tec.linutronix.de>
Date:   Tue, 25 Jun 2019 23:47:06 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Nathan Chancellor <natechancellor@...il.com>
cc:     Kees Cook <keescook@...omium.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Miguel Ojeda <miguel.ojeda.sandonis@...il.com>,
        "Gustavo A. R. Silva" <gustavo@...eddedor.com>,
        Joe Perches <joe@...ches.com>, Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
        Kan Liang <kan.liang@...ux.intel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Shawn Landden <shawn@....icu>,
        clang-built-linux@...glegroups.com,
        Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH] perf/x86/intel: Mark expected switch fall-throughs

Nathan,

On Tue, 25 Jun 2019, Nathan Chancellor wrote:
> On Tue, Jun 25, 2019 at 09:53:09PM +0200, Thomas Gleixner wrote:
> > 
> > But can the script please check for a minimal clang version required to
> > build that thing.
> > 
> > The default clang-3.8 which is installed on Debian stretch explodes. The
> > 6.0 variant from backports works as advertised.
> > 
> 
> Hmmm interesting, I test a lot of different distros using Docker
> containers to make sure the script works universally and that includes
> Debian stretch, which is the stress tester because all of the packages
> are older. I install the following packages then run the following
> command and it works fine for me (just tested):
> 
> $ apt update && apt install -y --no-install-recommends ca-certificates \
> ccache clang cmake curl file gcc g++ git make ninja-build python3 \
> texinfo zlib1g-dev
> $ ./build-llvm.py
> 
> If you could give me a build log, I'd be happy to look into it and see
> what I can do.

I can produce one tomorrow.
 
> > Kernel builds with the new shiny compiler. Jump labels seem to be enabled.
> > 
> > It complains about a few type conversions:
> > 
> >  arch/x86/kvm/mmu.c:4596:39: warning: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -205 to 51 [-Wconstant-conversion]
> >                 u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0;
> >                    ~~                               ^~
> > 
> 
> Yes, there was a patch sent to try and fix this but it was rejected by
> the maintainers:
> 
> https://github.com/ClangBuiltLinux/linux/issues/95
> 
> https://lore.kernel.org/lkml/20180619192504.180479-1-mka@chromium.org/

Just looked through it. I don't think it's an outright reject. Paolo was
not totally against it and then the whole discussion degraded into bikeshed
painting and bitching about compiler error messaged. Try again or should I?

> > but it also makes objtool unhappy:
> > 
> >  arch/x86/events/intel/core.o: warning: objtool: intel_pmu_nhm_workaround()+0xb3: unreachable
instruction
> >  kernel/fork.o: warning: objtool: free_thread_stack()+0x126: unreachable instruction
> >  mm/workingset.o: warning: objtool: count_shadow_nodes()+0x11f: unreachable instruction
> >  arch/x86/kernel/cpu/mtrr/generic.o: warning: objtool: get_fixed_ranges()+0x9b: unreachable
instruction
> >  arch/x86/kernel/platform-quirks.o: warning: objtool: x86_early_init_platform_quirks()+0x84:
unreachable instruction
> >  drivers/iommu/irq_remapping.o: warning: objtool: irq_remap_enable_fault_handling()+0x1d:
unreachable instruction

> Unfortunately, we have quite a few of those outstanding, it's probably
> time to start really taking a look at them:
> 
> https://github.com/ClangBuiltLinux/linux/labels/objtool

I just checked two of them in the disassembly. In both cases it's jump
label related. Here is one:

      asm volatile("1: rdmsr\n"
 410:   b9 59 02 00 00          mov    $0x259,%ecx
 415:   0f 32                   rdmsr
 417:   49 89 c6                mov    %rax,%r14
 41a:   48 89 d3                mov    %rdx,%rbx
      return EAX_EDX_VAL(val, low, high);
 41d:   48 c1 e3 20             shl    $0x20,%rbx
 421:   48 09 c3                or     %rax,%rbx
 424:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
 429:   eb 0f                   jmp    43a <get_fixed_ranges+0xaa>
      do_trace_read_msr(msr, val, 0);
 42b:   bf 59 02 00 00          mov    $0x259,%edi   <------- "unreachable"
 430:   48 89 de                mov    %rbx,%rsi
 433:   31 d2                   xor    %edx,%edx
 435:   e8 00 00 00 00          callq  43a <get_fixed_ranges+0xaa>
 43a:   44 89 35 00 00 00 00    mov    %r14d,0x0(%rip)        # 441 <get_fixed_ranges+0xb1>

Interestingly enough there are some more hunks of the same pattern in that
function which look all the same. Those are not upsetting objtool. Josh
might give an hint where to stare at.

Just for the fun of it I looked at the GCC output of the same file. It
takes a different apporach:

      asm volatile("1: rdmsr\n"
 c70:   b9 59 02 00 00          mov    $0x259,%ecx
 c75:   0f 32                   rdmsr
      return EAX_EDX_VAL(val, low, high);
 c77:   48 c1 e2 20             shl    $0x20,%rdx
 c7b:   48 89 d3                mov    %rdx,%rbx
 c7e:   48 09 c3                or     %rax,%rbx
 c81:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
 c86:   48 89 1d 00 00 00 00    mov    %rbx,0x0(%rip)        # c8d <get_fixed_ranges.constprop.5+0x7d>

and the tracing code is completely out of line:

      do_trace_read_msr(msr, val, 0);
 ce2:   31 d2                   xor    %edx,%edx
 ce4:   48 89 de                mov    %rbx,%rsi
 ce7:   bf 59 02 00 00          mov    $0x259,%edi
 cec:   e8 00 00 00 00          callq  cf1 <get_fixed_ranges.constprop.5+0xe1>
 cf1:   eb 93                   jmp    c86 <get_fixed_ranges.constprop.5+0x76>

which makes a lot of sense as the normal path (tracepoint disabled) just
runs through linearly while in the clang version it has to jump around the
tracepoint code.

The jump itself is not a problem, but what matters is the $I cache
footprint. The GCC version hotpath fits in 3 cache lines while the Clang
version unconditionally eats 4.2 of them. That's a huge difference.

> Thanks for trying it out and letting us know. Please keep us in the loop
> if you happen to find anything amiss.

Will do.

Thanks,

	tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ