Message-ID: <20160301180440.GA6769@krava.redhat.com>
Date: Tue, 1 Mar 2016 19:04:40 +0100
From: Jiri Olsa <jolsa@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Andi Kleen <andi@...stfloor.org>,
"Liang, Kan" <kan.liang@...el.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Ingo Molnar <mingo@...nel.org>,
Stephane Eranian <eranian@...gle.com>,
Wang Nan <wangnan0@...wei.com>,
"zheng.z.yan@...el.com" <zheng.z.yan@...el.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] Core2 cpu triggers hard lockup with perf test
On Tue, Mar 01, 2016 at 06:49:03PM +0100, Peter Zijlstra wrote:
> On Tue, Mar 01, 2016 at 06:17:22PM +0100, Jiri Olsa wrote:
>
> > [ 125.982977] [<ffffffff8100ae7b>] ? __intel_pmu_enable_all.isra.11+0x4b/0xd0
> > [ 125.982977] [<ffffffff8100ae7b>] ? __intel_pmu_enable_all.isra.11+0x4b/0xd0
> > [ 125.982977] [<ffffffff8100ae7b>] ? __intel_pmu_enable_all.isra.11+0x4b/0xd0
> > [ 125.982977] <<EOE>> [<ffffffff8100af10>] intel_pmu_enable_all+0x10/0x20
> > [ 125.982977] [<ffffffff81006283>] x86_pmu_enable+0x263/0x2f0
> > [ 125.982977] [<ffffffff811632d2>] perf_pmu_enable+0x22/0x30
> > [ 125.982977] [<ffffffff81163f51>] ctx_resched+0x51/0x60
> > [ 125.982977] [<ffffffff81164b09>] perf_event_exec+0x109/0x150
> > [ 125.982977] [<ffffffff811fff7d>] setup_new_exec+0x6d/0x1a0
> > [ 125.982977] [<ffffffff8125104a>] load_elf_binary+0x37a/0x10e0
> > [ 125.982977] [<ffffffff811a06c2>] ? get_user_pages+0x52/0x60
> > [ 125.982977] [<ffffffff811fe32e>] search_binary_handler+0x9e/0x1e0
> > [ 125.982977] [<ffffffff811ffccd>] do_execveat_common.isra.37+0x54d/0x6e0
> > [ 125.982977] [<ffffffff812000ea>] SyS_execve+0x3a/0x50
> > [ 125.982977] [<ffffffff81679065>] stub_execve+0x5/0x5
> > [ 125.982977] [<ffffffff81678dd7>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
>
> > the exception addr is on wrmsr:
> >
> > ffffffff8100ae30 <__intel_pmu_enable_all.isra.11>:
> > ffffffff8100ae30: e8 bb 02 67 00 callq ffffffff8167b0f0 <__fentry__>
> > ffffffff8100ae35: 55 push %rbp
> > ffffffff8100ae36: 48 89 e5 mov %rsp,%rbp
> > ffffffff8100ae39: 41 54 push %r12
> > ffffffff8100ae3b: 41 89 fc mov %edi,%r12d
> > ffffffff8100ae3e: 53 push %rbx
> > ffffffff8100ae3f: 48 c7 c3 80 a3 00 00 mov $0xa380,%rbx
> > ffffffff8100ae46: 65 48 03 1d d2 f2 ff add %gs:0x7efff2d2(%rip),%rbx # a120 <this_cpu_off>
> > ffffffff8100ae4d: 7e
> > ffffffff8100ae4e: e8 6d 49 00 00 callq ffffffff8100f7c0 <intel_pmu_pebs_enable_all>
> > ffffffff8100ae53: 41 0f b6 fc movzbl %r12b,%edi
> > ffffffff8100ae57: e8 94 58 00 00 callq ffffffff810106f0 <intel_pmu_lbr_enable_all>
> > ffffffff8100ae5c: 48 8b 83 68 0c 00 00 mov 0xc68(%rbx),%rax
> > ffffffff8100ae63: b9 8f 03 00 00 mov $0x38f,%ecx
> > ffffffff8100ae68: 48 f7 d0 not %rax
> > ffffffff8100ae6b: 48 23 05 26 80 ad 00 and 0xad8026(%rip),%rax # ffffffff81ae2e98 <x86_pmu+0x138>
> > ffffffff8100ae72: 48 89 c2 mov %rax,%rdx
> > ffffffff8100ae75: 48 c1 ea 20 shr $0x20,%rdx
> > ffffffff8100ae79: 0f 30 wrmsr
> >
>
> That's the PERF_GLOBAL_CTRL, right? But it must have succeeded,

yep, should be this one:

static void __intel_pmu_enable_all(int added, bool pmi)
{
        struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);

        intel_pmu_pebs_enable_all();
        intel_pmu_lbr_enable_all(pmi);
>>>     wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
                        x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask);

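
For reference, a minimal user-space sketch of how the quoted disassembly sets up the registers for that wrmsr: the MSR index 0x38f goes into %ecx, and the 64-bit value (intel_ctrl with the guest mask bits cleared) is split into %edx:%eax by the mov/shr $0x20 pair. The struct wrmsr_args type and the global_ctrl_args() helper are hypothetical names used only for illustration; this is not kernel code and it does not execute wrmsr.

```c
#include <stdint.h>

/* MSR index loaded into %ecx in the disassembly (mov $0x38f,%ecx). */
#define MSR_CORE_PERF_GLOBAL_CTRL 0x38f

/* Hypothetical container for the three registers wrmsr consumes. */
struct wrmsr_args {
	uint32_t ecx;	/* MSR index */
	uint32_t eax;	/* low 32 bits of the value */
	uint32_t edx;	/* high 32 bits of the value */
};

/*
 * Mirrors the instruction sequence before the wrmsr:
 *   not %rax; and <x86_pmu+0x138>,%rax; mov %rax,%rdx; shr $0x20,%rdx
 * Both arguments are plain inputs here; in the kernel they come from
 * x86_pmu.intel_ctrl and cpuc->intel_ctrl_guest_mask.
 */
static struct wrmsr_args global_ctrl_args(uint64_t intel_ctrl,
					  uint64_t guest_mask)
{
	uint64_t val = intel_ctrl & ~guest_mask;
	struct wrmsr_args a = {
		.ecx = MSR_CORE_PERF_GLOBAL_CTRL,
		.eax = (uint32_t)val,
		.edx = (uint32_t)(val >> 32),
	};

	return a;
}
```

With made-up inputs such as intel_ctrl = 0x700000003 and guest_mask = 0x100000001, the helper yields ecx = 0x38f, eax = 0x2 and edx = 0x6.
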
> otherwise the NMI watchdog would never have fired.

so NMI wouldn't trigger if CPU is inside wrmsr?

jirka

>
> Something is hosed alright.
>
> I think I've seen my IVB-EP do something similar. But mostly that
> machine gets stuck in intel_bts_enable_local().
>
>