Date:	Thu, 14 Mar 2013 14:06:01 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Ingo Molnar <mingo@...nel.org>,
	Stephane Eranian <eranian@...gle.com>
Cc:	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] perf fixes

On Thu, Mar 14, 2013 at 1:32 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> And to make things interesting, I seem to be able to only reproduce
> this *after* a suspend cycle. That may be just happenstance, since it
> seemed to be hard to replicate and most of the time it has happened
> under X with no messages visible at all, but that *seems* to be the
> pattern.
>
> And the one time I got it to happen on the text console, things
> scrolled off (watchdog warnings due to lockups), but I did get a NULL
> pointer dereference in intel_pmu_enable_all().
>
> I'll try to reproduce it and get a picture,

Theory more or less confirmed.

It does need a suspend/resume cycle, and I have a picture. The oops
happens immediately when trying to do any perf work after the first
suspend. Before suspending, I seem to be able to reliably use perf. It
could still be just random flakiness, but I don't think so.

The NULL pointer dereference is at intel_pmu_enable_all+0x4d/0xa0 for
me, which seems to be the load of the

    if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask))

thing. It says

   BUG: unable to handle NULL pointer dereference at 0000000000000028

But that error makes no sense. The code at that RIP is

  48 8b 83 00 02 00 00 mov    0x200(%rbx),%rax     <-- trapping instruction

and the value printed out for %rbx is 0xffff80014f20b8e0, so it should
*not* be a NULL pointer dereference (and "cpuc" was also used just
before the wrmsrl).
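
The arithmetic behind that claim can be checked with a minimal sketch. The
helper name effective_address() is hypothetical and only models the
base-plus-displacement addressing of the trapping mov; the %rbx value is
the one read off the oops picture and may not be exact.

```c
#include <stdint.h>

/*
 * Effective address of "mov disp(%base), %reg": the base register plus a
 * sign-extended 32-bit displacement. Hypothetical helper, for illustration.
 */
uint64_t effective_address(uint64_t base, int32_t disp)
{
    return base + (uint64_t)(int64_t)disp;
}
```

With base = 0xffff80014f20b8e0 and disp = 0x200 this gives
0xffff80014f20bae0, which is nowhere near the reported fault address of
0x28.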

So I suspect that the "wrmsrl" that was just before that instruction
does something odd, and the PMU is in some odd state, so that the NULL
pointer dereference actually has something to do with *that*, rather
than the instruction itself.

The callchain looks normal. It's

  finish_task_switch ->
    __perf_event_task_sched_in ->
      perf_event_context_sched_in ->
        perf_pmu_enable ->
          x86_pmu_enable ->
            intel_pmu_enable_all()

The immediately preceding wrmsrl was done with rax=0xf, rdx=0x7,
rcx=0x38f according to the register dump (but the picture isn't great,
so the numbers aren't 100% reliable).
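
For reference, MSR 0x38f is IA32_PERF_GLOBAL_CTRL (MSR_CORE_PERF_GLOBAL_CTRL
in the kernel headers), and wrmsr writes EDX:EAX to the MSR selected by ECX.
A minimal sketch of decoding the write follows. The helper names are
hypothetical, and the inputs are the register values read off the picture,
so they may be wrong.

```c
#include <stdint.h>

/* WRMSR writes EDX:EAX (high:low 32 bits) to the MSR selected by ECX. */
uint64_t wrmsr_value(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}

/* IA32_PERF_GLOBAL_CTRL bits 0 and up enable the general-purpose counters. */
uint32_t gp_counter_mask(uint64_t global_ctrl)
{
    return (uint32_t)global_ctrl;
}

/* IA32_PERF_GLOBAL_CTRL bits 32 and up enable the fixed-function counters. */
uint32_t fixed_counter_mask(uint64_t global_ctrl)
{
    return (uint32_t)(global_ctrl >> 32);
}
```

If the register dump was read correctly, the value written is 0x70000000f:
general-purpose counters 0-3 and fixed counters 0-2 all enabled, which is
what a plain "enable everything" write would look like on a PMU with four
general-purpose and three fixed counters.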

Does this give any clues?

             Linus