Message-ID: <20130712154521.GD1020@redhat.com>
Date: Fri, 12 Jul 2013 11:45:21 -0400
From: Dave Jones <davej@...hat.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Ingo Molnar <mingo@...nel.org>,
Markus Trippelsdorf <markus@...ppelsdorf.de>,
Thomas Gleixner <tglx@...utronix.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Peter Anvin <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>,
Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: Yet more softlockups.
On Fri, Jul 12, 2013 at 08:38:52AM -0700, Dave Hansen wrote:
> The warning comes from calling perf_sample_event_took(), which is only
> called from one place: perf_event_nmi_handler().
>
> So we can be pretty sure that the perf NMI is firing, or at least that
> this handler code is running.
>
> nmi_handle() says:
>     /*
>      * NMIs are edge-triggered, which means if you have enough
>      * of them concurrently, you can lose some because only one
>      * can be latched at any given time. Walk the whole list
>      * to handle those situations.
>      */
>
> perf_event_nmi_handler() probably gets _called_ when the watchdog NMI
> goes off. But, it should hit this check:
>
>     if (!atomic_read(&active_events))
>         return NMI_DONE;
>
> and return quickly. This is before it has a chance to call
> perf_sample_event_took().
>
> Dave, for your case, my suspicion would be that it got turned on
> inadvertently, or that we somehow have a bug which bumped up
> perf_event.c's 'active_events' and we're running some perf code that we
> don't have to.
What do you mean by 'inadvertently'? I see this during bootup every time.
Unless systemd or something has started playing with perf (which afaik it isn't),
nothing in userspace should be turning it on that early.
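For anyone following along, the early-return path described in the quoted
text can be modelled in userspace roughly as below. This is only a sketch,
not the kernel code: the model_* functions, the handlers[] table,
register_handler() and the samples_timed counter are all made up for
illustration. Only the active_events check and the NMI_DONE/NMI_HANDLED
names come from the discussion above.

```c
/* Userspace model only; none of this is the real kernel implementation. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Same names as the kernel's NMI handler return codes. */
enum { NMI_DONE = 0, NMI_HANDLED = 1 };

typedef int (*nmi_handler_fn)(void);

#define MAX_HANDLERS 8
static nmi_handler_fn handlers[MAX_HANDLERS];   /* hypothetical handler list */
static size_t nr_handlers;

static atomic_int active_events;  /* stands in for perf_event.c's counter */
static int samples_timed;         /* counts trips through the slow path */

static void register_handler(nmi_handler_fn fn)
{
    assert(nr_handlers < MAX_HANDLERS);
    handlers[nr_handlers++] = fn;
}

/* Models perf_event_nmi_handler(): bail out before any sample timing. */
static int model_perf_nmi_handler(void)
{
    if (!atomic_load(&active_events))
        return NMI_DONE;
    samples_timed++;              /* stands in for perf_sample_event_took() */
    return NMI_HANDLED;
}

/* Hypothetical second handler that always claims the NMI. */
static int model_other_nmi_handler(void)
{
    return NMI_HANDLED;
}

/* Models nmi_handle(): walk the whole list, never stop at the first hit. */
static int model_nmi_handle(void)
{
    int handled = 0;

    for (size_t i = 0; i < nr_handlers; i++)
        handled += handlers[i]();
    return handled;
}
```

With both handlers registered and active_events at zero, the perf model is
still called on every NMI but never reaches the timing path. So if the
warning fires during bootup, something must have raised active_events by
then.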
> But, I'm suspicious. I was having all kinds of issues with perf and
> NMIs taking hundreds of milliseconds. I never isolated it to having a
> real, single, cause. I attributed it to my large NUMA system just being
> slow. Your description makes me wonder what I missed, though.
Here's a fun trick:
    trinity -c perf_event_open -C4 -q -l off
Within about a minute, that brings any of my boxes to its knees.
The softlockup detector starts going nuts, and then the box wedges solid.
(You may need to bump -C depending on your CPU count. I've never seen it happen
with a single process; -C2 seems to be the minimum.)
That *is* using perf though, so I kind of expect bad shit to happen when there are bugs.
The "during bootup" case is still a head-scratcher.
Dave