Message-ID: <A5ED84D3BB3A384992CBB9C77DEDA4D443E197DC@USINDEM103.corp.hds.com>
Date: Fri, 5 Jul 2013 18:20:15 +0000
From: Seiji Aguchi <seiji.aguchi@....com>
To: "H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>
CC: Dave Jones <davej@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>
Subject: RE: Yet more softlockups.
> -----Original Message-----
> From: H. Peter Anvin [mailto:hpa@...or.com]
> Sent: Friday, July 05, 2013 12:41 PM
> To: Thomas Gleixner
> Cc: Dave Jones; Linus Torvalds; Linux Kernel; Ingo Molnar; Peter Zijlstra; Seiji Aguchi
> Subject: Re: Yet more softlockups.
>
> On 07/05/2013 09:02 AM, Thomas Gleixner wrote:
> > On Fri, 5 Jul 2013, Dave Jones wrote:
> >> On Fri, Jul 05, 2013 at 05:15:07PM +0200, Thomas Gleixner wrote:
> >> > On Fri, 5 Jul 2013, Dave Jones wrote:
> >> >
> >> > > BUG: soft lockup - CPU#3 stuck for 23s! [trinity-child1:14565]
> >> > > perf samples too long (2519 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> >> > > INFO: NMI handler (perf_event_nmi_handler) took too long to run: 238147.002 msecs
> >> >
> >> > So we see a softlockup of 23 seconds and the perf_event_nmi_handler
> >> > claims it did run 23.8 seconds.
> >> >
> >> > Are there more instances of NMI handler messages ?
> >>
> >> [ 2552.006181] perf samples too long (2511 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> >> [ 2552.008680] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 500392.002 msecs
> >
> > Yuck. Spending 50 seconds in NMI context surely explains a softlockup :)
> >
>
> Hmmm... this makes me wonder if the interrupt tracepoint stuff is at
> fault here, as it changes the IDT handling for NMI context.
This softlockup happened while the interrupt tracepoints were disabled:
if they were enabled, "smp_trace_apic_timer_interrupt" would be displayed
instead of "smp_apic_timer_interrupt" in the call trace below.
I can't yet say how this issue relates to the tracepoint stuff, though;
I need to reproduce it on my machine first.
Call Trace:
<IRQ>
[<ffffffff8105424f>] __do_softirq+0xff/0x440
[<ffffffff8105474d>] irq_exit+0xcd/0xe0
[<ffffffff816f5fcb>] smp_apic_timer_interrupt+0x6b/0x9b
[<ffffffff816f512f>] apic_timer_interrupt+0x6f/0x80
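
The check on the trace above can be done mechanically. What follows is only
a rough sketch, not an existing kernel tool: the helper names
(frame_symbols, irq_tracepoints_state) and the regular expression are
hypothetical and assume the frame layout "[<address>] symbol+0xoff/0xlen"
seen in this report. Only the two handler names come from the trace itself.

```python
import re

# Handler names as discussed above: the "trace" variant appears in the
# call trace only when the interrupt tracepoints are enabled.
TRACED_HANDLER = "smp_trace_apic_timer_interrupt"
PLAIN_HANDLER = "smp_apic_timer_interrupt"

# Assumed frame layout: "[<ffffffff816f5fcb>] symbol+0x6b/0x9b"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frame_symbols(trace):
    """Return the symbol name of every frame found in the trace text."""
    return [m.group(1) for m in FRAME_RE.finditer(trace)]


def irq_tracepoints_state(trace):
    """Guess the tracepoint state from which timer handler is present."""
    symbols = frame_symbols(trace)
    if TRACED_HANDLER in symbols:
        return "enabled"
    if PLAIN_HANDLER in symbols:
        return "disabled"
    return "unknown"


# The call trace from this report, copied verbatim.
TRACE = """\
Call Trace:
<IRQ>
[<ffffffff8105424f>] __do_softirq+0xff/0x440
[<ffffffff8105474d>] irq_exit+0xcd/0xe0
[<ffffffff816f5fcb>] smp_apic_timer_interrupt+0x6b/0x9b
[<ffffffff816f512f>] apic_timer_interrupt+0x6f/0x80
"""

if __name__ == "__main__":
    print(irq_tracepoints_state(TRACE))
```

This only tells which handler the timer interrupt went through; it says
nothing about whether the tracepoint code is involved in the lockup.
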
Seiji