Message-ID: <alpine.DEB.2.11.1411142326510.3909@nanos>
Date: Fri, 14 Nov 2014 23:55:30 +0100 (CET)
From: Thomas Gleixner <tglx@...utronix.de>
To: Linus Torvalds <torvalds@...ux-foundation.org>
cc: Dave Jones <davej@...hat.com>,
Linux Kernel <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>
Subject: Re: frequent lockups in 3.18rc4
On Fri, 14 Nov 2014, Linus Torvalds wrote:
> On Fri, Nov 14, 2014 at 1:31 PM, Dave Jones <davej@...hat.com> wrote:
> > I'm not sure how long this goes back (3.17 was fine afair) but I'm
> > seeing these several times a day lately..
>
> Plus, judging by the fact that there's a stale "leave_mm+0x210/0x210"
> (wouldn't that be the *next* function, namely do_flush_tlb_all())
> pointer on the stack, I suspect that whole range-flushing doesn't even
> trigger, and we are flushing everything.
This stale entry is not relevant here because the thing is stuck in
generic_exec_single().
> > NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [trinity-c129:25570]
> > RIP: 0010:[<ffffffff9c11e98a>] [<ffffffff9c11e98a>] generic_exec_single+0xea/0x1d0
> > Call Trace:
> > [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
> > [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
> > [<ffffffff9c11ead6>] smp_call_function_single+0x66/0x110
> > [<ffffffff9c048b20>] ? leave_mm+0x210/0x210
> > [<ffffffff9c11f021>] smp_call_function_many+0x2f1/0x390
> > [<ffffffff9c049300>] flush_tlb_mm_range+0xe0/0x370
flush_tlb_mm_range()
	.....
out:
	if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)
		flush_tlb_others(mm_cpumask(mm), mm, start, end);
which calls
smp_call_function_many() via native_flush_tlb_others()
which is either inlined or not on the stack, because the invocation of
smp_call_function_many() is a tail call.
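For reference, the condition guarding flush_tlb_others() asks whether
any CPU other than the current one is set in the mm's cpumask. A minimal
user-space sketch of that check follows. It is a model, not the kernel
implementation: the model_* names, the 64-bit mask and MODEL_NR_CPU_IDS
are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for nr_cpu_ids (the model supports at most 64). */
#define MODEL_NR_CPU_IDS 8u

/*
 * Model of cpumask_any_but(): return some CPU set in 'mask' that is not
 * 'self', or a value >= MODEL_NR_CPU_IDS when there is no such CPU.
 * This model always picks the lowest-numbered match.
 */
static unsigned int model_cpumask_any_but(uint64_t mask, unsigned int self)
{
	assert(self < MODEL_NR_CPU_IDS);
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPU_IDS; cpu++) {
		if (cpu != self && (mask & (UINT64_C(1) << cpu)))
			return cpu;
	}
	return MODEL_NR_CPU_IDS;
}

/* Non-zero when at least one remote CPU would need the flush. */
static int model_needs_remote_flush(uint64_t mm_mask, unsigned int self)
{
	return model_cpumask_any_but(mm_mask, self) < MODEL_NR_CPU_IDS;
}
```

In this model a mask containing only the calling CPU yields no remote
flush, while any additional bit sends the caller down the
smp_call_function_many() path.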
So from smp_call_function_many() we end up via
smp_call_function_single() in generic_exec_single().
So the only ways to get stuck there are:
csd_lock(csd);
and
csd_lock_wait(csd);
The called function is flush_tlb_func() and I really can't see why
that would get stuck at all.
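To make the two spin points concrete, here is a simplified,
single-threaded user-space model of the csd lock handshake. It is a
sketch under assumptions, not the code in kernel/smp.c: every model_*
name is hypothetical, the "IPI" is a plain function call, and the spins
are bounded by max_spins so the model terminates instead of locking up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CSD_FLAG_LOCK 0x01u	/* hypothetical flag value */

struct model_csd {
	atomic_uint flags;
	void (*func)(void *info);
	void *info;
};

/* Spin while LOCK is set; report false if it never clears in time. */
static bool model_csd_lock_wait(struct model_csd *csd, unsigned long max_spins)
{
	assert(csd != NULL);
	while (atomic_load_explicit(&csd->flags, memory_order_acquire) &
	       MODEL_CSD_FLAG_LOCK) {
		if (max_spins-- == 0)
			return false;
	}
	return true;
}

/* Wait for a previous user to release the csd, then take it. */
static bool model_csd_lock(struct model_csd *csd, unsigned long max_spins)
{
	if (!model_csd_lock_wait(csd, max_spins))
		return false;
	atomic_fetch_or_explicit(&csd->flags, MODEL_CSD_FLAG_LOCK,
				 memory_order_acq_rel);
	return true;
}

static void model_csd_unlock(struct model_csd *csd)
{
	atomic_fetch_and_explicit(&csd->flags, ~MODEL_CSD_FLAG_LOCK,
				  memory_order_release);
}

/* Stands in for the target CPU: run the callback, then release the csd. */
static void model_handle_ipi(struct model_csd *csd)
{
	csd->func(csd->info);
	model_csd_unlock(csd);
}

/* Example callback: counts how often it ran. */
static void model_count_call(void *info)
{
	++*(int *)info;
}

/*
 * Model of the sender side. Returns true if the call completed and false
 * if either spin point gave up. 'ipi_handled' chooses whether the target
 * ever runs the callback.
 */
static bool model_exec_single(struct model_csd *csd, void (*func)(void *),
			      void *info, bool ipi_handled,
			      unsigned long max_spins)
{
	if (!model_csd_lock(csd, max_spins))	/* first spin point */
		return false;
	csd->func = func;
	csd->info = info;
	if (ipi_handled)
		model_handle_ipi(csd);
	return model_csd_lock_wait(csd, max_spins);	/* second spin point */
}
```

In this model the callback itself never blocks. The sender only gets
stuck when the target side never runs model_handle_ipi(), and once that
has happened the csd stays locked, so a later model_csd_lock() on the
same csd spins as well.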
So this looks more like an SMP function call fuckup.
I assume Dave is running that stuff on KVM. So it might be worthwhile
to look at the IPI magic there.
Thanks,
tglx