Message-ID: <CALCETrW5bz0jmx+NP_UJGdVmHdPS_-hzTecRrec-Ed+8RY=tgQ@mail.gmail.com>
Date: Tue, 29 Dec 2015 22:43:26 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Dominique Martinet <dominique.martinet@....fr>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>
Cc: Al Viro <viro@...iv.linux.org.uk>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
V9FS Developers <v9fs-developer@...ts.sourceforge.net>,
Linux FS Devel <linux-fsdevel@...r.kernel.org>
Subject: Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
[add cc's]
Hi scheduler people:
This is relatively easy for me to reproduce. Any hints for debugging
it? Could we really have a bug in which processes that become
runnable as a result of a mutex unlock aren't reliably
scheduled?
--Andy
On Thu, Dec 24, 2015 at 2:51 AM, Dominique Martinet
<dominique.martinet@....fr> wrote:
> Andy Lutomirski wrote on Thu, Dec 17, 2015:
>> This could be QEMU's analysis script screwing up. Is there a good way
>> for me to gather more info?
>
> I finally took some time to reproduce it (sorry for the delay).
>
> Using your config, virtme commit (17363c2) and kernel tag v4.4-rc3, I was
> able to reproduce it just fine with my qemu (2.4.90).
>
> Now for the fun bit... I ran it with a gdb server; attaching gdb and
> running cont always 'unblocks' it.
> Using the kernel gdb scripts (lx-ps) I see about 250 kworker threads
> running, and their backtraces all look the same:
>
> [ 20.273945] [<ffffffff818c3910>] schedule+0x30/0x80
> [ 20.274644] [<ffffffff818c3c39>] schedule_preempt_disabled+0x9/0x10
> [ 20.275539] [<ffffffff818c6147>] __mutex_lock_slowpath+0x107/0x2f0
> [ 20.276421] [<ffffffff811cf02e>] ? lookup_fast+0xbe/0x320
> [ 20.277195] [<ffffffff818c6345>] mutex_lock+0x15/0x30
> [ 20.277916] [<ffffffff811d0df7>] walk_component+0x1a7/0x270
>
>
> So, given that it unblocks after hooking gdb + cont, I'm actually thinking
> this might be a pure scheduling issue (e.g. a thread is never rescheduled,
> or something like that).
> I can't see any task that is not in schedule() in your sysrq task dump
> transcript either.
>
>
> Not sure how to go about debugging that, to be honest.
> I've tried both the default single virtual cpu and -smp 3 or 4, and both
> reproduce it; cpu usage on the host is always low, so it doesn't look
> like there's any busy-polling involved. This is a pretty subtle bug we
> have there.
>
> --
> Dominique Martinet
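
In case it helps with triage, here is a rough sketch for checking whether
the blocked tasks really all share one backtrace, by grouping traces on
their function names. The helper names (parse_frames, count_signatures)
are made up for illustration, and the regex only assumes the frame format
shown in the quoted trace above; it is not tied to any kernel tool.

```python
import re
from collections import Counter

# One backtrace frame in the format quoted above, e.g.
#   [ 20.273945] [<ffffffff818c3910>] schedule+0x30/0x80
# A "?" before the symbol marks a frame the unwinder is unsure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# The trace from the quoted message, used as sample input.
SAMPLE = """\
[ 20.273945] [<ffffffff818c3910>] schedule+0x30/0x80
[ 20.274644] [<ffffffff818c3c39>] schedule_preempt_disabled+0x9/0x10
[ 20.275539] [<ffffffff818c6147>] __mutex_lock_slowpath+0x107/0x2f0
[ 20.276421] [<ffffffff811cf02e>] ? lookup_fast+0xbe/0x320
[ 20.277195] [<ffffffff818c6345>] mutex_lock+0x15/0x30
[ 20.277916] [<ffffffff811d0df7>] walk_component+0x1a7/0x270
""".splitlines()


def parse_frames(lines, keep_unreliable=False):
    """Return the function names of one backtrace as a tuple."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m is None:
            continue  # not a backtrace frame
        if m.group("unreliable") and not keep_unreliable:
            continue  # skip "?" frames so stale stack entries don't split groups
        frames.append(m.group("func"))
    return tuple(frames)


def count_signatures(traces):
    """Count how many traces (lists of log lines) share each frame tuple."""
    return Counter(parse_frames(t) for t in traces)
```

Feeding it one list of log lines per task gives a Counter keyed by call
chain; a single key would confirm that every task is parked in the same
mutex_lock() slow path.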
--
Andy Lutomirski
AMA Capital Management, LLC