linux-kernel - Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race

Date:	Tue, 29 Dec 2015 22:43:26 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	Dominique Martinet <dominique.martinet@....fr>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...nel.org>
Cc:	Al Viro <viro@...iv.linux.org.uk>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	V9FS Developers <v9fs-developer@...ts.sourceforge.net>,
	Linux FS Devel <linux-fsdevel@...r.kernel.org>
Subject: Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race

[add cc's]

Hi scheduler people:

This is relatively easy for me to reproduce.  Any hints for debugging
it?  Could we really have a bug in which processes that are
schedulable as a result of mutex unlock aren't always reliably
scheduled?
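The question above amounts to suspecting a lost wakeup: a waiter that should have become runnable when the mutex was released is never actually woken. The kernel's mutex code is not reproduced here. As a hedged user-space analogy only (POSIX threads, not kernel code), the sketch below shows the standard pattern that rules out lost wakeups. The waiter tests its condition and goes to sleep while holding the lock, and the waker changes the condition under that same lock before broadcasting. The `gate` type and the `gate_wait`, `gate_open` and `waiter` functions are hypothetical names chosen for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical "gate": waiters block until it is opened.
 * User-space analogy only; this is not the kernel mutex implementation. */
struct gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            open;
    int             passed;   /* number of waiters that got through */
};

#define GATE_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0 }

/* Waiter side: the condition is tested and the sleep is entered while
 * holding the lock, so a wakeup cannot slip in between the two. */
static void gate_wait(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (!g->open)                    /* loop also handles spurious wakeups */
        pthread_cond_wait(&g->cond, &g->lock);
    g->passed++;
    pthread_mutex_unlock(&g->lock);
}

/* Waker side: publish the state change under the lock, then wake everyone. */
static void gate_open(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->open = true;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

/* Thread entry point: block on the gate passed as the argument. */
static void *waiter(void *arg)
{
    gate_wait(arg);
    return NULL;
}
```

Starting several `waiter` threads with `pthread_create`, calling `gate_open`, and joining them should see every waiter pass. If the `open` flag were instead tested without holding the lock, a broadcast landing between the test and the sleep could be missed, and that waiter would sleep indefinitely. That would look much like the symptom reported here, but whether the kernel hang has such a cause is still the open question.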

--Andy

On Thu, Dec 24, 2015 at 2:51 AM, Dominique Martinet
<dominique.martinet@....fr> wrote:
> Andy Lutomirski wrote on Thu, Dec 17, 2015:
>> This could be QEMU's analysis script screwing up.  Is there a good way
>> for me to gather more info?
>
> I finally took some time to reproduce it (sorry for the delay).
>
> Using your config, virtme commit 17363c2 and kernel tag v4.4-rc3, I was
> able to reproduce it just fine with my QEMU (2.4.90).
>
> Now for the fun bit... I ran it with a gdb server, and attaching gdb and
> running cont always 'unblocks' it.
> Using the kernel gdb scripts (lx-ps), I see about 250 kworker threads
> running; the backtraces all look the same:
>
> [   20.273945]  [<ffffffff818c3910>] schedule+0x30/0x80
> [   20.274644]  [<ffffffff818c3c39>] schedule_preempt_disabled+0x9/0x10
> [   20.275539]  [<ffffffff818c6147>] __mutex_lock_slowpath+0x107/0x2f0
> [   20.276421]  [<ffffffff811cf02e>] ? lookup_fast+0xbe/0x320
> [   20.277195]  [<ffffffff818c6345>] mutex_lock+0x15/0x30
> [   20.277916]  [<ffffffff811d0df7>] walk_component+0x1a7/0x270
>
>
> So, given that it unblocks after attaching gdb and running cont, I'm
> actually thinking this might be a pure scheduling issue (e.g. a thread
> that is never rescheduled, or something like that)?
> I can't see any task that is not in schedule() in your sysrq task dump
> transcript either.
>
>
> I'm not sure how to go about debugging that, to be honest.
> I've tried both the default single virtual CPU and -smp 3 or 4, and both
> reproduce it; CPU usage on the host is always low, so it doesn't look
> like any busy-polling is involved. This is a pretty subtle bug we have
> there.
>
> --
> Dominique Martinet
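To put a number on "the backtraces all look the same", one option is to count how many stacks in a captured task dump (for example a sysrq-t transcript, or backtraces gathered through the kernel gdb scripts) contain a given function. Below is a minimal C sketch, assuming the dump is already in memory as text with one frame per line in the format quoted above; `count_frames` is a hypothetical helper, not part of any kernel or gdb tooling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (assumption, not an existing tool): count the lines
 * in a text dump, one stack frame per line as in the backtrace above,
 * whose frame is exactly `symbol` (non-empty). A match must be preceded
 * by a space and followed by '+', so "mutex_lock" does not also match
 * "__mutex_lock_slowpath" or "schedule" match "schedule_preempt_disabled". */
size_t count_frames(const char *dump, const char *symbol)
{
    size_t count = 0;
    size_t len = strlen(symbol);
    const char *line = dump;

    while (*line) {
        const char *end = strchr(line, '\n');
        if (!end)
            end = line + strlen(line);      /* last line without newline */

        /* Search for the symbol, stopping once a hit lies past this line. */
        for (const char *p = line; (p = strstr(p, symbol)) != NULL && p < end; p++) {
            if (p > line && p[-1] == ' ' && p + len <= end && p[len] == '+') {
                count++;                    /* count each line at most once */
                break;
            }
        }
        line = (*end == '\n') ? end + 1 : end;
    }
    return count;
}
```

For example, `count_frames(dump, "__mutex_lock_slowpath")` would tally the tasks blocked in the mutex slowpath. The sketch counts matching lines, which equals the number of tasks only if the function appears once per stack, and it does not distinguish frames printed with a leading `?`.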



-- 
Andy Lutomirski
AMA Capital Management, LLC