Message-ID: <20130628011301.GC32195@dastard>
Date:	Fri, 28 Jun 2013 11:13:01 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Dave Jones <davej@...hat.com>, Oleg Nesterov <oleg@...hat.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Andrey Vagin <avagin@...nvz.org>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: frequent softlockups with 3.10rc6.

On Thu, Jun 27, 2013 at 11:21:51AM -0400, Dave Jones wrote:
> On Thu, Jun 27, 2013 at 10:52:18PM +1000, Dave Chinner wrote:
>  
>  
>  > > Yup, that's about three orders of magnitude faster on this
>  > > workload....
>  > > 
>  > > Lightly smoke tested patch below - it passed the first round of
>  > > XFS data integrity tests in xfstests, so it's not completely
>  > > busted...
>  > 
>  > And now with even less smoke than the first version. This one
>  > gets through a full xfstests run...
> 
> :sadface:
> 
> [  567.680836] ======================================================
> [  567.681582] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
> [  567.682389] 3.10.0-rc7+ #9 Not tainted
> [  567.682862] ------------------------------------------------------
> [  567.683607] trinity-child2/8665 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [  567.684464]  (&sb->s_type->i_lock_key#3){+.+...}, at: [<ffffffff811d74e5>] sync_inodes_sb+0x225/0x3b0
> [  567.685632] 
> and this task is already holding:
> [  567.686334]  (&(&wb->wb_list_lock)->rlock){..-...}, at: [<ffffffff811d7451>] sync_inodes_sb+0x191/0x3b0
> [  567.687506] which would create a new lock dependency:
> [  567.688115]  (&(&wb->wb_list_lock)->rlock){..-...} -> (&sb->s_type->i_lock_key#3){+.+...}

.....

> other info that might help us debug this:
> 
> [  567.750396]  Possible interrupt unsafe locking scenario:
> 
> [  567.752062]        CPU0                    CPU1
> [  567.753025]        ----                    ----
> [  567.753981]   lock(&sb->s_type->i_lock_key#3);
> [  567.754969]                                local_irq_disable();
> [  567.756085]                                lock(&(&wb->wb_list_lock)->rlock);
> [  567.757368]                                lock(&sb->s_type->i_lock_key#3);
> [  567.758642]   <Interrupt>
> [  567.759370]     lock(&(&wb->wb_list_lock)->rlock);

Oh, that's easy enough to fix. It's just a matter of changing the
wait_sb_inodes loop to use spin_trylock(&inode->i_lock) and, when
the trylock fails, moving the inode to the end of the sync list,
dropping all locks and starting again...
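
For anyone following along, the trylock-and-requeue idea can be
sketched in userspace roughly like this. It is not the actual patch:
the C11 atomic_flag locks only stand in for wb_list_lock and i_lock,
every name below (struct item, struct sync_list, wait_list) is
hypothetical, the sketch pops each item off the list rather than
leaving it in place, and max_restarts is a bound added purely so the
example terminates when a lock never becomes free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
struct item {
    atomic_flag lock;       /* plays the role of inode->i_lock */
    struct item *next;
    int waited;             /* set once the item has been processed */
};

struct sync_list {
    atomic_flag lock;       /* plays the role of wb->wb_list_lock */
    struct item *head, *tail;
};

static bool try_lock(atomic_flag *f)
{
    /* test_and_set returns the previous state: false means we got it */
    return !atomic_flag_test_and_set_explicit(f, memory_order_acquire);
}

static void lock(atomic_flag *f)
{
    while (!try_lock(f))
        ;                   /* spin */
}

static void unlock(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

static void list_init(struct sync_list *l)
{
    atomic_flag_clear(&l->lock);
    l->head = l->tail = NULL;
}

static void item_init(struct item *it)
{
    atomic_flag_clear(&it->lock);
    it->next = NULL;
    it->waited = 0;
}

/* Caller holds l->lock, or is the only user of the list. */
static void list_add_tail(struct sync_list *l, struct item *it)
{
    it->next = NULL;
    if (l->tail)
        l->tail->next = it;
    else
        l->head = it;
    l->tail = it;
}

/* Caller holds l->lock; the list must be non-empty. */
static struct item *list_pop_head(struct sync_list *l)
{
    struct item *it = l->head;

    assert(it != NULL);
    l->head = it->next;
    if (!l->head)
        l->tail = NULL;
    it->next = NULL;
    return it;
}

/*
 * Process items in list order. The item lock is only ever trylocked
 * while the list lock is held, so this path never blocks on it. On
 * contention the item goes to the tail, all locks are dropped and the
 * walk starts again. Returns the number of items processed.
 */
static int wait_list(struct sync_list *l, int max_restarts)
{
    int processed = 0, restarts = 0;

    for (;;) {
        lock(&l->lock);
        if (!l->head) {
            unlock(&l->lock);
            break;
        }
        struct item *it = list_pop_head(l);
        if (!try_lock(&it->lock)) {
            list_add_tail(l, it);       /* requeue at the end */
            unlock(&l->lock);           /* drop all locks */
            if (++restarts > max_restarts)
                break;                  /* sketch-only bail-out */
            continue;                   /* start again */
        }
        unlock(&l->lock);
        it->waited = 1;                 /* the real "wait" would go here */
        unlock(&it->lock);
        processed++;
    }
    return processed;
}
```

The point of the trylock is that the nested acquisition under the
list lock can no longer block, which is the dependency lockdep is
complaining about above.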

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com