Date:	Thu, 27 Jun 2013 19:59:50 -1000
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Dave Chinner <david@...morbit.com>,
	Al Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>
Cc:	Dave Jones <davej@...hat.com>, Oleg Nesterov <oleg@...hat.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Andrey Vagin <avagin@...nvz.org>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: frequent softlockups with 3.10rc6.

On Thu, Jun 27, 2013 at 5:54 PM, Dave Chinner <david@...morbit.com> wrote:
> On Thu, Jun 27, 2013 at 04:54:53PM -1000, Linus Torvalds wrote:
>>
>> So what made it all start happening now? I don't recall us having had
>> these kinds of issues before..
>
> Not sure - it's a sudden surprise for me, too. Then again, I haven't
> been looking at sync from a performance or lock contention point of
> view any time recently.  The algorithm that wait_sb_inodes() uses is
> effectively unchanged since at least 2009, so it's probably a case
> of it having been protected from contention by some external factor
> we've fixed/removed recently.  Perhaps the bdi-flusher thread
> replacement in -rc1 has changed the timing sufficiently that it no
> longer serialises concurrent sync calls as much....
>
> However, the inode_sb_list_lock is known to be a badly contended
> lock from a create/unlink fastpath for XFS, so it's not like this sort
> of thing is completely unexpected.

That whole inode_sb_list_lock seems moronic. Why isn't it a per-sb
one? No, that won't fix all problems, but it might at least help a
*bit*.
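
Something like the below is what I mean - just a sketch, not a real
patch, and "s_inode_list_lock" is a made-up name here for a new
spinlock_t that would sit in struct super_block next to s_inodes.
inode_sb_list_add() and friends would then become roughly:

/*
 * Sketch only: use a per-superblock lock instead of the global
 * inode_sb_list_lock, so walking one filesystem's inode list
 * doesn't contend with create/unlink traffic on another.
 */
void inode_sb_list_add(struct inode *inode)
{
        struct super_block *sb = inode->i_sb;

        spin_lock(&sb->s_inode_list_lock);      /* was: inode_sb_list_lock */
        list_add(&inode->i_sb_list, &sb->s_inodes);
        spin_unlock(&sb->s_inode_list_lock);
}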

Also, looking some more now at that wait_sb_inodes logic, I have to
say that if the problem is primarily the inode->i_lock, then that's
just crazy. We normally shouldn't even *need* that lock, since we
could do a totally unlocked iget() as long as the count is non-zero.
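
IOW, something like this trivial helper (pure sketch, and the name is
made up - but i_count is an atomic_t, so the "already referenced" case
needs no locking at all):

/*
 * Sketch: take another reference without i_lock if somebody else
 * already holds one, since the inode can't be freed under us.
 * Returns false if the count might be going to zero, in which
 * case the caller falls back to the locked path.
 */
static bool iget_if_referenced(struct inode *inode)
{
        return atomic_inc_not_zero(&inode->i_count);
}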

And no, I don't think we really need the i_lock for checking
"mapping->nrpages == 0" or the magical "inode is being freed" bits
either. Or at least we could easily do some of this optimistically for
the common cases.

So instead of doing

                struct address_space *mapping = inode->i_mapping;

                spin_lock(&inode->i_lock);
                if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
                    (mapping->nrpages == 0)) {
                        spin_unlock(&inode->i_lock);
                        continue;
                }
                __iget(inode);
                spin_unlock(&inode->i_lock);

I really think we could do that without getting the inode lock at
*all* in the common case.
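
IOW, the loop body would look something like this (a sketch of the
idea, not necessarily what the attached patch does - the unlocked
checks are racy on purpose, and anything that survives them gets
re-checked under i_lock before we take the reference):

                struct address_space *mapping = inode->i_mapping;

                /*
                 * Optimistic, unlocked checks: skip obviously
                 * uninteresting inodes without touching i_lock.
                 * Racy reads are fine here, we re-validate below.
                 */
                if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
                        continue;
                if (mapping->nrpages == 0)
                        continue;

                /* Slow path: re-check under the lock, then grab a ref. */
                spin_lock(&inode->i_lock);
                if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
                    (mapping->nrpages == 0)) {
                        spin_unlock(&inode->i_lock);
                        continue;
                }
                __iget(inode);
                spin_unlock(&inode->i_lock);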

I'm attaching a pretty trivial patch, which may of course be totally
flawed. I have not tested this in any way, but half the new
lines are comments about why it's doing what it is doing.  And I
really think that it should make the "actually take the inode lock" be
something quite rare.

And quite frankly, I'd much rather get *rid* of crazy i_lock accesses,
than try to be clever and use a whole different list at this point.
Not that I disagree that it would be much nicer to use a separate
list in the long run, but for a short-term solution I'd much rather
keep the old logic and just tweak it to be much more usable..

Hmm? Al? Jan? Comments?

                             Linus

Download attachment "patch.diff" of type "application/octet-stream" (2358 bytes)
