Message-ID: <20101022022010.GG19804@ZenIV.linux.org.uk>
Date:	Fri, 22 Oct 2010 03:20:10 +0100
From:	Al Viro <viro@...IV.linux.org.uk>
To:	Nick Piggin <npiggin@...nel.dk>
Cc:	Dave Chinner <david@...morbit.com>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: Inode Lock Scalability V7 (was V6)

On Fri, Oct 22, 2010 at 11:45:40AM +1100, Nick Piggin wrote:

> No you didn't make these points to me over the past couple of weeks.
> Specifically, do you agree or disagree about these points:
> - introducing new concurrency situations from not having a single lock
>   for an inode's icache state is a negative?

I disagree.

> And I have kept saying I would welcome your idea to reduce i_lock width
> in a small incremental patch. I still haven't figured out quite what
> is so important that can't be achieved in simpler ways (like rcu, or
> using a separate inode lock).

No, it's not a small incremental change.  It's your locking order that's wrong;
the natural one is
	[hash, wb, sb] > ->i_lock > [lru]
and that's one hell of a difference compared to what you are doing.

Look:
	* iput_final() should happen under ->i_lock
	* if it leaves the inode alive, that's it; we can put it on the LRU list
since the lru lock nests inside ->i_lock
	* if it decides to kill the inode, it sets I_FREEING or I_WILL_FREE
before dropping ->i_lock.  Once that's done, the inode is ours and nobody
will pick it through the lists.  We can release ->i_lock and then do what's
needed.  Safely.
	* accesses of ->i_state are under ->i_lock, including the switchover
from I_WILL_FREE to I_FREEING
	* walkers of the sb, wb and hash lists can grab ->i_lock at will;
it nests inside their locks.
	* prune_icache() grabs the lru lock, then trylocks ->i_lock on the
first element.  If the trylock fails, we just give the inode another spin through
the list by moving it to the tail; if it succeeds, we are holding ->i_lock
and can proceed safely.
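
To make that concrete, here is a rough sketch of both paths in kernel-style C.
Lock, list and helper names (inode_lru_lock, inode_lru, i_lru, inode_should_die())
are placeholders rather than anything from an actual patch, evict() stands in for
the real teardown, and the refcount and page-cache checks the real prune_icache()
does are omitted; the only point is the nesting of the lru lock inside ->i_lock,
and the resulting trylock on the pruning side.

/*
 * Sketch only, not the actual patch.  Assumes the usual fs/inode.c
 * environment (<linux/fs.h>, <linux/spinlock.h>, <linux/list.h>);
 * placeholder names throughout.
 */

static void iput_final(struct inode *inode)
{
	/* entered with ->i_lock held and the last reference gone */
	if (!inode_should_die(inode)) {	/* placeholder for the ->drop_inode() decision */
		/* inode stays cached: the lru lock nests inside ->i_lock */
		spin_lock(&inode_lru_lock);
		if (list_empty(&inode->i_lru))
			list_add(&inode->i_lru, &inode_lru);
		spin_unlock(&inode_lru_lock);
		spin_unlock(&inode->i_lock);
		return;
	}
	inode->i_state |= I_FREEING;	/* set before dropping ->i_lock */
	spin_unlock(&inode->i_lock);
	/* the inode is ours now; list walkers refuse I_FREEING inodes */
	evict(inode);
}

static void prune_icache(int nr_to_scan)
{
	spin_lock(&inode_lru_lock);
	while (nr_to_scan--) {
		struct inode *inode;

		if (list_empty(&inode_lru))
			break;
		inode = list_first_entry(&inode_lru, struct inode, i_lru);

		/*
		 * ->i_lock is above the lru lock in the hierarchy, so
		 * from here only a trylock is legitimate.
		 */
		if (!spin_trylock(&inode->i_lock)) {
			/* contended: give it another spin through the list */
			list_move_tail(&inode->i_lru, &inode_lru);
			continue;
		}
		if (inode->i_state & (I_FREEING | I_WILL_FREE)) {
			/* somebody else is already killing it */
			list_del_init(&inode->i_lru);
			spin_unlock(&inode->i_lock);
			continue;
		}
		inode->i_state |= I_FREEING;	/* under ->i_lock */
		list_del_init(&inode->i_lru);
		spin_unlock(&inode->i_lock);
		spin_unlock(&inode_lru_lock);
		evict(inode);			/* no spinlocks held */
		spin_lock(&inode_lru_lock);
	}
	spin_unlock(&inode_lru_lock);
}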

What you seem to miss is that there are very few places accessing an inode through
the lists (i.e. via pointers that do not contribute to the refcount), and the absolute
majority already checks for I_FREEING/I_WILL_FREE, refusing to pick such
inodes.  It's not an accidental subtle property of the code, it's bloody
fundamental.
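
For example, a hash-chain walker under that rule would look roughly like the
sketch below.  inode_hash_lock is a placeholder name, the superblock and
test-callback matching that the real find_inode() does is left out, __iget() is
assumed to bump the refcount under ->i_lock in this scheme, and the
hlist_for_each_entry() form is the four-argument one of that era.  The shape is
what matters: list lock first, ->i_lock inside it, and anything marked
I_FREEING or I_WILL_FREE is refused.

static struct inode *find_inode_sketch(struct hlist_head *head,
				       unsigned long ino)
{
	struct inode *inode;
	struct hlist_node *node;

	spin_lock(&inode_hash_lock);
	hlist_for_each_entry(inode, node, head, i_hash) {
		if (inode->i_ino != ino)
			continue;
		spin_lock(&inode->i_lock);	/* nests inside the hash lock */
		if (inode->i_state & (I_FREEING | I_WILL_FREE)) {
			/* being torn down: refuse to pick it up */
			spin_unlock(&inode->i_lock);
			continue;
		}
		__iget(inode);		/* take a reference while holding ->i_lock */
		spin_unlock(&inode->i_lock);
		spin_unlock(&inode_hash_lock);
		return inode;
	}
	spin_unlock(&inode_hash_lock);
	return NULL;
}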

As I've said, I've no religious problems with trylocks; we *do* need them for
prune_icache() to get a sane locking scheme.  But the way you put ->i_lock at
the top of the hierarchy is simply wrong.
