lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100703034123.GE11732@laptop>
Date:	Sat, 3 Jul 2010 13:41:23 +1000
From:	Nick Piggin <npiggin@...e.de>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Dave Chinner <david@...morbit.com>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, John Stultz <johnstul@...ibm.com>,
	Frank Mayhar <fmayhar@...gle.com>
Subject: Re: [patch 29/52] fs: icache lock i_count

On Fri, Jul 02, 2010 at 07:03:55PM -0700, Andrew Morton wrote:
> On Wed, 30 Jun 2010 22:05:02 +1000 Nick Piggin <npiggin@...e.de> wrote:
> 
> > On Wed, Jun 30, 2010 at 05:27:02PM +1000, Dave Chinner wrote:
> > > On Thu, Jun 24, 2010 at 01:02:41PM +1000, npiggin@...e.de wrote:
> > > > Protect inode->i_count with i_lock, rather than having it atomic.
> > > > Next step should also be to move things together (eg. the refcount increment
> > > > into d_instantiate, which will remove a lock/unlock cycle on i_lock).
> > > .....
> > > > Index: linux-2.6/fs/inode.c
> > > > ===================================================================
> > > > --- linux-2.6.orig/fs/inode.c
> > > > +++ linux-2.6/fs/inode.c
> > > > @@ -33,14 +33,13 @@
> > > >   * inode_hash_lock protects:
> > > >   *   inode hash table, i_hash
> > > >   * inode->i_lock protects:
> > > > - *   i_state
> > > > + *   i_state, i_count
> > > >   *
> > > >   * Ordering:
> > > >   * inode_lock
> > > >   *   sb_inode_list_lock
> > > >   *     inode->i_lock
> > > > - * inode_lock
> > > > - *   inode_hash_lock
> > > > + *       inode_hash_lock
> > > >   */
> > > 
> > > I thought that the rule governing the use of inode->i_lock was that
> > > it can be used anywhere as long as it is the innermost lock.
> > > 
> > > Hmmm, no references in the code or documentation. Google gives a
> > > pretty good reference:
> > > 
> > > http://www.mail-archive.com/linux-ext4@vger.kernel.org/msg02584.html
> > > 
> > > Perhaps a different/new lock needs to be used here?
> > 
> > Well I just changed the order (and documented it to boot :)). It's
> > pretty easy to verify that LOR is no problem. inode hash is only
> > taken in a very few places so other code outside inode.c is fine to
> > use i_lock as an innermost lock.
> 
> um, nesting a kernel-wide singleton lock inside a per-inode lock is
> plain nutty.

I think it worked out OK. Because the kernel-wide locks are really
restricted in where they are to be used (ie. not in filesystems). So
they're really pretty constrained to the inode management subsystem.
So filesystems still get to really use i_lock as an inner most lock
for their purposes.

And filesystems get to take i_lock and stop all manipulation of inode
now.  No changing of flags, moving it on/off lists etc behind its back.
It really is about locking the data rather than the code.

The final ordering outcome looks like this:

 * inode->i_lock
 *   inode_list_lglock
 *   zone->inode_lru_lock
 *   wb->b_lock
 *   inode_hash_bucket lock

And it works like that because when you want to add or remove an inode
from various data structures, you take the i_lock and then take each
of these locks in turn, inside it. The alternative is to build a bigger
lock ordering graph, and take all the locks up-front before taking
i_lock. I did actaully try that and it ended up being worse, so I went
this route.

I think taking a global lock in mark_inode_dirty is nutty (especially
when that global lock is shared with hash management, LRU scanning,
writeback, i_flags access... :) It's just a question of which is less
nutty.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ