linux-kernel - Re: [PATCH 16/21] fs: Protect inode->i_state with the inode->i

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20101024191735.GU19804@ZenIV.linux.org.uk>
Date:	Sun, 24 Oct 2010 20:17:35 +0100
From:	Al Viro <viro@...IV.linux.org.uk>
To:	Christoph Hellwig <hch@...radead.org>
Cc:	Dave Chinner <david@...morbit.com>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 16/21] fs: Protect inode->i_state with the inode->i_lock

On Sun, Oct 24, 2010 at 12:21:31PM -0400, Christoph Hellwig wrote:
> On Sun, Oct 24, 2010 at 10:13:10AM -0400, Christoph Hellwig wrote:
> > On Sat, Oct 23, 2010 at 10:37:52PM +0100, Al Viro wrote:
> > > 	* invalidate_inodes() - collect I_FREEING/I_WILL_FREE on a separate
> > > list, then (after we'd evicted the stuff we'd decided to evict) wait until
> > > they get freed by whatever's freeing them already.
> > 
> > Note that we would only have to do this for the umount case.  For others
> > it's pretty pointless.
> 
> Now that I've looked into it I think we basically fine right now.
> 
> If we're in umount there should be no other I_FREEING inodes.
> 
>  - concurrent prune_icache is prevented by iprune_sem.
>  - concurrent other invalidate_inodes should not happen because we're
>    in unmount and the filesystem should not be reachable any more,
>    and even if it did iprune_sem would protect us.
>  - how could a concurrent iput_final happen?  filesystem is not
>    accessible anymore, and iput of fs internal inodes is single-threaded
>    with the rest of the actual umount process.
> 
> So just skipping over I_FREEING inodes here should be fine for
> non-umount callers, and for umount we could even do a WARN_ON.

FWIW, I think we should kill most of invalidate_inodes() callers.  Look:
	* call in generic_shutdown_super() is legitimate.  The first one,
that is.  The second should be replaced with check for ->s_list being
non-empty.  Note that after the first pass we should have kicked out
everything with zero i_count.  Everything that gets dropped to zero
i_count after that (i.e. during ->put_super()) will be evicted immediately
and won't stay.  I.e. the second call will evict *nothing*; it's just
an overblown way to check if there are any inodes left.
	* call in ext2_remount() is hogwash - we do that with at least
root inode pinned down, so it will fail, along with the remount attempt.
	* ntfs_fill_super() call - no-op.  MS_ACTIVE hasn't been set
yet, so there will be no inodes with zero i_count sitting around.
	* gfs2 calls - same story (no MS_ACTIVE yet in fill_super(),
MS_ACTIVE already removed *and* invalidate_inodes() already called
in gfs2_put_super())
	* smb reconnect logics.  AFAICS, that's complete crap; we *never*
retain inodes on smbfs.  IOW, nothing for invalidate_inodes() to do, other
than evict fsnotify marks.  Which is to say, we are calling the wrong
function there, even assuming that fsnotify should try to work there.
	* finally, __invalidate_device().  Which has a slew of callers of
its own and is *very* different from normal situation.  Here we have
underlying device gone bad.

So I'm going to do the following:
	1) split evict_inodes() off invalidate_inodes() and simplify it.
	2) switch generic_shutdown_super() to that sucker, called once.
	3) kill all calls of invalidate_inodes() except __invalidate_device()
one.
	4) think hard about __invalidate_device() situation.

	evict_inodes() should *not* see any inodes with
I_NEW/I_FREEING/I_WILL_FREE.  Just skip.  It might see I_DIRTY/I_SYNC,
but that's OK - evict_inode() will wait for that.

	OTOH, invalidate_inodes() from __invalidate_device() can run in
parallel with e.g. final iput().  Currently it's not a problem, but
we'll need to start skipping I_FREEING/I_WILL_FREE ones there if we want
to change iput() locking.

	And yes, iprune_sem is a trouble waiting to happen - one fs stuck
in e.g. truncate_inode_pages() and we are seriously fucked; any non-lazy
umount() will get stuck as well.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/