linux-kernel - Re: [PATCH 3/4] vfs: count unlinked inodes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20111217073604.GA31872@ZenIV.linux.org.uk>
Date:	Sat, 17 Dec 2011 07:36:04 +0000
From:	Al Viro <viro@...IV.linux.org.uk>
To:	Miklos Szeredi <miklos@...redi.hu>
Cc:	hch@...radead.org, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, jack@...e.cz,
	akpm@...ux-foundation.org, toshi.okajima@...fujitsu.com,
	mszeredi@...e.cz
Subject: Re: [PATCH 3/4] vfs: count unlinked inodes

On Mon, Nov 21, 2011 at 12:11:32PM +0100, Miklos Szeredi wrote:
> @@ -241,6 +242,11 @@ void __destroy_inode(struct inode *inode)
>  	BUG_ON(inode_has_buffers(inode));
>  	security_inode_free(inode);
>  	fsnotify_inode_delete(inode);
> +	if (!inode->i_nlink) {
> +		WARN_ON(atomic_long_read(&inode->i_sb->s_remove_count) == 0);
> +		atomic_long_dec(&inode->i_sb->s_remove_count);
> +	}

Umm...  That relies on ->destroy_inode() doing nothing stupid; granted,
all work on actual file removal should've been done in ->evice_inode()
leaving only (RCU'd) freeing of in-core, but there are odd ones that
do strange things in ->destroy_inode() and I'm not sure that it's not
a Yet Another Remount Race(tm).  OTOH, it's clearly not worse than what
we used to have; just something to keep in mind for future work.

Anyway, I'm mostly OK with that series; I still hate your per-superblock
list of vfsmounts, but at least on top of the vfsmount-guts series they
won't be a temptation for abuse - list goes through struct mount now,
so filesystems won't be able to do fun things like "iterate through all
places where I'm mounted" (and #include "../mounts.h" in any fs code
will be a shootable offense - at least that is easy to spot).

There is another thing I'm less than happy about - suppose you have a
corrupted fs and run into zero on-disk i_nlink.  Sure, the inode will
get immediately evicted and __destroy_inode() will happen; however, for
the duration of that window you end up with bumped ->s_remove_count.
Transient EROFS is annoying, but tolerable - we only hit it if attempt
to remount r/o fails in ->remount_fs().  But this is something different -
it's a transient -EBUSY on attempt to remount r/o happening when nothing
actually is trying to do any kind of write access at all.  As it is,
you have ->s_remove_count equal to the number of in-core inodes with
zero ->i_nlink that had not yet reached destroy_inode().  Hell knows...
Maybe we want two versions of set_nlink(); one doing what yours does,
another returning -EINVAL if asked to set i_nlink to 0.  And assorted
foo_read_inode() would use the latter.  Anyway, that's a separate work;
so's the analysis of what happens if directory entry points to on-disk
inode with zero i_nlink.

Applied, with rebase on top of vfsmount-guts.  Will push the whole pile
into #for-next as soon as I finish sorting out conflicts in btrfs patches
versus btrfs tree.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/