linux-kernel - Re: [RFC] st_nlink after rmdir() and rename()

Message-ID: <20110303225702.GQ22723@ZenIV.linux.org.uk>
Date:	Thu, 3 Mar 2011 22:57:02 +0000
From:	Al Viro <viro@...IV.linux.org.uk>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	OGAWA Hirofumi <hirofumi@...l.parknet.co.jp>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC] st_nlink after rmdir() and rename()

On Thu, Mar 03, 2011 at 01:52:18PM -0800, Linus Torvalds wrote:
> On Thu, Mar 3, 2011 at 1:37 PM, OGAWA Hirofumi
> <hirofumi@...l.parknet.co.jp> wrote:
> >
> > And I can't only see is why you refuse to make consistent behavior (if
> > you are saying it). It's why I said if it's _really easy_.
> 
> The thing is, it really isn't really easy. As mentioned, it's actually
> impossible on NFS, and it's possibly impossible on other filesystems
> too.
> 
> So what I'm objecting to is "try to make something consistent that
> CANNOT be consistent anyway", and calling it a bug.
> 
> I'm not saying there aren't real bugs there too (the actual races in
> i_nlink handling are real bugs). But I _am_ saying that it's simply
> not true that i_nlink must be zero if you do an "fstat()" after doing
> an rmdir on an fd that you held open. Nobody can reasonably care, and
> anybody who _does_ care is better off getting a nasty surprise early
> rather than late.

Ho-hum...  OK, let me put it that way:
	* the pile I've sent a pull request for is really bug-only; none of it
has anything to do with what's discussed in that thread, other than "it's
also about i_nlink and found during the same code review".  i_nlink races,
a pair of fs corruptors and a braino in UDF (256 << sizeof(inode->i_nlink)
as a way to spell "maximal allowed number of links"; never really worked,
even before we had switched to 32-bit internal i_nlink - the real limit is
0xffff, not 0x3ff or 0xfff).
	* it's trivial to get the same behaviour on all local filesystems;
most of them have it and rely on it to detect the inodes that need to be
freed on final iput().  It has nothing to do with counting subdirs or any
such nonsense.
	* inotify is broken for filesystems that don't get you zero ->i_nlink
when the last dentry pointing to the doomed inode is dropped.  Regardless of
what you get in fstat().  Excusable for remote fs, but not nice for local ones.
I'd *LOVE* to get rid of inotify/dnotify/etc., but it's probably not feasible
now.
	* NFS is not hard to handle, actually, especially for directories.
Regular files may be trickier, but then we have many places in that area
where NFS is not quite POSIX-compliant, to put it mildly.
	* I honestly don't know what's the real situation with other
remote filesystems; thus the RFC.  Hopefully, people familiar with that
are on fsdevel...
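
To make the arithmetic behind the UDF braino in the first item concrete, here
is a minimal userspace sketch.  It is not the kernel code: shifted_limit() and
print_limits() are hypothetical names, and the "- 1" is an assumption about
how the quoted 0x3ff and 0xfff values come out of the shift.  The point is
only that shifting 256 by the byte width of the counter never yields the
0xffff limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not taken from the kernel: evaluates the shifted
 * expression for a link counter that is counter_bytes wide.  The "- 1"
 * is an assumption made to reproduce the limits quoted in the message.
 */
unsigned long shifted_limit(size_t counter_bytes)
{
    /* Keep the shift small enough to be well defined for unsigned long. */
    assert(counter_bytes <= 8);
    return (256UL << counter_bytes) - 1;
}

/* Print the shifted value for 16-bit and 32-bit counters next to 0xffff. */
void print_limits(void)
{
    printf("16-bit counter: 0x%lx\n", shifted_limit(sizeof(uint16_t)));
    printf("32-bit counter: 0x%lx\n", shifted_limit(sizeof(uint32_t)));
    printf("real limit:     0x%x\n", (unsigned int)UINT16_MAX);
}
```

Calling print_limits() from a main() shows 0x3ff for a 2-byte counter and
0xfff for a 4-byte one, against the real limit of 0xffff.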

BTW, I suspect that another exception among the local filesystems (affs)
is actually leaking blocks on rmdir.  Need to experiment to verify that,
but it smells like another genuine bug.
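
For anyone who wants to see what a given filesystem reports in the case this
thread is about, a small userspace probe along the following lines may help.
It is only a sketch: nlink_across_rmdir() is a hypothetical name, and the
probe says nothing about inotify or about block leaks, only about what
fstat() returns on a directory fd held open across rmdir().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch directory from the mkdtemp()
 * template in tmpl, hold it open, rmdir() it, and record st_nlink as
 * seen through the open fd before and after the removal.
 * Returns 0 on success, -1 on any failure.
 */
int nlink_across_rmdir(char *tmpl, nlink_t *before, nlink_t *after)
{
    struct stat st;
    int fd = -1;
    int removed = 0;

    assert(tmpl != NULL && before != NULL && after != NULL);

    if (mkdtemp(tmpl) == NULL)
        return -1;

    fd = open(tmpl, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        goto fail;

    if (fstat(fd, &st) < 0)
        goto fail;
    *before = st.st_nlink;

    if (rmdir(tmpl) < 0)
        goto fail;
    removed = 1;

    /* The name is gone, but the fd still refers to the inode. */
    if (fstat(fd, &st) < 0)
        goto fail;
    *after = st.st_nlink;

    close(fd);
    return 0;

fail:
    if (fd >= 0)
        close(fd);
    if (!removed)
        rmdir(tmpl);
    return -1;
}
```

Pass it a writable template such as "/tmp/nlink-probe-XXXXXX" located on the
filesystem under test and print both values (cast to unsigned long).  The
value seen after rmdir() depends on the filesystem, which is exactly the
inconsistency being discussed, so the sketch deliberately states no expected
output.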