linux-kernel - Re: sysfs: tagged directories not merged completely yet

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <48EB256C.4020003@fr.ibm.com>
Date:	Tue, 07 Oct 2008 11:01:32 +0200
From:	Daniel Lezcano <dlezcano@...ibm.com>
To:	Al Viro <viro@...IV.linux.org.uk>
CC:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Benjamin Thery <benjamin.thery@...l.net>,
	Greg KH <greg@...ah.com>, linux-kernel@...r.kernel.org,
	"Serge E. Hallyn" <serue@...ibm.com>,
	Al Viro <viro@....linux.org.uk>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: sysfs: tagged directories not merged completely yet

Al Viro wrote:
> On Tue, Sep 23, 2008 at 11:23:57AM -0700, Eric W. Biederman wrote:
>> Benjamin Thery <benjamin.thery@...l.net> writes:
>>> Oh.
>>> It's a pity Al couldn't re-review them before. We've already lost a lot
>>> of time with this patchset and it's blocking easier testing of network
>>> namespaces (right now, with a mainline kernel, we have to disable sysfs
>>> to build network namespaces).
>> I am confident that we have a good base with these patches and the rest of
>> the work can be done incrementally on top of them if any issues turn up.
>>
>> Al recent rework of sysctl has a very similar structure.
> 
> No, it does not.  My apologies for delay, but here are more printable parts
> of review.
> 
> First of all, this stuff breaks just about every damn integrity constraint VFS
> knows of.  It tries to tiptoe through the resulting minefield - without
> success.  So the first group of comments will be of "you *really* don't
> do $FOO" variety.  I'm very far from being convinced that we want to
> special-case in VFS every kind of weirdness sysfs happens to do; in effect,
> that would require adding a FS_IS_SYSFS_SO_BEND_OVER fs type flag and making
> a lot of locking conditional on that.
> 
> a) You do *not* share struct inode between the superblocks, for fsck sake!
> b) You do *not* grab i_mutex on ancestors after having grabbed it on
> file, as sysfs_chmod_file() does.
> c) You do *not* change dentry tree topology without s_vfs_rename_mutex on
> affected superblock.  That, BTW, is broken in mainline sysfs as well.
> d) You REALLY, REALLY do not unhash busy dentries of directories.
> 
> In addition to that, there are interesting internal problems:
> * inumbers are released by final sysfs_put(); that can happen before the
> final iput() on corresponding inode.  Guess what happens if new entry is
> created in the meanwhile, reuses the same inumber and lookup gets to
> sysfs_get_inode() on it?
> * may I politely suggest that
> again:
> 	mutex_lock(&A);
> 	if (!mutex_trylock(&B)) {
> 		mutex_unlock(&A);
> 		goto again;
> 	}
> is somewhat, er, deficient way to deal with buggered locking hierarchy?
> Not to mention anything else, that's obviously FUBAR on UP box - if we
> have B contended, we've just killed the box dead since we'll never give
> the CPU back to whoever happens to hold B.  See sysfs_mv_dir() for a lovely
> example.
> * sysfs_count_nlink() is called from sysfs_fill_super() without sysfs_mutex;
> now this sucker can get called at any moment.
> * just what is protecting ->s_tag?
> * __sysfs_remove_dir() appears to assume that subdirectories are possible;
> AFAICS, if we *do* get them, we get very screwed after remove_dir().
> * everything else aside, the internal locking is extremely heavy.  For
> fsck sake, guys, a single system-wide mutex that can be grabbed for the
> duration of readdir on any directory and block just about anything
> in the filesystem?  Just mmap() something over NFS on a slow link and
> do getdents() to such buffer.  Watch a *lot* of stuff getting buggered.
> Hell, you can't even do ifconfig up while that sucker is held...
> 
> Seriously, people, it's getting worse than devfs had ever been ;-/

Thank you Al for reviewing the patchset.

   -- Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/