[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <48EB256C.4020003@fr.ibm.com>
Date: Tue, 07 Oct 2008 11:01:32 +0200
From: Daniel Lezcano <dlezcano@...ibm.com>
To: Al Viro <viro@...IV.linux.org.uk>
CC: "Eric W. Biederman" <ebiederm@...ssion.com>,
Benjamin Thery <benjamin.thery@...l.net>,
Greg KH <greg@...ah.com>, linux-kernel@...r.kernel.org,
"Serge E. Hallyn" <serue@...ibm.com>,
Al Viro <viro@....linux.org.uk>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: sysfs: tagged directories not merged completely yet
Al Viro wrote:
> On Tue, Sep 23, 2008 at 11:23:57AM -0700, Eric W. Biederman wrote:
>> Benjamin Thery <benjamin.thery@...l.net> writes:
>>> Oh.
>>> It's a pity Al couldn't re-review them before. We've already lost a lot
>>> of time with this patchset and it's blocking easier testing of network
>>> namespaces (right now, with a mainline kernel, we have to disable sysfs
>>> to build network namespaces).
>> I am confident that we have a good base with these patches and the rest of
>> the work can be done incrementally on top of them if any issues turn up.
>>
>> Al recent rework of sysctl has a very similar structure.
>
> No, it does not. My apologies for delay, but here are more printable parts
> of review.
>
> First of all, this stuff breaks just about every damn integrity constraint VFS
> knows of. It tries to tiptoe through the resulting minefield - without
> success. So the first group of comments will be of "you *really* don't
> do $FOO" variety. I'm very far from being convinced that we want to
> special-case in VFS every kind of weirdness sysfs happens to do; in effect,
> that would require adding a FS_IS_SYSFS_SO_BEND_OVER fs type flag and making
> a lot of locking conditional on that.
>
> a) You do *not* share struct inode between the superblocks, for fsck sake!
> b) You do *not* grab i_mutex on ancestors after having grabbed it on
> file, as sysfs_chmod_file() does.
> c) You do *not* change dentry tree topology without s_vfs_rename_mutex on
> affected superblock. That, BTW, is broken in mainline sysfs as well.
> d) You REALLY, REALLY do not unhash busy dentries of directories.
>
> In addition to that, there are interesting internal problems:
> * inumbers are released by final sysfs_put(); that can happen before the
> final iput() on corresponding inode. Guess what happens if new entry is
> created in the meanwhile, reuses the same inumber and lookup gets to
> sysfs_get_inode() on it?
> * may I politely suggest that
> again:
> mutex_lock(&A);
> if (!mutex_trylock(&B)) {
> mutex_unlock(&A);
> goto again;
> }
> is somewhat, er, deficient way to deal with buggered locking hierarchy?
> Not to mention anything else, that's obviously FUBAR on UP box - if we
> have B contended, we've just killed the box dead since we'll never give
> the CPU back to whoever happens to hold B. See sysfs_mv_dir() for a lovely
> example.
> * sysfs_count_nlink() is called from sysfs_fill_super() without sysfs_mutex;
> now this sucker can get called at any moment.
> * just what is protecting ->s_tag?
> * __sysfs_remove_dir() appears to assume that subdirectories are possible;
> AFAICS, if we *do* get them, we get very screwed after remove_dir().
> * everything else aside, the internal locking is extremely heavy. For
> fsck sake, guys, a single system-wide mutex that can be grabbed for the
> duration of readdir on any directory and block just about anything
> in the filesystem? Just mmap() something over NFS on a slow link and
> do getdents() to such buffer. Watch a *lot* of stuff getting buggered.
> Hell, you can't even do ifconfig up while that sucker is held...
>
> Seriously, people, it's getting worse than devfs had ever been ;-/
Thank you Al for reviewing the patchset.
-- Daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists