[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20131205081201.GA17203@ZenIV.linux.org.uk>
Date: Thu, 5 Dec 2013 08:12:01 +0000
From: Al Viro <viro@...IV.linux.org.uk>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Simon Kirby <sim@...tway.ca>, Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Waiman Long <Waiman.Long@...com>,
Ian Applegate <ia@...udflare.com>,
Christoph Lameter <cl@...two.org>,
Pekka Enberg <penberg@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Chris Mason <chris.mason@...ionio.com>,
Steven Whitehouse <swhiteho@...hat.com>
Subject: gfs2 deadlock (was Re: Found it)
On Tue, Dec 03, 2013 at 04:28:30AM +0000, Al Viro wrote:
> These should be safe, but damnit, we really need the lifecycle documented for
> all objects - the above is only a part of it (note that for e.g. superblocks
> we have additional rules re "->s_active can't be incremented for any reason
> once it drops to zero, it can't be incremented until superblock had been
> marked 'born' and it crosses over to zero only with ->s_umount held"; there's
> 6 stages in life cycle of struct super_block and we had interesting bugs due
> to messing the transitions up). The trouble is, attempt to write those down
> tends to stray into massive grep session, with usual results - some other
> crap gets found (e.g. in some odd driver) and needs to be dealt with ;-/
> Sigh...
... and sure enough, this time is no different - gfs2 sysfs-related code
cheerfully violates lifetime rules for superblocks, which would've
caused a major mess later, if it had not immediately caused a deadlock
on the same superblock ;-/
Watch: gfs2 creates a bunch of files in sysfs (/sys/fs/gfs2/<devname>/*).
Said bunch gets removed from ->put_super(). Which is called under
->s_umount. Guess what happens if somebody tries to write "1" to
/sys/fs/gfs2/.../freeze just as we enter that ->put_super() (or at any
point starting from the moment when deactivate_locked_super() has dropped
the last active reference)? This:
static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
{
int error;
int n = simple_strtol(buf, NULL, 0);
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
switch (n) {
[snip]
case 1:
error = freeze_super(sdp->sd_vfs);
And freeze_super(sb) assumes that caller has an active reference to
sb:
int freeze_super(struct super_block *sb)
{
int ret;
atomic_inc(&sb->s_active);
... which is not legitimate when ->s_active has already reached zero.
And right after that we hit this:
down_write(&sb->s_umount);
Voila - write(2) is waiting for ->s_umount, while umount(2) is holding
->s_umount and waits for write(2) to get past freeze_store().
Hell knows what to do here - atomic_inc_not_zero() in freeze_super()
(and failing if it fails) would've worked, but it doesn't help with
the deadlock - just write "0" instead and we hit thaw_super(), which
starts with grabbing ->s_umount. atomic_inc_not_zero()/deactivate_super()
around that call of thaw_super() would probably work, but I'll need to
look at that after I get some sleep...
Why bother with sysfs, anyway? What's wrong with putting those same files
on gfs2meta, seeing that _this_ would have no problems with object lifetimes?
Too late by now, of course, but...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists