Message-ID: <20111219020340.GG2203@ZenIV.linux.org.uk>
Date: Mon, 19 Dec 2011 02:03:40 +0000
From: Al Viro <viro@...IV.linux.org.uk>
To: Dave Chinner <david@...morbit.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Djalal Harouni <tixxdz@...ndz.org>,
Hugh Dickins <hughd@...gle.com>,
Minchan Kim <minchan.kim@...il.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Wu Fengguang <fengguang.wu@...el.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
"J. Bruce Fields" <bfields@...ldses.org>,
Neil Brown <neilb@...e.de>,
Mikulas Patocka <mikulas@...ax.karlin.mff.cuni.cz>,
Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH] mm: add missing mutex lock arround notify_change
On Mon, Dec 19, 2011 at 12:43:43PM +1100, Dave Chinner wrote:
> > We have a shitload of deadlocks on very common paths with that patch. What
> > of the paths that do lead to file_remove_suid() without i_mutex?
> > * xfs_file_aio_write_checks(): we drop i_mutex (via xfs_rw_iunlock())
> > just before calling file_remove_suid(). Racy, the fix is obvious - move
> > file_remove_suid() call before unlocking.
>
> Not exactly. xfs_rw_iunlock() is not doing what you think it's doing
> there.....
Huh? It is called as
> > - xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
and thus in
static inline void
xfs_rw_iunlock(
        struct xfs_inode        *ip,
        int                     type)
{
        xfs_iunlock(ip, type);
        if (type & XFS_IOLOCK_EXCL)
                mutex_unlock(&VFS_I(ip)->i_mutex);
}
we are guaranteed to hit i_mutex.
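To make the ordering concrete - a stripped-down sketch of the tail of that
function, not the literal source, just the two calls that matter here:

        /* tail of xfs_file_aio_write_checks(), heavily simplified */
        xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);     /* the unlock quoted above */

        /*
         * SUID/SGID clearing happens only after that unlock; the fix I had
         * in mind was simply to do it before dropping the lock instead.
         */
        return file_remove_suid(file);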
> Wrong lock. That's dropping the internal XFS inode metadata lock,
> but the VFS i_mutex is associated with the internal XFS inode IO
> lock, which is accessed via XFS_IOLOCK_*. Only if we take the iolock
> via XFS_IOLOCK_EXCL do we actually take the i_mutex.
> Now it gets complex. For buffered IO, we are guaranteed to already
> be holding the i_mutex because we do:
>
>         *iolock = XFS_IOLOCK_EXCL;
>         xfs_rw_ilock(ip, *iolock);
>
>         ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
>
> So that is safe and non-racy right now.
No, it is not - we *drop* it before calling file_remove_suid(). Explicitly.
Again, look at that xfs_rw_iunlock() call there - it does drop i_mutex
(which is to say, you'd better have taken it prior to that, or you have
far worse problems).
> For direct IO, however, we don't always take the IOLOCK exclusively.
> Indeed, we try really, really hard not to do this so we can do
> concurrent reads and writes to the inode, and that results
> in a bunch of lock juggling when we actually need the IOLOCK
> exclusive (like in xfs_file_aio_write_checks()). It sounds like we
> need to know if we are going to have to remove the SUID bit ahead of
> time so that we can take the correct lock up front. I haven't
> looked at what is needed to do that yet.
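FWIW, if it does come to checking ahead of time, should_remove_suid() is
already there for that; roughly (a sketch of the idea only, not a patch -
it ignores the other reasons direct I/O may already want the exclusive
iolock, and the result would presumably still need rechecking once the
lock is actually held):

        /*
         * Sketch: decide the iolock mode up front based on whether this
         * write will need to clear SUID/SGID, so the exclusive lock (and
         * with it i_mutex) is taken before the checks that depend on it.
         */
        if (should_remove_suid(file->f_path.dentry))
                *iolock = XFS_IOLOCK_EXCL;
        else
                *iolock = XFS_IOLOCK_SHARED;
        xfs_rw_ilock(ip, *iolock);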
OK, I'm definitely missing something.  The very first thing
xfs_file_aio_write_checks() does is
        xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
which really makes me wonder how the hell that manages to avoid an
instant deadlock when called via xfs_file_buffered_aio_write(),
where we have:
        struct address_space    *mapping = file->f_mapping;
        struct inode            *inode = mapping->host;
        struct xfs_inode        *ip = XFS_I(inode);

        *iolock = XFS_IOLOCK_EXCL;
        xfs_rw_ilock(ip, *iolock);

        ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
which leads to
        struct inode            *inode = file->f_mapping->host;
        struct xfs_inode        *ip = XFS_I(inode);
(IOW, inode and ip are the same as in the caller) followed by
        xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
and with both xfs_rw_ilock() calls turning into
        mutex_lock(&VFS_I(ip)->i_mutex);
        xfs_ilock(ip, XFS_ILOCK_EXCL);
we ought to deadlock on that i_mutex. What am I missing and how do we manage
to survive that?
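IOW, the call sequence as I'm reading it (indented by call depth):

        xfs_file_buffered_aio_write()
            xfs_rw_ilock(ip, XFS_IOLOCK_EXCL)
                mutex_lock(&VFS_I(ip)->i_mutex)
                xfs_ilock(ip, XFS_IOLOCK_EXCL)
            xfs_file_aio_write_checks(...)
                xfs_rw_ilock(ip, XFS_ILOCK_EXCL)
                    mutex_lock(&VFS_I(ip)->i_mutex)    <- same inode, same mutex
                    xfs_ilock(ip, XFS_ILOCK_EXCL)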