Message-ID: <20111219020340.GG2203@ZenIV.linux.org.uk>
Date: Mon, 19 Dec 2011 02:03:40 +0000
From: Al Viro <viro@...IV.linux.org.uk>
To: Dave Chinner <david@...morbit.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Djalal Harouni <tixxdz@...ndz.org>,
Hugh Dickins <hughd@...gle.com>,
Minchan Kim <minchan.kim@...il.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Wu Fengguang <fengguang.wu@...el.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
"J. Bruce Fields" <bfields@...ldses.org>,
Neil Brown <neilb@...e.de>,
Mikulas Patocka <mikulas@...ax.karlin.mff.cuni.cz>,
Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH] mm: add missing mutex lock arround notify_change
On Mon, Dec 19, 2011 at 12:43:43PM +1100, Dave Chinner wrote:
> > We have a shitload of deadlocks on very common paths with that patch. What
> > of the paths that do lead to file_remove_suid() without i_mutex?
> > * xfs_file_aio_write_checks(): we drop i_mutex (via xfs_rw_iunlock())
> > just before calling file_remove_suid(). Racy, the fix is obvious - move
> > file_remove_suid() call before unlocking.
>
> Not exactly. xfs_rw_iunlock() is not doing what you think it's doing
> there.....
Huh? It is called as
> > - xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
and thus in
static inline void
xfs_rw_iunlock(
        struct xfs_inode        *ip,
        int                     type)
{
        xfs_iunlock(ip, type);
        if (type & XFS_IOLOCK_EXCL)
                mutex_unlock(&VFS_I(ip)->i_mutex);
}
we are guaranteed to hit i_mutex.
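To make the ordering concrete - a stripped-down sketch of the tail of that
function, not the literal source, just the two calls that matter here:

        /* tail of xfs_file_aio_write_checks(), heavily simplified */
        xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);     /* the unlock quoted above */

        /*
         * SUID/SGID clearing happens only after that unlock; the fix I had
         * in mind was simply to do it before dropping the lock instead.
         */
        return file_remove_suid(file);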
> Wrong lock. That's dropping the internal XFS inode metadata lock,
> but the VFS i_mutex is associated with the internal XFS inode IO
> lock, which is accessed via XFS_IOLOCK_*. Only if we take the iolock
> via XFS_IOLOCK_EXCL do we actually take the i_mutex.
> Now it gets complex. For buffered IO, we are guaranteed to already
> be holding the i_mutex because we do:
>
>         *iolock = XFS_IOLOCK_EXCL;
>         xfs_rw_ilock(ip, *iolock);
>
>         ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
>
> So that is safe and non-racy right now.
No, it is not - we *drop* it before calling file_remove_suid(). Explicitly.
Again, look at that xfs_rw_iunlock() call there - it does drop i_mutex
(which is to say, you'd better have taken it prior to that, or you have
far worse problems).
> For direct IO, however, we don't always take the IOLOCK exclusively.
> Indeed, we try really, really hard not to do this so we can do
> concurrent reads and writes to the inode, and that results
> in a bunch of lock juggling when we actually need the IOLOCK
> exclusive (like in xfs_file_aio_write_checks()). It sounds like we
> need to know if we are going to have to remove the SUID bit ahead of
> time so that we can take the correct lock up front. I haven't
> looked at what is needed to do that yet.
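FWIW, if it does come to checking ahead of time, should_remove_suid() is
already there for that; roughly (a sketch of the idea only, not a patch -
it ignores the other reasons direct I/O may already want the exclusive
iolock, and the result would presumably still need rechecking once the
lock is actually held):

        /*
         * Sketch: decide the iolock mode up front based on whether this
         * write will need to clear SUID/SGID, so the exclusive lock (and
         * with it i_mutex) is taken before the checks that depend on it.
         */
        if (should_remove_suid(file->f_path.dentry))
                *iolock = XFS_IOLOCK_EXCL;
        else
                *iolock = XFS_IOLOCK_SHARED;
        xfs_rw_ilock(ip, *iolock);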
OK, I'm definitely missing something.  The very first thing
xfs_file_aio_write_checks() does is
        xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
which really makes me wonder how the hell that manages to avoid an
instant deadlock when called via xfs_file_buffered_aio_write(),
where we have:
        struct address_space    *mapping = file->f_mapping;
        struct inode            *inode = mapping->host;
        struct xfs_inode        *ip = XFS_I(inode);

        *iolock = XFS_IOLOCK_EXCL;
        xfs_rw_ilock(ip, *iolock);

        ret = xfs_file_aio_write_checks(file, &pos, &count, new_size, iolock);
which leads to
        struct inode            *inode = file->f_mapping->host;
        struct xfs_inode        *ip = XFS_I(inode);
(IOW, inode and ip are the same as in the caller) followed by
        xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
and with both xfs_rw_ilock() calls turning into
        mutex_lock(&VFS_I(ip)->i_mutex);
        xfs_ilock(ip, XFS_ILOCK_EXCL);
we ought to deadlock on that i_mutex. What am I missing and how do we manage
to survive that?
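IOW, the call sequence as I'm reading it (indented by call depth):

        xfs_file_buffered_aio_write()
            xfs_rw_ilock(ip, XFS_IOLOCK_EXCL)
                mutex_lock(&VFS_I(ip)->i_mutex)
                xfs_ilock(ip, XFS_IOLOCK_EXCL)
            xfs_file_aio_write_checks(...)
                xfs_rw_ilock(ip, XFS_ILOCK_EXCL)
                    mutex_lock(&VFS_I(ip)->i_mutex)    <- same inode, same mutex
                    xfs_ilock(ip, XFS_ILOCK_EXCL)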