[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y/mEsfyhNCs8orCY@magnolia>
Date: Fri, 24 Feb 2023 19:46:57 -0800
From: "Darrick J. Wong" <djwong@...nel.org>
To: Dave Chinner <david@...morbit.com>
Cc: Jan Kara <jack@...e.cz>, Al Viro <viro@...iv.linux.org.uk>,
linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
Ted Tso <tytso@....edu>, linux-xfs@...r.kernel.org
Subject: Re: Locking issue with directory renames
On Wed, Jan 18, 2023 at 08:44:57AM +1100, Dave Chinner wrote:
> On Tue, Jan 17, 2023 at 01:37:35PM +0100, Jan Kara wrote:
> > Hello!
> >
> > I've some across an interesting issue that was spotted by syzbot [1]. The
> > report is against UDF but AFAICS the problem exists for ext4 as well and
> > possibly other filesystems. The problem is the following: When we are
> > renaming directory 'dir' say rename("foo/dir", "bar/") we lock 'foo' and
> > 'bar' but 'dir' is unlocked because the locking done by vfs_rename() is
> >
> > if (!is_dir || (flags & RENAME_EXCHANGE))
> > lock_two_nondirectories(source, target);
> > else if (target)
> > inode_lock(target);
> >
> > However some filesystems (e.g. UDF but ext4 as well, I suspect XFS may be
> > hurt by this as well because it converts among multiple dir formats) need
> > to update parent pointer in 'dir' and nothing protects this update against
> > a race with someone else modifying 'dir'. Now this is mostly harmless
> > because the parent pointer (".." directory entry) is at the beginning of
> > the directory and stable however if for example the directory is converted
> > from packed "in-inode" format to "expanded" format as a result of
> > concurrent operation on 'dir', the filesystem gets corrupted (or crashes as
> > in case of UDF).
>
> No, xfs_rename() does not have this problem - we pass four inodes to
> the function - the source directory, source inode, destination
> directory and destination inode.
Um, I think it does have this problem. xfs_readdir thinks it can parse
a shortform inode without taking the ILOCK:
if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL)
return xfs_dir2_sf_getdents(&args, ctx);
lock_mode = xfs_ilock_data_map_shared(dp);
error = xfs_dir2_isblock(&args, &isblock);
So xfs_dir2_sf_replace can rewrite the shortform structure (or even
convert it to block format!) while readdir is accessing it. Or am I
mising something?
--D
> In the above case, "dir/" is passed to XFs as the source inode - the
> src_dir is "foo/", the target dir is "bar/" and the target inode is
> null. src_dir != target_dir, so we set the "new_parent" flag. the
> srouce inode is a directory, so we set the src_is_directory flag,
> too.
>
> We lock all three inodes that are passed. We do various things, then
> run:
>
> if (new_parent && src_is_directory) {
> /*
> * Rewrite the ".." entry to point to the new
> * directory.
> */
> error = xfs_dir_replace(tp, src_ip, &xfs_name_dotdot,
> target_dp->i_ino, spaceres);
> ASSERT(error != -EEXIST);
> if (error)
> goto out_trans_cancel;
> }
>
> which replaces the ".." entry in source inode atomically whilst it
> is locked. Any directory format changes that occur during the
> rename are done while the ILOCK is held, so they appear atomic to
> outside observers that are trying to parse the directory structure
> (e.g. readdir).
>
> > So we'd need to lock 'source' if it is a directory.
>
> Yup, and XFS goes further by always locking the source inode in a
> rename, even if it is not a directory. This ensures the inode being
> moved cannot have it's metadata otherwise modified whilst the rename
> is in progress, even if that modification would have no impact on
> the rename. It's a pretty strict interpretation of "rename is an
> atomic operation", but it avoids accidentally missing nasty corner
> cases like the one described above...
>
> > Ideally this would
> > happen in VFS as otherwise I bet a lot of filesystems will get this wrong
> > so could vfs_rename() lock 'source' if it is a dir as well? Essentially
> > this would amount to calling lock_two_nondirectories(source, target)
> > unconditionally but that would become a serious misnomer ;). Al, any
> > thought?
>
> XFS just has a function that allows for an arbitrary number of
> inodes to be locked in the given order: xfs_lock_inodes(). For
> rename, the lock order is determined by xfs_sort_for_rename().
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@...morbit.com
Powered by blists - more mailing lists