Message-ID: <20200225211228.GB15810@iweiny-DESK2.sc.intel.com>
Date: Tue, 25 Feb 2020 13:12:28 -0800
From: Ira Weiny <ira.weiny@...el.com>
To: Dave Chinner <david@...morbit.com>
Cc: linux-kernel@...r.kernel.org,
Alexander Viro <viro@...iv.linux.org.uk>,
"Darrick J. Wong" <darrick.wong@...cle.com>,
Dan Williams <dan.j.williams@...el.com>,
Christoph Hellwig <hch@....de>,
"Theodore Y. Ts'o" <tytso@....edu>, Jan Kara <jack@...e.cz>,
linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH V4 09/13] fs/xfs: Add write aops lock to xfs layer

On Tue, Feb 25, 2020 at 09:32:45AM +1100, Dave Chinner wrote:
> On Mon, Feb 24, 2020 at 11:57:36AM -0800, Ira Weiny wrote:
> > On Mon, Feb 24, 2020 at 11:34:55AM +1100, Dave Chinner wrote:
> > > On Thu, Feb 20, 2020 at 04:41:30PM -0800, ira.weiny@...el.com wrote:
> > > > From: Ira Weiny <ira.weiny@...el.com>
> > > >
> >
> > [snip]
> >
> > > >
> > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > > index 35df324875db..5b014c428f0f 100644
> > > > --- a/fs/xfs/xfs_inode.c
> > > > +++ b/fs/xfs/xfs_inode.c
> > > > @@ -142,12 +142,12 @@ xfs_ilock_attr_map_shared(
> > > > *
> > > > * Basic locking order:
> > > > *
> > > > - * i_rwsem -> i_mmap_lock -> page_lock -> i_ilock
> > > > + * s_dax_sem -> i_rwsem -> i_mmap_lock -> page_lock -> i_ilock
> > > > *
> > > > * mmap_sem locking order:
> > > > *
> > > > * i_rwsem -> page lock -> mmap_sem
> > > > - * mmap_sem -> i_mmap_lock -> page_lock
> > > > + * s_dax_sem -> mmap_sem -> i_mmap_lock -> page_lock
> > > > *
> > > > * The difference in mmap_sem locking order mean that we cannot hold the
> > > > * i_mmap_lock over syscall based read(2)/write(2) based IO. These IO paths can
> > > > @@ -182,6 +182,9 @@ xfs_ilock(
> > > > (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
> > > > ASSERT((lock_flags & ~(XFS_LOCK_MASK | XFS_LOCK_SUBCLASS_MASK)) == 0);
> > > >
> > > > + if (lock_flags & XFS_DAX_EXCL)
> > > > + inode_aops_down_write(VFS_I(ip));
> > >
> > > I largely don't see the point of adding this to xfs_ilock/iunlock.
> > >
> > > It's only got one caller, so I don't see much point in adding it to
> > > an interface that has over a hundred other call sites that don't
> > > need or use this lock. just open code it where it is needed in the
> > > ioctl code.
> >
> > I know it seems overkill but if we don't do this we need to code a flag to be
> > returned from xfs_ioctl_setattr_dax_invalidate(). This flag is then used in
> > xfs_ioctl_setattr_get_trans() to create the transaction log item which can then
> > be properly used to unlock the lock in xfs_inode_item_release()
> >
> > I don't know of a cleaner way to communicate to xfs_inode_item_release() to
> > unlock i_aops_sem after the transaction is complete.
>
> We manually unlock inodes after transactions in many cases -
> anywhere we do a rolling transaction, the inode locks do not get
> released by the transaction. Hence for a one-off case like this it
> doesn't really make sense to push all this infrastructure into the
> transaction subsystem. Especially as we can manually lock before and
> unlock after the transaction context without any real complexity.

So does xfs_trans_commit() operate synchronously?

I want to understand this better because I have fought with a lot of ABBA
issues with these locks. So, can I hold the lock until after
xfs_trans_commit() returns and safely unlock it there, because
XFS_MMAPLOCK_EXCL, XFS_IOLOCK_EXCL, and XFS_ILOCK_EXCL will have been released
by that point? That would preserve the following lock order:

...
* Basic locking order:
*
* i_aops_sem -> i_rwsem -> i_mmap_lock -> page_lock -> i_ilock
*
...
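
A minimal single-threaded C sketch of the ordering question above, not XFS
code: the level names (AOPS_SEM, IOLOCK, MMAPLOCK, ILOCK) and the helpers
take(), drop() and model_trans_commit() are hypothetical stand-ins, and the
checker only verifies acquisition order; it provides no mutual exclusion. It
assumes, as asked above, that the commit releases the three inode locks joined
to the transaction, so the aops lock taken first can be dropped last without
inverting the order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical levels mirroring the documented order
 * i_aops_sem -> i_rwsem -> i_mmap_lock -> i_ilock (page_lock omitted).
 * A lower level must always be acquired before a higher one. */
enum lock_level { AOPS_SEM, IOLOCK, MMAPLOCK, ILOCK, NR_LEVELS };

/* Which levels the single modeled task currently holds. */
static bool held[NR_LEVELS];

/* Returns false, without taking the lock, if acquiring 'l' would violate
 * the order, i.e. if 'l' or any later level is already held. */
static bool take(enum lock_level l)
{
	for (int i = l; i < NR_LEVELS; i++)
		if (held[i])
			return false;
	held[l] = true;
	return true;
}

static void drop(enum lock_level l)
{
	assert(held[l]);
	held[l] = false;
}

static void drop_all(void)
{
	for (int i = 0; i < NR_LEVELS; i++)
		held[i] = false;
}

/* Stand-in for a commit that releases the inode locks joined to the
 * transaction but knows nothing about the aops lock. */
static void model_trans_commit(void)
{
	drop(ILOCK);
	drop(MMAPLOCK);
	drop(IOLOCK);
}

/* Manual lock before the transaction, manual unlock after commit.
 * Returns true if every acquisition respected the order and only the
 * aops lock was still held once the commit had completed. */
bool model_set_dax(void)
{
	bool ok = take(AOPS_SEM) && take(IOLOCK) && take(MMAPLOCK) &&
		  take(ILOCK);

	if (!ok) {
		drop_all();
		return false;
	}
	/* ... the on-disk flag change would be logged here ... */
	model_trans_commit();
	/* Only the aops lock remains; aops could be switched here. */
	ok = held[AOPS_SEM] && !held[IOLOCK] && !held[MMAPLOCK] &&
	     !held[ILOCK];
	drop(AOPS_SEM);
	return ok;
}

/* The inverted, ABBA-prone order: aops lock requested while ILOCK is held.
 * Returns true if the checker refuses it. */
bool model_inverted_order_rejected(void)
{
	bool rejected;

	if (!take(ILOCK))
		return false;
	rejected = !take(AOPS_SEM);
	if (!rejected)
		drop(AOPS_SEM);
	drop(ILOCK);
	return rejected;
}
```

Under that assumption model_set_dax() returns true, while
model_inverted_order_rejected() shows that requesting the aops lock with ILOCK
already held is refused, which is the ABBA pattern the ordering is meant to
prevent.
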
Thanks for the review!
Ira
>
> This also means that we can, if necessary, do aops manipulation work
> /after/ the transaction that changes on-disk state completes and we
> still hold the aops reference exclusively. While we don't do that
> now, I think it is worthwhile keeping our options open here....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@...morbit.com