[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200302013602.GG10776@dread.disaster.area>
Date: Mon, 2 Mar 2020 12:36:02 +1100
From: Dave Chinner <david@...morbit.com>
To: ira.weiny@...el.com
Cc: linux-kernel@...r.kernel.org,
Alexander Viro <viro@...iv.linux.org.uk>,
"Darrick J. Wong" <darrick.wong@...cle.com>,
Dan Williams <dan.j.williams@...el.com>,
Christoph Hellwig <hch@....de>,
"Theodore Y. Ts'o" <tytso@....edu>, Jan Kara <jack@...e.cz>,
linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH V5 06/12] fs: Add locking for a dynamic address space
operations state
On Mon, Mar 02, 2020 at 12:26:44PM +1100, Dave Chinner wrote:
> > /*
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 1784478270e1..3a7863ba51b9 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -2293,6 +2293,8 @@ generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> > * and return. Otherwise fallthrough to buffered io for
> > * the rest of the read. Buffered reads will not work for
> > * DAX files, so don't bother trying.
> > + *
> > + * IS_DAX is protected under ->read_iter lock
> > */
> > if (retval < 0 || !count || iocb->ki_pos >= size ||
> > IS_DAX(inode))
>
> This check is in the DIO path, be we can't do DIO on DAX enabled
> files to begin with, so we can only get here if S_DAX is not set on
> the file.
>
> Further, if IOCB_DIRECT is set, neither ext4 nor XFS call
> generic_file_read_iter(); they run the iomap_dio_rw() path directly
> instead. Only ext2 calls generic_file_read_iter() to do direct IO,
> so it's the only filesystem that needs this IS_DAX() check in it.
>
> I think we should fix ext2 to be implemented like ext4 and XFS -
> they implement the buffered IO fallback, should it be required,
> themselves and never use generic_file_read_iter() for direct IO.
>
> That would allow us to add this to generic_file_read_iter():
>
> if (WARN_ON_ONCE(IS_DAX(inode))
> return -EINVAL;
>
> to indicate that this should never be called directly on a DAX
> capable filesystem. This places all the responsibility for managing
> DAX behaviour on the filesystem, which then allows us to reason more
> solidly about how the filesystem IO paths use and check the S_DAX
> flag.
>
> i.e. changing the on-disk flag already locks out the filesystem IO
> path via the i_rwsem(), and all the filesystem IO paths (buffered,
> direct IO and dax) are serialised by this flag. Hence we can check
> once in the filesystem path once we have the i_rwsem held and
> know that S_DAX will not change until we release it.
>
> ..... and now I realise what I was sitting on the fence about....
>
> I don't like the aops locking in call_read/write_iter() because it
> is actually redundant: the filesystem should be doing the necessary
> locking in the IO path via the i_rwsem to prevent S_DAX from
> changing while it is doing the IO.
>
> IOWs, we need to restructure the locking inside the filesystem
> read_iter and write_iter methods so that the i_rwsem protects the
> S_DAX flag from changing dynamically. They all do:
>
> if (dax)
> do_dax_io()
> if (direct)
> do_direct_io()
> do_buffered_io()
>
> And then we take the i_rwsem inside each of those functions and do
> the IO. What we actually need to do is something like this:
>
> inode_lock_shared()
> if (dax)
> do_dax_io()
> if (direct)
> do_direct_io()
> do_buffered_io()
> inode_unlock_shared()
>
> And remove the inode locking from inside the individual IO methods
> themselves. It's a bit more complex than this because buffered
> writes require exclusive locking, but this completely removes the
> need for holding an aops lock over these methods.
>
> I've attached a couple of untested patches (I've compiled them, so
> they must be good!) to demonstrate what I mean for the XFS IO path.
> The read side removes a heap of duplicate code, but the write side
> is .... unfortunately complex. Have to think about that more.
And now with patches...
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com
View attachment "xfs-io-s_dax-locking" of type "text/plain" (3658 bytes)
View attachment "xfs-io-s_dax-write-locking" of type "text/plain" (10444 bytes)
Powered by blists - more mailing lists