[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190910112014.GB10579@bobrowski>
Date: Tue, 10 Sep 2019 21:20:14 +1000
From: Matthew Bobrowski <mbobrowski@...browski.org>
To: Ritesh Harjani <riteshh@...ux.ibm.com>
Cc: tytso@....edu, jack@...e.cz, adilger.kernel@...ger.ca,
hch@...radead.org, darrick.wong@...cle.com,
linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
david@...morbit.com
Subject: Re: [PATCH v2 5/6] ext4: introduce direct IO write path using iomap
infrastructure
On Mon, Sep 09, 2019 at 08:02:13PM +0530, Ritesh Harjani wrote:
> On 9/9/19 2:56 PM, Ritesh Harjani wrote:
> > > + ret = iomap_dio_rw(iocb, from, &ext4_iomap_ops,
> > > ext4_dio_write_end_io);
> > > +
> > > + /*
> > > + * Unaligned direct AIO must be the only IO in flight or else
> > > + * any overlapping aligned IO after unaligned IO might result
> > > + * in data corruption. We also need to wait here in the case
> > > + * where the inode is being extended so that inode extension
> > > + * routines in ext4_dio_write_end_io() are covered by the
> > > + * inode_lock().
> > > + */
> > > + if (ret == -EIOCBQUEUED && (unaligned_aio || extend))
> > > + inode_dio_wait(inode);
>
> So, I looked this more closely into the AIO DIO write extend case
> of yours here. AFAICT, this looks good in the sense that it follows
> the behavior what we used to have before from __blockdev_direct_IO.
> In that it used to wait for AIO DIO writes beyond EOF, but the iomap
> framework does not have that. So waiting in case of writes beyond EOF
> should be the right thing to do here for ext4 (following the legacy code).
>
> But I would like to confirm the exact race this extend case
> is protecting here.
> Since writes beyond EOF will require update of inode i_size
> (ext4_update_inode_size()) which require us to hold the inode_lock
> in exclusive mode, so we must need to wait in extend case here,
> even for AIO DIO writes.
>
> Q1. Is above understanding completely correct?
Yes, that's essentially correct.
> Q2. Or is there anything else also which it is also protecting which I am
> missing?
No, I think that's it...
> Do we need to hold inode exclusive lock for ext4_orphan_del() as well?
Yes, I believe so.
> Q3. How about XFS then?
> (I do see some tricks done with IOLOCK handling in case of ki__pos > i_size
> & to zero out the buffer space between old i_size & ki_pos).
>
> But if we talk only about the above case of extending AIO DIO writes beyond
> EOF, XFS only takes a shared lock. why?
>
> Looking into XFS code, I see that they have IOLOCK & ILOCK.
> So I guess for protecting inode->i_size update they use ILOCK (in
> xfs_dio_write_end_io() -> xfs_iomap_write_unwritten()
> or ip->i_flags_lock lock in NON-UNWRITTEN case). And for IO part the IOLOCK
> is used and hence IOLOCK can be used in shared mode. Is this correct
> understanding for XFS?
* David/Christoph/Darrick
I haven't looked at the intricate XFS locking semantics, so I can't really
comment until I've looked at the code to be perfectly honest. Perhaps asking
one of the XFS maintainers could get you the answer you're looking for on
this.
--<M>--
Powered by blists - more mailing lists