[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220519161133.jrsdapiyizzywkeo@quack3.lan>
Date: Thu, 19 May 2022 18:11:33 +0200
From: Jan Kara <jack@...e.cz>
To: harshad shirwadkar <harshadshirwadkar@...il.com>
Cc: Jan Kara <jack@...e.cz>,
Ext4 Developers List <linux-ext4@...r.kernel.org>,
Ritesh Harjani <riteshh@...ux.ibm.com>,
"Theodore Y. Ts'o" <tytso@....edu>
Subject: Re: [PATCH v3 2/6] ext4: for committing inode, make
ext4_fc_track_inode wait
On Thu 19-05-22 07:28:11, harshad shirwadkar wrote:
> On Wed, 27 Apr 2022 at 08:50, Jan Kara <jack@...e.cz> wrote:
> >
> > On Tue 19-04-22 10:31:39, Harshad Shirwadkar wrote:
> > > From: Harshad Shirwadkar <harshadshirwadkar@...il.com>
> > >
> > > If the inode that's being requested to track using ext4_fc_track_inode
> > > is being committed, then wait until the inode finishes the
> > > commit. Also, add calls to ext4_fc_track_inode at the right places.
> > >
> > > With this patch, now calling ext4_reserve_inode_write() results in
> > > inode being tracked for next fast commit. A subtle lock ordering
> > > requirement with i_data_sem (which is documented in the code) requires
> > > that ext4_fc_track_inode() be called before grabbing i_data_sem. So,
> > > this patch also adds explicit ext4_fc_track_inode() calls in places
> > > where i_data_sem grabbed.
> > >
> > > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@...il.com>
> > > ---
> > > fs/ext4/fast_commit.c | 38 ++++++++++++++++++++++++++++++++++++++
> > > fs/ext4/inline.c | 3 +++
> > > fs/ext4/inode.c | 5 ++++-
> > > 3 files changed, 45 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> > > index c278060a15bc..55f4c5ddd8e5 100644
> > > --- a/fs/ext4/fast_commit.c
> > > +++ b/fs/ext4/fast_commit.c
> > > + /*
> > > + * If we come here, we may sleep while waiting for the inode to
> > > + * commit. We shouldn't be holding i_data_sem in write mode when we go
> > > + * to sleep since the commit path needs to grab the lock while
> > > + * committing the inode.
> > > + */
> > > + WARN_ON(lockdep_is_held_type(&ei->i_data_sem, 1));
> >
> > Note that we can deadlock even if we had i_data_sem for reading because
> > another reader is not allowed to get the rwsem if there is writer waiting
> > for it. So we need to check even that case here.
> I turned the above WARN_ON to check if data_sem is held in either read
> or write mode and now I am seeing many other places where data_sem
> gets grabbed in read mode before calling ext4_fc_track_inode().
Hum, that's unpleasant. Which places BTW? I'd expect this mostly happens in
ext4_map_blocks() paths. Anywhere else?
> We either need to call ext4_fc_track_inode() before all
> ext4_reserve_inode_write() in all those cases, or ensure that places that
> acquire in data_sem in write mode, wait if there's an ongoing commit and
> only then lock data_sem.
Neither is particularly appealing I guess. As we discussed on the call,
probably using extent status tree for the fastcommit code might be a
cleaner option.
> > > + while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
> > > +#if (BITS_PER_LONG < 64)
> > > + DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
> > > + EXT4_STATE_FC_COMMITTING);
> > > + wq = bit_waitqueue(&ei->i_state_flags,
> > > + EXT4_STATE_FC_COMMITTING);
> > > +#else
> > > + DEFINE_WAIT_BIT(wait, &ei->i_flags,
> > > + EXT4_STATE_FC_COMMITTING);
> > > + wq = bit_waitqueue(&ei->i_flags,
> > > + EXT4_STATE_FC_COMMITTING);
> > > +#endif
> > > + prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> > > + if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING))
> > > + schedule();
> > > + finish_wait(wq, &wait.wq_entry);
> > > + }
> > > +
> > > ret = ext4_fc_track_template(handle, inode, __track_inode, NULL, 1);
> > > trace_ext4_fc_track_inode(handle, inode, ret);
> >
> > As we discussed in the call we should tell lockdep that this is equivalent
> > to lock+unlock of let's say fc_committing_lock and the fastcommit code
> > setting / clearing EXT4_STATE_FC_COMMITTING is equivalent to lock / unlock
> > of fc_committing_lock. That way we get proper lockdep tracking of this
> > waiting primitive.
> Sure, so you mean just adding __acquires() / __releases() annotations
> in these places right?
No. __acquires() and __releases() are sparse annotations. Sparse does also
some lock checking but it is a static checker and is pretty trivial. Here you
need to instrument lockdep. We do similar thing in jbd2 to tell lockdep
that starting a transaction handle effectively behaves as a lock - see the
rwsem_acquire_read() and rwsem_release() in start_this_handle() and
stop_this_handle(), respectively.
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists