linux-ext4 - Re: [PATCH v3 2/6] ext4: for committing inode, make ext4_fc_track

Message-ID: <20220519161133.jrsdapiyizzywkeo@quack3.lan>
Date:   Thu, 19 May 2022 18:11:33 +0200
From:   Jan Kara <jack@...e.cz>
To:     harshad shirwadkar <harshadshirwadkar@...il.com>
Cc:     Jan Kara <jack@...e.cz>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Ritesh Harjani <riteshh@...ux.ibm.com>,
        "Theodore Y. Ts'o" <tytso@....edu>
Subject: Re: [PATCH v3 2/6] ext4: for committing inode, make
 ext4_fc_track_inode wait

On Thu 19-05-22 07:28:11, harshad shirwadkar wrote:
> On Wed, 27 Apr 2022 at 08:50, Jan Kara <jack@...e.cz> wrote:
> >
> > On Tue 19-04-22 10:31:39, Harshad Shirwadkar wrote:
> > > From: Harshad Shirwadkar <harshadshirwadkar@...il.com>
> > >
> > > If the inode that's being requested to track using ext4_fc_track_inode
> > > is being committed, then wait until the inode finishes the
> > > commit. Also, add calls to ext4_fc_track_inode at the right places.
> > >
> > > With this patch, now calling ext4_reserve_inode_write() results in
> > > inode being tracked for next fast commit. A subtle lock ordering
> > > requirement with i_data_sem (which is documented in the code) requires
> > > that ext4_fc_track_inode() be called before grabbing i_data_sem. So,
> > > this patch also adds explicit ext4_fc_track_inode() calls in places
> > > where i_data_sem grabbed.
> > >
> > > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@...il.com>
> > > ---
> > >  fs/ext4/fast_commit.c | 38 ++++++++++++++++++++++++++++++++++++++
> > >  fs/ext4/inline.c      |  3 +++
> > >  fs/ext4/inode.c       |  5 ++++-
> > >  3 files changed, 45 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> > > index c278060a15bc..55f4c5ddd8e5 100644
> > > --- a/fs/ext4/fast_commit.c
> > > +++ b/fs/ext4/fast_commit.c
> > > +     /*
> > > +      * If we come here, we may sleep while waiting for the inode to
> > > +      * commit. We shouldn't be holding i_data_sem in write mode when we go
> > > +      * to sleep since the commit path needs to grab the lock while
> > > +      * committing the inode.
> > > +      */
> > > +     WARN_ON(lockdep_is_held_type(&ei->i_data_sem, 1));
> >
> > Note that we can deadlock even if we had i_data_sem for reading because
> > another reader is not allowed to get the rwsem if there is writer waiting
> > for it. So we need to check even that case here.
> I turned the above WARN_ON to check if data_sem is held in either read
> or write mode and now I am seeing many other places where data_sem
> gets grabbed in read mode before calling ext4_fc_track_inode().

Hum, that's unpleasant. Which places BTW? I'd expect this mostly happens in
ext4_map_blocks() paths. Anywhere else?
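
The read-mode deadlock mentioned in the quoted review comment can be replayed
with a small userspace model. This is only a sketch: toy_rwsem and the toy_*
helpers are hypothetical names, not the kernel rwsem implementation, and it
assumes the commit path takes i_data_sem for reading. It encodes just one
rule, namely that a new reader is refused while a writer is queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a fair rwsem; NOT the kernel implementation. */
struct toy_rwsem {
	int  readers;        /* tasks currently holding the lock for reading */
	bool writer_holds;   /* a task holds the lock for writing */
	int  writers_queued; /* writers sleeping in the wait queue */
};

/* A new reader is admitted only if no writer holds or waits for the lock. */
static bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer_holds || sem->writers_queued > 0)
		return false;
	sem->readers++;
	return true;
}

/* A writer needs the lock completely free; otherwise it joins the queue. */
static bool toy_down_write_or_queue(struct toy_rwsem *sem)
{
	if (sem->writer_holds || sem->readers > 0) {
		sem->writers_queued++;
		return false;
	}
	sem->writer_holds = true;
	return true;
}

/*
 * Replay the scenario:
 *  1. task A takes i_data_sem for reading, then sleeps in
 *     ext4_fc_track_inode() until the commit finishes;
 *  2. optionally, task B asks for i_data_sem for writing and queues behind A;
 *  3. the commit path (assumed to need it for reading) tries to get it.
 * Returns true if the commit path is blocked. In that case A waits for the
 * commit, the commit waits behind B, and B waits for A.
 */
static bool toy_commit_blocked(bool writer_arrives)
{
	struct toy_rwsem i_data_sem = { 0 };
	bool ok;

	ok = toy_down_read_trylock(&i_data_sem);         /* step 1 */
	assert(ok);
	if (writer_arrives)
		toy_down_write_or_queue(&i_data_sem);    /* step 2 */
	return !toy_down_read_trylock(&i_data_sem);      /* step 3 */
}
```

In this model toy_commit_blocked(true) reports the cycle, while
toy_commit_blocked(false) does not, which is why a read-held i_data_sem only
bites once a writer shows up in between.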

> We either need to call ext4_fc_track_inode() before all
> ext4_reserve_inode_write() in all those cases, or ensure that places that
> acquire in data_sem in write mode, wait if there's an ongoing commit and
> only then lock data_sem.

Neither is particularly appealing, I guess. As we discussed on the call,
using the extent status tree in the fastcommit code is probably a cleaner
option.

> > > +     while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) {
> > > +#if (BITS_PER_LONG < 64)
> > > +             DEFINE_WAIT_BIT(wait, &ei->i_state_flags,
> > > +                             EXT4_STATE_FC_COMMITTING);
> > > +             wq = bit_waitqueue(&ei->i_state_flags,
> > > +                                EXT4_STATE_FC_COMMITTING);
> > > +#else
> > > +             DEFINE_WAIT_BIT(wait, &ei->i_flags,
> > > +                             EXT4_STATE_FC_COMMITTING);
> > > +             wq = bit_waitqueue(&ei->i_flags,
> > > +                                EXT4_STATE_FC_COMMITTING);
> > > +#endif
> > > +             prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE);
> > > +             if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING))
> > > +                     schedule();
> > > +             finish_wait(wq, &wait.wq_entry);
> > > +     }
> > > +
> > >       ret = ext4_fc_track_template(handle, inode, __track_inode, NULL, 1);
> > >       trace_ext4_fc_track_inode(handle, inode, ret);
> >
> > As we discussed in the call we should tell lockdep that this is equivalent
> > to lock+unlock of let's say fc_committing_lock and the fastcommit code
> > setting / clearing EXT4_STATE_FC_COMMITTING is equivalent to lock / unlock
> > of fc_committing_lock. That way we get proper lockdep tracking of this
> > waiting primitive.
> Sure, so you mean just adding __acquires() / __releases() annotations
> in these places right?

No. __acquires() and __releases() are sparse annotations. Sparse also does
some lock checking, but it is a static checker and a pretty trivial one. Here
you need to instrument lockdep. We do a similar thing in jbd2 to tell lockdep
that starting a transaction handle effectively behaves as a lock - see the
rwsem_acquire_read() and rwsem_release() in start_this_handle() and
stop_this_handle(), respectively.
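
A rough sketch of what such lockdep instrumentation could look like for the
fast commit case follows. It is a userspace model, not a patch: the
struct lockdep_map, _THIS_IP_, rwsem_acquire() and rwsem_release() below are
stubs that mirror the argument order of the kernel macros in
<linux/lockdep.h> and only count nesting. The names fc_committing_map,
fc_committing, fc_commit_begin(), fc_commit_end() and fc_track_might_wait()
are hypothetical, and using the exclusive variant on both sides is an
assumption.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace stand-ins so the sketch compiles outside the kernel.
 * The real lockdep records dependencies per task; these stubs only
 * keep a nesting counter.
 */
struct lockdep_map { int depth; };
#define _THIS_IP_ 0UL

static void rwsem_acquire(struct lockdep_map *map, int subclass,
			  int trylock, unsigned long ip)
{
	(void)subclass; (void)trylock; (void)ip;
	map->depth++;
}

static void rwsem_release(struct lockdep_map *map, unsigned long ip)
{
	(void)ip;
	assert(map->depth > 0);	/* a release must pair with an acquire */
	map->depth--;
}

/* Hypothetical map playing the role of "fc_committing_lock". */
static struct lockdep_map fc_committing_map;
static bool fc_committing;	/* models EXT4_STATE_FC_COMMITTING */

/* Commit side: setting the bit is annotated as taking the lock. */
static void fc_commit_begin(void)
{
	rwsem_acquire(&fc_committing_map, 0, 0, _THIS_IP_);
	fc_committing = true;
}

/* Commit side: clearing the bit is annotated as dropping the lock. */
static void fc_commit_end(void)
{
	fc_committing = false;
	rwsem_release(&fc_committing_map, _THIS_IP_);
}

/*
 * Tracking side: a task that may sleep until the bit clears is annotated
 * as lock + unlock, so lockdep can relate it to every lock the caller
 * already holds (for example i_data_sem).
 */
static void fc_track_might_wait(void)
{
	rwsem_acquire(&fc_committing_map, 0, 0, _THIS_IP_);
	rwsem_release(&fc_committing_map, _THIS_IP_);
	/* the real code would sleep here until the bit clears */
}
```

With annotations of this shape, lockdep itself could report a caller that
enters the waiting path while holding a lock the commit side also needs,
which would cover what the WARN_ON in the patch checks by hand.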

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR