linux-ext4 - Re: [PATCH 1/2] ext4: only defer sb update on error if SB

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z8BKdo5IAHJRdMkp@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
Date: Thu, 27 Feb 2025 16:50:22 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Jan Kara <jack@...e.cz>
Cc: linux-ext4@...r.kernel.org, "Theodore Ts'o" <tytso@....edu>,
        linux-kernel@...r.kernel.org, Mahesh Kumar <maheshkumar657g@...il.com>,
        Ritesh Harjani <ritesh.list@...il.com>
Subject: Re: [PATCH 1/2] ext4: only defer sb update on error if SB_ACTIVE

On Mon, Feb 24, 2025 at 03:52:00PM +0100, Jan Kara wrote:
> On Sat 22-02-25 14:10:22, Ojaswin Mujoo wrote:
> > Presently we always BUG_ON if trying to start a transaction on a journal
> > marked with JBD2_UNMOUNT, since this should never happen. However while
> > running stress tests it was observed that in case of some error handling
> > paths, it is possible for update_super_work to start a transaction after
> > the journal is destroyed eg:
> > 
> > (umount)
> > ext4_kill_sb
> >   kill_block_super
> >     generic_shutdown_super
> >       sync_filesystem /* commits all txns */
> >       evict_inodes
> >         /* might start a new txn */
> >       ext4_put_super
> > 	flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
> >         jbd2_journal_destroy
> >           journal_kill_thread
> >             journal->j_flags |= JBD2_UNMOUNT;
> >           jbd2_journal_commit_transaction
> >             jbd2_journal_get_descriptor_buffer
> >               jbd2_journal_bmap
> >                 ext4_journal_bmap
> >                   ext4_map_blocks
> >                     ...
> >                     ext4_inode_error
> >                       ext4_handle_error
> >                         schedule_work(&sbi->s_sb_upd_work)
> > 
> >                                                /* work queue kicks in */
> >                                                update_super_work
> >                                                  jbd2_journal_start
> >                                                    start_this_handle
> >                                                      BUG_ON(journal->j_flags &
> >                                                             JBD2_UNMOUNT)
> > 
> > Hence, make sure we only defer the update of ext4 sb if the sb is still
> > active.  Otherwise, just fallback to an un-journaled commit.
> > 
> > The important thing to note here is that we must only defer sb update if
> > we have not yet flushed the s_sb_update_work queue in umount path else
> > this race can be hit (point 1 below). Since we don't have a direct way
> > to check for that we use SB_ACTIVE instead. The SB_ACTIVE check is a bit
> > subtle so adding some notes below for future reference:
> > 
> > 1. Ideally we would want to have a something like (flags & JBD2_UNMOUNT
> > == 0) however this is not correct since we could end up scheduling work
> > after it has been flushed:
> > 
> >  ext4_put_super
> >   flush_work(&sbi->s_sb_upd_work)
> > 
> >                            **kjournald2**
> >                            jbd2_journal_commit_transaction
> >                            ...
> >                            ext4_inode_error
> >                              /* JBD2_UNMOUNT not set */
> >                              schedule_work(s_sb_upd_work)
> > 
> >    jbd2_journal_destroy
> >     journal->j_flags |= JBD2_UNMOUNT;
> > 
> >                                       **workqueue**
> >                                       update_super_work
> >                                        jbd2_journal_start
> >                                         start_this_handle
> >                                           BUG_ON(JBD2_UNMOUNT)
> > 
> > Something like the above doesn't happen with SB_ACTIVE check because we
> > are sure that the workqueue would be flushed at a later point if we are
> > in the umount path.
> > 
> > 2. We don't need a similar check in ext4_grp_locked_error since it is
> > only called from mballoc and AFAICT it would be always valid to schedule
> > work here.
> > 
> > Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
> > Reported-by: Mahesh Kumar <maheshkumar657g@...il.com>
> > Suggested-by: Ritesh Harjani <ritesh.list@...il.com>
> > Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
> 
> Good catch! But I think the solution will have to be slightly different.
> Basing the check on SB_ACTIVE has the problem that you can have racing
> updates of the sb in the still running transaction and in your direct
> update leading to inconsistencies after a crash (that was the reason why
> we've created the s_sb_upd_work in the first place).
> 
> I would solve this by implementing something like
> ext4_update_sb_destroy_journal() which will set a flag in sbi, flush the
> workqueue, and then destroy the journal. And ext4_handle_error() will check
> for the sbi flag.
> 
> 								Honza

Hey Jan,

Thanks for the review. So earlier I did go through different code paths to see
if we will have a direct sb write clash with a journalled one it wouldn't but,
relooking at it, seems like we might have a scenario as follows:

generic_super_shutdown
 sync_filesytems
  /* running txns committed. executing ext4_journal_commit_callback */
  ext4_maybe_update_superblock
   /* schedules work */
   schedule_work(&sbi->s_sb_upd_work)
                                          update_super_work
                                          /* start a txn and add sb to it */
 sb->s_flags &= ~SB_ACTIVE;
 evict_inode
   ext4_evict_inode
    ext4_std_error
     ext4_handle_error
      /* direct commit of sb (Not good!) */


Now with the 'setting the flag in sbi' approach, I'm not sure if that will be
enough to handle this as well. For example, if we add a flag like
sbi->s_journal_destroying, then:

ext4_put_super
 sbi->s_journal_destroying = true
 flush_workqueue()
  /* sb is now journalled */
 jbd2_journal_destory
  jbd2_journal_commit_transaction
   /* add tag for sb in descriptor and add buffer to wbufs[] */
   /* Later from some other buffer in the txn: */
   jbd2_journal_next_log_block
    /* hits error in ext4_journal_bmap */
    ext4_handle_error
      sbi->s_journal_destroying == true
      /* update and commit sb directly causing a checksum mismatch b/w entry in descriptor */
   jbd2_journal_abort
   /* after abort everything in wbufs[] is written to journal */

In the above we will have a checksum mismatch but then maybe its not really
an issue. Maybe since we never commit the txn it is understood that the contents
can't be trusted and it should be fine to have a mismatch b/w the decriptor tag
and the actual super block contents? In which case the sbi flag approach should
be fine.

Does my understanding sound correct?


Regards,
ojaswin
> 
> > ---
> >  fs/ext4/super.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index a963ffda692a..b7341e9acf62 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -706,7 +706,7 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
> >  		 * constraints, it may not be safe to do it right here so we
> >  		 * defer superblock flushing to a workqueue.
> >  		 */
> > -		if (continue_fs && journal)
> > +		if (continue_fs && journal && (sb->s_flags & SB_ACTIVE))
> >  			schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
> >  		else
> >  			ext4_commit_super(sb);
> > -- 
> > 2.48.1
> > 
> -- 
> Jan Kara <jack@...e.com>
> SUSE Labs, CR