lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <537f33c2-ca19-4025-9d22-4ab44f92e8c0@huawei.com>
Date: Thu, 13 Mar 2025 10:08:48 +0800
From: Baokun Li <libaokun1@...wei.com>
To: Ojaswin Mujoo <ojaswin@...ux.ibm.com>, Zhang Yi <yi.zhang@...wei.com>, Jan
 Kara <jack@...e.cz>
CC: Ritesh Harjani <ritesh.list@...il.com>, <linux-ext4@...r.kernel.org>,
	Theodore Ts'o <tytso@....edu>, <linux-kernel@...r.kernel.org>, Mahesh Kumar
	<maheshkumar657g@...il.com>, Yang Erkun <yangerkun@...wei.com>
Subject: Re: [PATCH v2 2/3] ext4: avoid journaling sb update on error if
 journal is destroying

On 2025/3/13 9:20, Zhang Yi wrote:
> On 2025/3/13 1:15, Jan Kara wrote:
>> On Wed 12-03-25 19:56:36, Ojaswin Mujoo wrote:
>>> On Wed, Mar 12, 2025 at 11:51:03AM +0100, Jan Kara wrote:
>>>> On Mon 10-03-25 10:13:36, Ritesh Harjani wrote:
>>>>> Ojaswin Mujoo <ojaswin@...ux.ibm.com> writes:
>>>>>> On Sun, Mar 09, 2025 at 12:11:22AM +0530, Ritesh Harjani wrote:
>>>>>>> Ojaswin Mujoo <ojaswin@...ux.ibm.com> writes:
>>>>>>>> On Sat, Mar 08, 2025 at 06:56:23PM +0530, Ritesh Harjani wrote:
>>>>>>>>> Ojaswin Mujoo <ojaswin@...ux.ibm.com> writes:
>>>>>>>>>> On Sat, Mar 08, 2025 at 03:25:04PM +0530, Ritesh Harjani (IBM) wrote:
>>>>>>>>>>> Ojaswin Mujoo <ojaswin@...ux.ibm.com> writes:
>>>>>>>>>>>> Presently we always BUG_ON if trying to start a transaction on a journal marked
>>>>>>>>>>>> with JBD2_UNMOUNT, since this should never happen. However, while ltp running
>>>>>>>>>>>> stress tests, it was observed that in case of some error handling paths, it is
>>>>>>>>>>>> possible for update_super_work to start a transaction after the journal is
>>>>>>>>>>>> destroyed eg:
>>>>>>>>>>>>
>>>>>>>>>>>> (umount)
>>>>>>>>>>>> ext4_kill_sb
>>>>>>>>>>>>    kill_block_super
>>>>>>>>>>>>      generic_shutdown_super
>>>>>>>>>>>>        sync_filesystem /* commits all txns */
>>>>>>>>>>>>        evict_inodes
>>>>>>>>>>>>          /* might start a new txn */
>>>>>>>>>>>>        ext4_put_super
>>>>>>>>>>>> 	flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
>>>>>>>>>>>>          jbd2_journal_destroy
>>>>>>>>>>>>            journal_kill_thread
>>>>>>>>>>>>              journal->j_flags |= JBD2_UNMOUNT;
>>>>>>>>>>>>            jbd2_journal_commit_transaction
>>>>>>>>>>>>              jbd2_journal_get_descriptor_buffer
>>>>>>>>>>>>                jbd2_journal_bmap
>>>>>>>>>>>>                  ext4_journal_bmap
>>>>>>>>>>>>                    ext4_map_blocks
>>>>>>>>>>>>                      ...
>>>>>>>>>>>>                      ext4_inode_error
>>>>>>>>>>>>                        ext4_handle_error
>>>>>>>>>>>>                          schedule_work(&sbi->s_sb_upd_work)
>>>>>>>>>>>>
>>>>>>>>>>>>                                                 /* work queue kicks in */
>>>>>>>>>>>>                                                 update_super_work
>>>>>>>>>>>>                                                   jbd2_journal_start
>>>>>>>>>>>>                                                     start_this_handle
>>>>>>>>>>>>                                                       BUG_ON(journal->j_flags &
>>>>>>>>>>>>                                                              JBD2_UNMOUNT)
>>>>>>>>>>>>
>>>>>>>>>>>> Hence, introduce a new sbi flag s_journal_destroying to indicate journal is
>>>>>>>>>>>> destroying only do a journaled (and deferred) update of sb if this flag is not
>>>>>>>>>>>> set. Otherwise, just fallback to an un-journaled commit.
>>>>>>>>>>>>
>>>>>>>>>>>> We set sbi->s_journal_destroying = true only after all the FS updates are done
>>>>>>>>>>>> during ext4_put_super() (except a running transaction that will get commited
>>>>>>>>>>>> during jbd2_journal_destroy()). After this point, it is safe to commit the sb
>>>>>>>>>>>> outside the journal as it won't race with a journaled update (refer
>>>>>>>>>>>> 2d01ddc86606).
>>>>>>>>>>>>
>>>>>>>>>>>> Also, we don't need a similar check in ext4_grp_locked_error since it is only
>>>>>>>>>>>> called from mballoc and AFAICT it would be always valid to schedule work here.
>>>>>>>>>>>>
>>>>>>>>>>>> Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
>>>>>>>>>>>> Reported-by: Mahesh Kumar <maheshkumar657g@...il.com>
>>>>>>>>>>>> Suggested-by: Jan Kara <jack@...e.cz>
>>>>>>>>>>>> Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>   fs/ext4/ext4.h      | 2 ++
>>>>>>>>>>>>   fs/ext4/ext4_jbd2.h | 8 ++++++++
>>>>>>>>>>>>   fs/ext4/super.c     | 4 +++-
>>>>>>>>>>>>   3 files changed, 13 insertions(+), 1 deletion(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>>>>>>>>>>>> index 2b7d781bfcad..d48e93bd5690 100644
>>>>>>>>>>>> --- a/fs/ext4/ext4.h
>>>>>>>>>>>> +++ b/fs/ext4/ext4.h
>>>>>>>>>>>> @@ -1728,6 +1728,8 @@ struct ext4_sb_info {
>>>>>>>>>>>>   	 */
>>>>>>>>>>>>   	struct work_struct s_sb_upd_work;
>>>>>>>>>>>>   
>>>>>>>>>>>> +	bool s_journal_destorying;
>>>>>>>>>>>> +
>>>>>>>>>>>>   	/* Atomic write unit values in bytes */
>>>>>>>>>>>>   	unsigned int s_awu_min;
>>>>>>>>>>>>   	unsigned int s_awu_max;
>>>>>>>>>>>> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
>>>>>>>>>>>> index 9b3c9df02a39..6bd3ca84410d 100644
>>>>>>>>>>>> --- a/fs/ext4/ext4_jbd2.h
>>>>>>>>>>>> +++ b/fs/ext4/ext4_jbd2.h
>>>>>>>>>>>> @@ -437,6 +437,14 @@ static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *jour
>>>>>>>>>>>>   {
>>>>>>>>>>>>   	int err = 0;
>>>>>>>>>>>>   
>>>>>>>>>>>> +	/*
>>>>>>>>>>>> +	 * At this point all pending FS updates should be done except a possible
>>>>>>>>>>>> +	 * running transaction (which will commit in jbd2_journal_destroy). It
>>>>>>>>>>>> +	 * is now safe for any new errors to directly commit superblock rather
>>>>>>>>>>>> +	 * than going via journal.
>>>>>>>>>>>> +	 */
>>>>>>>>>>>> +	sbi->s_journal_destorying = true;
>>>>>>>>>>> This is not correct right. I think what we decided to set this flag
>>>>>>>>>>> before we flush the workqueue. So that we don't schedule any new
>>>>>>>>>>> work after this flag has been set. At least that is what I understood.
>>>>>>>>>>>
>>>>>>>>>>> [1]: https://lore.kernel.org/all/87eczc6rlt.fsf@gmail.com/
>>>>>>>>>>>
>>>>>>>>>>> -ritesh
>>>>>>>>>> Hey Ritesh,
>>>>>>>>>>
>>>>>>>>>> Yes that is not correct, I missed that in my patch however we realised
>>>>>>>>>> that adding it before flush_work() also has issues [1]. More
>>>>>>>>>> specifically:
>>>>>>>>> Ohk. right.
>>>>>>>>>
>>>>>>>>>>                       **kjournald2**
>>>>>>>>>>                       jbd2_journal_commit_transaction()
>>>>>>>>>>                       ...
>>>>>>>>>>                       ext4_handle_error()
>>>>>>>>>>                          /* s_journal_destorying is not set */
>>>>>>>>>>                          if (journal && !s_journal_destorying)
>>>>>>>>> Then maybe we should not schedule another work to update the superblock
>>>>>>>>> via journalling, it the error itself occurred while were trying to
>>>>>>>>> commit the journal txn?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -ritesh
>>>>>>>> Hmm, ideally yes that should not happen, but how can we achieve that?
>>>>>>>> For example with the trace we saw:
>>>>>>>>
>>>>>>>>     **kjournald2**
>>>>>>>>     jbd2_journal_commit_transaction()
>>>>>>>>       jbd2_journal_get_descriptor_buffer
>>>>>>>>         jbd2_journal_bmap
>>>>>>>>           ext4_journal_bmap
>>>>>>>>             ext4_map_blocks
>>>>>>>>               ...
>>>>>>>>               ext4_inode_error
>>>>>>>>                 ext4_handle_error
>>>>>>>>                   schedule_work(&sbi->s_sb_upd_work)
>>>>>>>>
>>>>>>>> How do we tell ext4_handle_error that it is in the context of a
>>>>>>>> committing txn.
>>>> So I was thinking about this. It is not a problem to determine we are
>>>> running in kjournald context - it is enough to check
>>>>
>>>> 	current == EXT4_SB(sb)->s_journal->j_task
>>> Oh, right :)
>>>
>>>> But I'm not sure checking this in ext4_handle_error() and doing direct sb
>>>> update instead of scheduling a journalled one is always correct. For
>>>> example kjournald does also writeback of ordered data and if that hits an
>>>> error, we do not necessarily abort the journal (well, currently we do as
>>>> far as I'm checking but it seems a bit fragile to rely on this).
>>> Okay so IIUC your concern is there might be some codepaths, now or in
>>> the future, where kjournald might call the FS layer, hit an error and
>>> still decide to not abort. In which case we would still want to update
>>> the sb via journal.
>> Yeah. The reason why I'm a bit concerned about it is mostly the case of
>> kjournald also handling ordered data and situations like
>> !(journal->j_flags & JBD2_ABORT_ON_SYNCDATA_ERR) where people want to
>> continue although ordered data had issues. Or situations where something in
>> j_commit_callback or another jbd2 hook ends up calling ext4_error()...
>>
> Ha, right! This is a case where kjournald triggers an ext4 error but does
> not abort the journal for now, I forgot this one, and there may be more.
> Thanks for pointing it out. I would also prefer to use this solution of
> adding ext4_journal_destory().
>
If we consider the possibility that there might be calls to ext4_error()
without aborting the journal, although I cannot imagine how this might
happen, this situation may indeed appear in hidden corners now or in the
future. Therefore, an extra flag is indeed needed, with which we don't
have to think so much. 😀


Cheers,
Baokun


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ