Message-ID: <23b5782b-16bb-4efb-b55b-7766b97db27c@huawei.com>
Date: Sat, 8 Mar 2025 18:10:14 +0800
From: Baokun Li <libaokun1@...wei.com>
To: Ojaswin Mujoo <ojaswin@...ux.ibm.com>, Zhang Yi <yi.zhang@...weicloud.com>
CC: Jan Kara <jack@...e.cz>, <linux-kernel@...r.kernel.org>,
 Mahesh Kumar <maheshkumar657g@...il.com>, <linux-ext4@...r.kernel.org>,
 Theodore Ts'o <tytso@....edu>
Subject: Re: [PATCH v2 2/3] ext4: avoid journaling sb update on error if
journal is destroying
On 2025/3/8 17:58, Ojaswin Mujoo wrote:
> On Sat, Mar 08, 2025 at 01:48:44PM +0530, Ojaswin Mujoo wrote:
>> On Sat, Mar 08, 2025 at 10:57:16AM +0800, Zhang Yi wrote:
>>> On 2025/3/8 1:26, Ojaswin Mujoo wrote:
>>>> On Fri, Mar 07, 2025 at 08:36:08PM +0800, Zhang Yi wrote:
>>>>> On 2025/3/7 18:27, Ojaswin Mujoo wrote:
>>>>>> On Fri, Mar 07, 2025 at 04:43:24PM +0800, Zhang Yi wrote:
>>>>>>> On 2025/3/7 16:13, Ojaswin Mujoo wrote:
>>>>>>>> On Fri, Mar 07, 2025 at 12:04:26PM +0530, Ojaswin Mujoo wrote:
>>>>>>>>> On Fri, Mar 07, 2025 at 10:49:28AM +0800, Zhang Yi wrote:
>>>>>>>>>> On 2025/3/6 22:28, Ojaswin Mujoo wrote:
>>>>>>>>>>> Presently we always BUG_ON when trying to start a transaction on a journal marked
>>>>>>>>>>> with JBD2_UNMOUNT, since this should never happen. However, while running LTP
>>>>>>>>>>> stress tests, it was observed that in some error handling paths it is
>>>>>>>>>>> possible for update_super_work to start a transaction after the journal is
>>>>>>>>>>> destroyed, e.g.:
>>>>>>>>>>>
>>>>>>>>>>> (umount)
>>>>>>>>>>> ext4_kill_sb
>>>>>>>>>>> kill_block_super
>>>>>>>>>>> generic_shutdown_super
>>>>>>>>>>> sync_filesystem /* commits all txns */
>>>>>>>>>>> evict_inodes
>>>>>>>>>>> /* might start a new txn */
>>>>>>>>>>> ext4_put_super
>>>>>>>>>>> flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
>>>>>>>>>>> jbd2_journal_destroy
>>>>>>>>>>> journal_kill_thread
>>>>>>>>>>> journal->j_flags |= JBD2_UNMOUNT;
>>>>>>>>>>> jbd2_journal_commit_transaction
>>>>>>>>>>> jbd2_journal_get_descriptor_buffer
>>>>>>>>>>> jbd2_journal_bmap
>>>>>>>>>>> ext4_journal_bmap
>>>>>>>>>>> ext4_map_blocks
>>>>>>>>>>> ...
>>>>>>>>>>> ext4_inode_error
>>>>>>>>>>> ext4_handle_error
>>>>>>>>>>> schedule_work(&sbi->s_sb_upd_work)
>>>>>>>>>>>
>>>>>>>>>>> /* work queue kicks in */
>>>>>>>>>>> update_super_work
>>>>>>>>>>> jbd2_journal_start
>>>>>>>>>>> start_this_handle
>>>>>>>>>>> BUG_ON(journal->j_flags &
>>>>>>>>>>> JBD2_UNMOUNT)
>>>>>>>>>>>
>>>>>>>>>>> Hence, introduce a new sbi flag, s_journal_destroying, to indicate that the
>>>>>>>>>>> journal is being destroyed, and only do a journaled (and deferred) update of
>>>>>>>>>>> the sb if this flag is not set. Otherwise, just fall back to an un-journaled
>>>>>>>>>>> commit.
>>>>>>>>>>>
>>>>>>>>>>> We set sbi->s_journal_destroying = true only after all the FS updates are done
>>>>>>>>>>> during ext4_put_super() (except a running transaction that will get committed
>>>>>>>>>>> during jbd2_journal_destroy()). After this point, it is safe to commit the sb
>>>>>>>>>>> outside the journal as it won't race with a journaled update (refer
>>>>>>>>>>> 2d01ddc86606).
>>>>>>>>>>>
>>>>>>>>>>> Also, we don't need a similar check in ext4_grp_locked_error since it is only
>>>>>>>>>>> called from mballoc and AFAICT it would always be valid to schedule work here.
>>>>>>>>>>>
>>>>>>>>>>> Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
>>>>>>>>>>> Reported-by: Mahesh Kumar <maheshkumar657g@...il.com>
>>>>>>>>>>> Suggested-by: Jan Kara <jack@...e.cz>
>>>>>>>>>>> Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
>>>>>>>>>>> ---
>>>>>>>>>>> fs/ext4/ext4.h | 2 ++
>>>>>>>>>>> fs/ext4/ext4_jbd2.h | 8 ++++++++
>>>>>>>>>>> fs/ext4/super.c | 4 +++-
>>>>>>>>>>> 3 files changed, 13 insertions(+), 1 deletion(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>>>>>>>>>>> index 2b7d781bfcad..d48e93bd5690 100644
>>>>>>>>>>> --- a/fs/ext4/ext4.h
>>>>>>>>>>> +++ b/fs/ext4/ext4.h
>>>>>>>>>>> @@ -1728,6 +1728,8 @@ struct ext4_sb_info {
>>>>>>>>>>> */
>>>>>>>>>>> struct work_struct s_sb_upd_work;
>>>>>>>>>>>
>>>>>>>>>>> + bool s_journal_destroying;
>>>>>>>>>>> +
>>>>>>>>>>> /* Atomic write unit values in bytes */
>>>>>>>>>>> unsigned int s_awu_min;
>>>>>>>>>>> unsigned int s_awu_max;
>>>>>>>>>>> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
>>>>>>>>>>> index 9b3c9df02a39..6bd3ca84410d 100644
>>>>>>>>>>> --- a/fs/ext4/ext4_jbd2.h
>>>>>>>>>>> +++ b/fs/ext4/ext4_jbd2.h
>>>>>>>>>>> @@ -437,6 +437,14 @@ static inline int ext4_journal_destroy(struct ext4_sb_info *sbi, journal_t *jour
>>>>>>>>>>> {
>>>>>>>>>>> int err = 0;
>>>>>>>>>>>
>>>>>>>>>>> + /*
>>>>>>>>>>> + * At this point all pending FS updates should be done except a possible
>>>>>>>>>>> + * running transaction (which will commit in jbd2_journal_destroy). It
>>>>>>>>>>> + * is now safe for any new errors to directly commit superblock rather
>>>>>>>>>>> + * than going via journal.
>>>>>>>>>>> + */
>>>>>>>>>>> + sbi->s_journal_destroying = true;
>>>>>>>>>>> +
>>>>>>>>>> Hi, Ojaswin!
>>>>>>>>>>
>>>>>>>>>> I'm afraid you still need to flush the superblock update work here,
>>>>>>>>>> otherwise I guess the race condition you mentioned in v1 could still
>>>>>>>>>> occur.
>>>>>>>>>>
>>>>>>>>>> ext4_put_super()
>>>>>>>>>> flush_work(&sbi->s_sb_upd_work)
>>>>>>>>>>
>>>>>>>>>> **kjournald2**
>>>>>>>>>> jbd2_journal_commit_transaction()
>>>>>>>>>> ...
>>>>>>>>>> ext4_inode_error()
>>>>>>>>>> /* JBD2_UNMOUNT not set */
>>>>>>>>>> schedule_work(s_sb_upd_work)
>>>>>>>>>>
>>>>>>>>>> **workqueue**
>>>>>>>>>> update_super_work
>>>>>>>>>> /* s_journal_destroying is not set */
>>>>>>>>>> if (journal && !s_journal_destroying)
>>>>>>>>>>
>>>>>>>>>> ext4_journal_destroy()
>>>>>>>>>> /* set s_journal_destroying */
>>>>>>>>>> sbi->s_journal_destroying = true;
>>>>>>>>>> jbd2_journal_destroy()
>>>>>>>>>> journal->j_flags |= JBD2_UNMOUNT;
>>>>>>>>>>
>>>>>>>>>> jbd2_journal_start()
>>>>>>>>>> start_this_handle()
>>>>>>>>>> BUG_ON(JBD2_UNMOUNT)
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Yi.
>>>>>>>>> Hi Yi,
>>>>>>>>>
>>>>>>>>> Yes you are right, somehow missed this edge case :(
>>>>>>>>>
>>>>>>>>> Alright then, we have to move sbi->s_journal_destroying out of the
>>>>>>>>> helper. Just wondering if I should still let it be in
>>>>>>>>> ext4_journal_destroy and just add an extra s_journal_destroying = false
>>>>>>>>> before schedule_work(s_sb_upd_work), because it makes sense.
>>>>>>>>>
>>>>>>>>> Okay let me give it some thought but thanks for pointing this out!
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> ojaswin
>>>>>>>> Okay so thinking about it a bit more, I see you also suggested to flush
>>>>>>>> the work after marking sbi->s_journal_destroying. But will that solve
>>>>>>>> it?
>>>>>>>>
>>>>>>>> ext4_put_super()
>>>>>>>> flush_work(&sbi->s_sb_upd_work)
>>>>>>>>
>>>>>>>> **kjournald2**
>>>>>>>> jbd2_journal_commit_transaction()
>>>>>>>> ...
>>>>>>>> ext4_inode_error()
>>>>>>>> /* JBD2_UNMOUNT not set */
>>>>>>>> schedule_work(s_sb_upd_work)
>>>>>>>>
>>>>>>>> **workqueue**
>>>>>>>> update_super_work
>>>>>>>> /* s_journal_destroying is not set */
>>>>>>>> if (journal && !s_journal_destroying)
>>>>>>>>
>>>>>>>> ext4_journal_destroy()
>>>>>>>> /* set s_journal_destroying */
>>>>>>>> sbi->s_journal_destroying = true;
>>>>>>>> flush_work(&sbi->s_sb_upd_work)
>>>>>>>> schedule_work()
>>>>>>> ^^^^^^^^^^^^^^^
>>>>>>> where does this come from?
>>>>>>>
>>>>>>> After this flush_work, we can guarantee that the running s_sb_upd_work
>>>>>>> finishes before we set JBD2_UNMOUNT. Additionally, the journal will
>>>>>>> not commit transaction or call schedule_work() again because it has
>>>>>>> been aborted due to the previous error. Am I missing something?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Yi.
>>>>>> Hmm, so I am thinking of a corner case in ext4_handle_error() where
>>>>>>
>>>>>> if (journal && !s_journal_destroying)
>>>>>>
>>>>>> is evaluated but schedule_work() is not yet called, which is possible
>>>>>> because the compare followed by the jump is not atomic. If the
>>>>>> schedule_work is only called after we have done the flush then we end
>>>>>> up with this:
>>>>>>
>>>>>> if (journal && !s_journal_destroying)
>>>>>> ext4_journal_destroy()
>>>>>> /* set s_journal_destroying */
>>>>>> sbi->s_journal_destroying = true;
>>>>>> flush_work(&sbi->s_sb_upd_work)
>>>>>> schedule_work()
>>>>>>
>>>>>> Which is possible IMO, although the window is tiny.
>>>>> Yeah, right!
>>>>> Sorry for misreading the location where you added the "!s_journal_destroying"
>>>>> check, the graph I provided was in update_super_work(), which was wrong.
>>>> Oh right, I also misread your trace but yes as discussed, even
>>>>
>>>> sbi->s_journal_destroying = true;
>>>> flush_work()
>>>> jbd2_journal_destroy()
>>>>
>>>> doesn't work.
>>>>
>>>>> The right one should be:
>>>>>
>>>>> ext4_put_super()
>>>>> flush_work(&sbi->s_sb_upd_work)
>>>>>
>>>>> **kjournald2**
>>>>> jbd2_journal_commit_transaction()
>>>>> ...
>>>>> ext4_inode_error()
>>>>> /* s_journal_destroying is not set */
>>>>> if (journal && !s_journal_destroying)
>>>>> (schedule_work(s_sb_upd_work)) //can be here
>>>>>
>>>>> ext4_journal_destroy()
>>>>> /* set s_journal_destroying */
>>>>> sbi->s_journal_destroying = true;
>>>>> jbd2_journal_destroy()
>>>>> journal->j_flags |= JBD2_UNMOUNT;
>>>>>
>>>>> (schedule_work(s_sb_upd_work)) //also can be here
>>>>>
>>>>> **workqueue**
>>>>> update_super_work()
>>>>> journal = sbi->s_journal //get journal
>>>>> kfree(journal)
>>>>> jbd2_journal_start(journal) //journal UAF
>>>>> start_this_handle()
>>>>> BUG_ON(JBD2_UNMOUNT) //bugon here
>>>>>
>>>>>
>>>>> So there are two problems here, the first one is the 'journal' UAF,
>>>>> the second one is triggering JBD2_UNMOUNT flag BUGON.
>>>> Indeed, there's a possible UAF here as well.
>>>>
>>>>>>>> As for the fix, how about we do something like this:
>>>>>>>>
>>>>>>>> ext4_put_super()
>>>>>>>>
>>>>>>>> flush_work(&sbi->s_sb_upd_work)
>>>>>>>> destroy_workqueue(sbi->rsv_conversion_wq);
>>>>>>>>
>>>>>>>> ext4_journal_destroy()
>>>>>>>> /* set s_journal_destroying */
>>>>>>>> sbi->s_journal_destroying = true;
>>>>>>>>
>>>>>>>> /* trigger a commit and wait for it to complete */
>>>>>>>>
>>>>>>>> flush_work(&sbi->s_sb_upd_work)
>>>>>>>>
>>>>>>>> jbd2_journal_destroy()
>>>>>>>> journal->j_flags |= JBD2_UNMOUNT;
>>>>>>>>
>>>>>>>> jbd2_journal_start()
>>>>>>>> start_this_handle()
>>>>>>>> BUG_ON(JBD2_UNMOUNT)
>>>>>>>>
>>>>>>>> Still giving this codepath some thought but seems like this might just
>>>>>>>> be enough to fix the race. Thoughts on this?
>>>>>>>>
>>>>> I think this solution should work, the forced commit and flush_work()
>>>>> should ensure that the last transaction is committed and that the
>>>>> potential work is done.
>>>>>
>>>>> Besides, the s_journal_destroying flag is set and checked concurrently
>>>>> now, so we need WRITE_ONCE() and READ_ONCE() for it. Also, what
>>>>> about adding a new flag to sbi->s_mount_state instead of adding a
>>>>> new s_journal_destroying?
>>>> Right, that makes sense. I will incorporate these changes in the next
>>>> revision.
>>>>
>>> Thinking about this again, it seems that we no longer need the destroying
>>> flag. Because we force a commit and wait for the **last** transaction to
>>> complete, and the flush work should also ensure that the last sb_update
>>> work completes. Regardless of whether it starts a new handle in the
>>> last update_super_work(), it will not commit since the journal should
>>> have aborted. What are your thoughts?
>>>
>>> ext4_put_super()
>>> flush_work(&sbi->s_sb_upd_work)
>>> destroy_workqueue(sbi->rsv_conversion_wq)
>>>
>>> ext4_journal_destroy()
>>> /* trigger a commit (it will commit the last transaction) */
>>>
>>> **kjournald2**
>>> jbd2_journal_commit_transaction()
>>> ...
>>> ext4_inode_error()
>>> schedule_work(s_sb_upd_work))
>>>
>>> **workqueue**
>>> update_super_work()
>>> jbd2_journal_start(journal)
>>> start_this_handle()
>>> //This new trans will
>>> //not be committed.
>>>
>>> jbd2_journal_abort()
>>>
>>> /* wait for it to complete */
>>>
>>> flush_work(&sbi->s_sb_upd_work)
>>> jbd2_journal_destroy()
>>> journal->j_flags |= JBD2_UNMOUNT;
>>> jbd2_journal_commit_transaction() //it will commit nothing
>>>
>>> Thanks,
>>> Yi.
>> Hi Yi,
>>
>> There's one more path for which we need the flag:
>>
>> ext4_journal_destroy()
>> /* trigger a commit (it will commit the last transaction) */
>>
>> **kjournald2**
>> jbd2_journal_commit_transaction()
>> journal->j_commit_callback()
>> ext4_journal_commit_callback()
>> ext4_maybe_update_superblock()
>> schedule_work()
>> /* start a transaction here */
>> flush_work()
>> jbd2_journal_destroy()
>> journal_kill_thread
>> flags |= JBD2_UNMOUNT
>> jbd2_journal_commit_transaction()
>> ...
>> ext4_inode_error()
>> schedule_work(s_sb_upd_work))
>> /* update_super_work tries to start the txn */
>> BUG_ON()
> Oops the formatting is wrong, here's the trace:
>
> ext4_journal_destroy()
> /* trigger a commit (it will commit the last transaction) */
>
> **kjournald2**
> jbd2_journal_commit_transaction()
> journal->j_commit_callback()
> ext4_journal_commit_callback()
> ext4_maybe_update_superblock()
> schedule_work()
At this point, SB_ACTIVE should have been cleared,
so ext4_maybe_update_superblock() should do nothing.
With this in mind, it could be the case that an
additional flag is no longer needed.
Regards,
Baokun
>
> /* update_super_work starts a new txn here */
> flush_work()
> jbd2_journal_destroy()
> journal_kill_thread
> flags |= JBD2_UNMOUNT
> jbd2_journal_commit_transaction()
> ...
> ext4_inode_error()
> schedule_work(s_sb_upd_work))
> /* update_super_work tries to start the txn */
> BUG_ON()
>
>> I think that to protect against this path we do need a flag.
>>
>> Regards,
>> ojaswin