lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <692ab4aa-ff90-4b6f-980d-bfd6c1ca7619@huawei.com>
Date: Sat, 8 Mar 2025 09:09:28 +0800
From: Baokun Li <libaokun1@...wei.com>
To: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
CC: <linux-ext4@...r.kernel.org>, Theodore Ts'o <tytso@....edu>, Jan Kara
	<jack@...e.cz>, <linux-kernel@...r.kernel.org>, Yang Erkun
	<yangerkun@...wei.com>
Subject: Re: [PATCH v2 0/3] Fix a BUG_ON crashing the kernel in
 start_this_handle

On 2025/3/6 22:28, Ojaswin Mujoo wrote:
> ** Changes since v1 [1] **
>
>   * Picked up RVBs from Jan and Ritesh
>   * In patch 2/3, we now use a flag in sbi instead of SB_ACITVE
>     to determine when to journal sb vs when to commit directly.
>   * Added a prep patch 1/3
>
> [1] https://lore.kernel.org/linux-ext4/cover.1740212945.git.ojaswin@linux.ibm.com/T/#m5e659425b8c8fe2ac01e7242b77fed315ff89db4
>
> @Baokun, I didn't get a chance to look into the journal_inode
> modifications we were discussing in [2]. I'll try to spend some time and
> send that as a separate patch. Hope that's okay
>
> [2] https://lore.kernel.org/linux-ext4/cover.1740212945.git.ojaswin@linux.ibm.com/T/#mad8feb44d9b6ddadf87830b92caa7b78d902dc05
>   
That's fine, it's not a priority. And if this patch set makes sure we
don't crash when things go wrong, I'm okay with leaving it as is.

It's possible that jbd2_journal_commit_transaction() could call
ext4_handle_error() in other places as the code evolves. Fixing known
problems and protecting against potential ones is always a good thing.


Cheers,
Baokun
> ** Original Cover **
>
> When running LTP stress tests on ext4, after a multiday run we seemed to
> have hit the following BUG_ON:
>
>   [NIP  : start_this_handle+268]
>   #3 [c000001067c27a40] start_this_handle at c008000004d40f74 [jbd2]  (unreliable)
>   #4 [c000001067c27b60] jbd2__journal_start at c008000004d415cc [jbd2]
>   #5 [c000001067c27be0] update_super_work at c0080000053f9758 [ext4]
>   #6 [c000001067c27c70] process_one_work at c000000000188790
>   #7 [c000001067c27d20] worker_thread at c00000000018973c
>   #8 [c000001067c27dc0] kthread at c000000000196c84
>   #9 [c000001067c27e10] ret_from_kernel_thread at c00000000000cd64
>
> Which comes out to
>
>    382   repeat:
>    383           read_lock(&journal->j_state_lock);
> * 384           BUG_ON(journal->j_flags & JBD2_UNMOUNT);
>    385           if (is_journal_aborted(journal) ||
>    386               (journal->j_errno != 0 && !(journal->j_flags & JBD2_ACK_ERR))) {
>    387                   read_unlock(&journal->j_state_lock);
>
>
> Initially this seemed like it should never happen but upon crash
> analysis it seems like it could indeed be hit as described in patch 1/2.
>
> I would like to add that through the logs we only knew that:
>
> - ext4_journal_bmap -> ext4_map_blocks is failing with EFSCORRUPTED.
> - update_super_work had hit the BUG_ON
>
> I was not able to hit this bug again (without modifying the kernel to
> inject errors) but the above backtrace seems to be one possible paths
> where this BUG_ON can be hit. Rest of the analysis and fix is in patch
> 2/3. Patch 3 is just a small tweak that i found helpful while debugging.
>
> That being said, journalling is something I'm not very familiar with and
> there might be gaps in my understanding so thoughts and suggestions are
> welcome.
>
> Ojaswin Mujoo (3):
>    ext4: define ext4_journal_destroy wrapper
>    ext4: avoid journaling sb update on error if journal is destroying
>    ext4: Make sb update interval tunable
>
>   fs/ext4/ext4.h      | 11 +++++++++++
>   fs/ext4/ext4_jbd2.h | 22 ++++++++++++++++++++++
>   fs/ext4/super.c     | 35 +++++++++++++++++------------------
>   fs/ext4/sysfs.c     |  4 ++++
>   4 files changed, 54 insertions(+), 18 deletions(-)
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ