[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1033cd3b-e41f-e4e0-c2ee-c4b23979208a@huaweicloud.com>
Date: Wed, 14 Jun 2023 21:25:28 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Theodore Ts'o <tytso@....edu>,
Zhihao Cheng <chengzhihao1@...wei.com>
Cc: linux-ext4@...r.kernel.org, adilger.kernel@...ger.ca, jack@...e.cz,
yi.zhang@...wei.com, yukuai3@...wei.com
Subject: Re: [PATCH v3 4/6] jbd2: Fix wrongly judgement for buffer head
removing while doing checkpoint
On 2023/6/14 13:42, Theodore Ts'o wrote:
> OK, some more updates. First of all, the e2fsck hang in the ext4/adv
> case is an inline_data bug in e2fsck/pass2.c:check_dir_block(); the
> code is clearly buggy, and I'll be sending out a fix in the next day
> or two.
>
> I still don't understand why this patch series is changing the kernel
> behaviour enough to change the resulting file system in such a way as
> to unmask this bug. The bug is triggered by file system corruption,
> so the question is whether this patch series is somehow causing the
> file system to be more corrupted than it otherwise would be. I'm not
> sure.
>
> However, the ext4/ext3 hang *is* a real hang in the kernel space, and
> generic/475 is not completing because the kernel seems to have ended
> up deadlocking somehow. With just the first patch in this patch
> series ("jbd2: recheck chechpointing non-dirty buffer") we're getting
> a kernel NULL pointer derefence:
>
> [ 310.447568] EXT4-fs error (device dm-7): ext4_check_bdev_write_error:223: comm fsstress: Error while async write back metadata
> [ 310.458038] EXT4-fs error (device dm-7): __ext4_get_inode_loc_noinmem:4467: inode #99400: block 393286: comm fsstress: unable to read itable block
> [ 310.458421] JBD2: IO error reading journal superblock
> [ 310.484755] EXT4-fs warning (device dm-7): ext4_end_bio:343: I/O error 10 writing to inode 36066 starting block 19083)
> [ 310.490956] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 310.490959] #PF: supervisor write access in kernel mode
> [ 310.490961] #PF: error_code(0x0002) - not-present page
> [ 310.490963] PGD 0 P4D 0
> [ 310.490966] Oops: 0002 [#1] PREEMPT SMP PTI
> [ 310.490970] CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190
> [ 310.490974] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
> [ 310.490976] RIP: 0010:jbd2_journal_set_features+0x13d/0x430
> [ 310.490985] Code: 0f 94 c0 44 20 e8 0f 85 e0 00 00 00 be 94 01 00 00 48 c7 c7 a1 33 59 b4 48 89 0c 24 4c 8b 7d 38 e8 a8 dc c5 ff 2e 2e 2e 31 c0 <f0> 49 0f ba 2f 02 48 8b 0c 24 0f 82 24 02 00 00 45 84 ed 8b 41 28
> [ 310.490988] RSP: 0018:ffffb9b441043b30 EFLAGS: 00010246
> [ 310.490990] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8edb447b8100
> [ 310.490993] RDX: 0000000000000000 RSI: 0000000000000194 RDI: ffffffffb45933a1
> [ 310.490994] RBP: ffff8edb45a62800 R08: ffffffffb460d6c0 R09: 0000000000000000
> [ 310.490996] R10: 204f49203a324442 R11: 4f49203a3244424a R12: 0000000000000000
> [ 310.490997] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> [ 310.490999] FS: 00007f2940cca740(0000) GS:ffff8edc19500000(0000) knlGS:0000000000000000
> [ 310.491005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 310.491007] CR2: 0000000000000000 CR3: 000000012543e003 CR4: 00000000003706e0
> [ 310.491009] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 310.491011] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 310.491012] Call Trace:
> [ 310.491016] <TASK>
> [ 310.491019] ? __die+0x23/0x60
> [ 310.491025] ? page_fault_oops+0xa4/0x170
> [ 310.491029] ? exc_page_fault+0x67/0x170
> [ 310.491032] ? asm_exc_page_fault+0x26/0x30
> [ 310.491039] ? jbd2_journal_set_features+0x13d/0x430
> [ 310.491043] jbd2_journal_revoke+0x47/0x1e0
> [ 310.491046] __ext4_forget+0xc3/0x1b0
> [ 310.491051] ext4_free_blocks+0x214/0x2f0
> [ 310.491056] ext4_free_branches+0xeb/0x270
> [ 310.491061] ext4_ind_truncate+0x2bf/0x320
> [ 310.491065] ext4_truncate+0x1e4/0x490
> [ 310.491069] ext4_handle_inode_extension+0x1bd/0x2a0
> [ 310.491073] ? iomap_dio_complete+0xaf/0x1d0
> [ 310.511141] ------------[ cut here ]------------
> [ 310.516121] ext4_dio_write_iter+0x346/0x3e0
> [ 310.516132] ? __handle_mm_fault+0x171/0x200
> [ 310.516135] vfs_write+0x21a/0x3e0
> [ 310.516140] ksys_write+0x6f/0xf0
> [ 310.516142] do_syscall_64+0x3b/0x90
> [ 310.516147] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [ 310.516154] RIP: 0033:0x7f2940eb2fb3
> [ 310.516158] Code: 75 05 48 83 c4 58 c3 e8 cb 41 ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> [ 310.516161] RSP: 002b:00007ffe9a322cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 310.516165] RAX: ffffffffffffffda RBX: 0000000000003000 RCX: 00007f2940eb2fb3
> [ 310.516167] RDX: 0000000000003000 RSI: 0000556ba1e31000 RDI: 0000000000000003
> [ 310.516168] RBP: 0000000000000003 R08: 0000556ba1e31000 R09: 00007f2940e9bbe0
> [ 310.516170] R10: 0000556b9fedbf59 R11: 0000000000000246 R12: 0000000000000024
> [ 310.516172] R13: 00000000000cf000 R14: 0000556ba1e31000 R15: 0000000000000000
> [ 310.516174] </TASK>
> [ 310.516178] CR2: 0000000000000000
> [ 310.516181] ---[ end trace 0000000000000000 ]---
>
Sorry about the regression, I found that this issue is not introduced
by the first patch in this patch series ("jbd2: recheck chechpointing
non-dirty buffer"), is d9eafe0afafa ("jbd2: factor out journal
initialization from journal_get_superblock()") [1] on your dev branch.
The problem is the journal super block had been failed to write out
due to IO fault, it's uptodate bit was cleared by
end_buffer_write_syn() and didn't reset yet in jbd2_write_superblock().
And it raced by jbd2_journal_revoke()->jbd2_journal_set_features()->
jbd2_journal_check_used_features()->journal_get_superblock()->bh_read(),
unfortunately, the read IO is also fail, so the error handling in
journal_fail_superblock() clear the journal->j_sb_buffer, finally lead
to above NULL pointer dereference issue.
I think the fix could be just move buffer_verified(bh) in front of
bh_read(). I can send out the fix after tests.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=d9eafe0afafaa519953735498c2a065d223c519b
Thanks,
Yi.
> This is then causing fsstress to wedge:
>
> # ps -ax -o pid,user,wchan:20,args --sort pid
> PID USER WCHAN COMMAND
> ...
> 12860 root do_wait /bin/bash /root/xfstests/tests/generic/475
> 13086 root rescuer_thread [kdmflush/253:7]
> 15593 root rescuer_thread [ext4-rsv-conver]
> 15598 root jbd2_log_wait_commit ./ltp/fsstress -d /xt-vdc -n 999999 -p 4
> 15600 root ext4_release_file [fsstress]
> 15601 root exit_aio [fsstress]
>
> So at this point, I'm going to drop this entire patch series from the
> dev tree, since this *does* seem to be some kind of regression
> triggered by the first patch in the patch series.
>
> Regards,
>
> - Ted
>
Powered by blists - more mailing lists