Message-ID: <Zti1Y5fthhgiL5Xb@shell.armlinux.org.uk>
Date: Wed, 4 Sep 2024 20:30:43 +0100
From: "Russell King (Oracle)" <linux@...linux.org.uk>
To: Theodore Ts'o <tytso@....edu>, Andreas Dilger <adilger.kernel@...ger.ca>
Cc: linux-ext4@...r.kernel.org
Subject: Re: BUG: 6.10: ext4 mpage_process_page_bufs() BUG_ON triggers

On Wed, Sep 04, 2024 at 07:47:33PM +0100, Russell King (Oracle) wrote:
> With a 6.10 based kernel, no changes to filesystem/MM code, I'm
> seeing a reliable BUG_ON() within minutes of booting on one of my
> VMs. I don't have a complete oops dump, but this is what I do
> have, cobbled together from what was logged by journald, and
> what syslogd was able to splat on the terminals before the VM
> died.
>
> Sep 04 15:51:46 lists kernel: kernel BUG at fs/ext4/inode.c:1967!
>
> [ 1346.494848] Call trace:
> [ 1346.495409] [<c04b4f90>] (mpage_process_page_bufs) from [<c04b938c>] (mpage_prepare_extent_to_map+0x410/0x51c)
> [ 1346.499202] [<c04b938c>] (mpage_prepare_extent_to_map) from [<c04bbc40>] (ext4_do_writepages+0x320/0xb94)
> [ 1346.502113] [<c04bbc40>] (ext4_do_writepages) from [<c04bc5dc>] (ext4_writepages+0xc0/0x1b4)
> [ 1346.504662] [<c04bc5dc>] (ext4_writepages) from [<c0361154>] (do_writepages+0x68/0x220)
> [ 1346.506974] [<c0361154>] (do_writepages) from [<c0354868>] (filemap_fdatawrite_wbc+0x64/0x84)
> [ 1346.509165] [<c0354868>] (filemap_fdatawrite_wbc) from [<c035706c>] (__filemap_fdatawrite_range+0x50/0x58)
> [ 1346.511414] [<c035706c>] (__filemap_fdatawrite_range) from [<c035709c>] (filemap_flush+0x28/0x30)
> [ 1346.513518] [<c035709c>] (filemap_flush) from [<c04a8834>] (ext4_release_file+0x70/0xac)
> [ 1346.515312] [<c04a8834>] (ext4_release_file) from [<c03f8088>] (__fput+0xd4/0x2cc)
> [ 1346.517219] [<c03f8088>] (__fput) from [<c03f3e64>] (sys_close+0x28/0x5c)
> [ 1346.518720] [<c03f3e64>] (sys_close) from [<c0200060>] (ret_fast_syscall+0x0/0x5c)
>
> From a quick look, I don't see any patches that touch fs/ext4/inode.c
> that might address this.
>
> I'm not able to do any debugging, and from Friday, I suspect I won't
> even be able to use a computer (due to operations on my eyes.)

After rebooting the VM, the next oops was:

Sep 04 19:33:41 lists kernel: Unable to handle kernel paging request at virtual address 5ed304f3 when read
Sep 04 19:33:42 lists kernel: [5ed304f3] *pgd=80000040005003, *pmd=00000000
Sep 04 19:33:42 lists kernel: Internal error: Oops: 206 [#1] PREEMPT SMP ARM
kernel:[ 205.583038] Internal error: Oops: 206 [#1] PREEMPT SMP ARM
kernel:[ 205.630530] Process kworker/u4:2 (pid: 33, stack limit = 0xc68f8000)
...
kernel:[ 205.661017] Call trace:
kernel:[ 205.661997] [<c04d9060>] (ext4_finish_bio) from [<c04d931c>] (ext4_release_io_end+0x48/0xfc)
kernel:[ 205.664523] [<c04d931c>] (ext4_release_io_end) from [<c04d94d8>] (ext4_end_io_rsv_work+0x88/0x188)
kernel:[ 205.666628] [<c04d94d8>] (ext4_end_io_rsv_work) from [<c023f310>] (process_one_work+0x178/0x30c)
kernel:[ 205.669924] [<c023f310>] (process_one_work) from [<c023fe48>] (worker_thread+0x25c/0x438)
kernel:[ 205.671679] [<c023fe48>] (worker_thread) from [<c02480b0>] (kthread+0xfc/0x12c)
kernel:[ 205.673607] [<c02480b0>] (kthread) from [<c020015c>] (ret_from_fork+0x14/0x38)
kernel:[ 205.682719] Code: e1540005 0a00000d e5941008 e594201c (e5913000)

This corresponds with:

c04d9050:       e1540005        cmp     r4, r5
c04d9054:       0a00000d        beq     c04d9090 <ext4_finish_bio+0x208>
c04d9058:       e5941008        ldr     r1, [r4, #8]
c04d905c:       e594201c        ldr     r2, [r4, #28]
c04d9060:       e5913000        ldr     r3, [r1]        ;<<<==== faulting instruction

This code is:

        /*
         * We check all buffers in the folio under b_uptodate_lock
         * to avoid races with other end io clearing async_write flags
         */
        spin_lock_irqsave(&head->b_uptodate_lock, flags);
        do {
                if (bh_offset(bh) < bio_start ||
                    bh_offset(bh) + bh->b_size > bio_end) {

where r4 is "bh" and r4+8 is the location of the bh->b_page pointer.
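The +8 offset for bh->b_page can be cross-checked with a small layout model. This is a sketch under stated assumptions, not the kernel's definition: it models only the leading fields of struct buffer_head as they appear in recent trees (b_state, b_this_page, the b_page/b_folio union, b_blocknr, b_size, b_data) for a 32-bit ARM EABI build, where pointers and unsigned long are 4 bytes and sector_t is a 64-bit type aligned to 8 bytes. The struct name bh_arm32_model is hypothetical; the authoritative layout comes from running pahole on the actual vmlinux. Under those assumptions b_page sits at +8, matching `ldr r1, [r4, #8]`, and the `ldr r2, [r4, #28]` is consistent with loading b_data, which bh_offset() also reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the leading fields of struct buffer_head on a
 * 32-bit ARM (EABI) build. Pointers and unsigned long are modelled as
 * uint32_t so the offsets are the same when compiled on a 64-bit host.
 * Assumption: field order as in include/linux/buffer_head.h; verify
 * against the real vmlinux with pahole.
 */
struct bh_arm32_model {
    uint32_t b_state;                /* unsigned long b_state */
    uint32_t b_this_page;            /* struct buffer_head *b_this_page */
    uint32_t b_page;                 /* union { struct page *b_page; struct folio *b_folio; } */
    _Alignas(8) uint64_t b_blocknr;  /* sector_t b_blocknr: 64-bit, 8-byte aligned on EABI */
    uint32_t b_size;                 /* size_t b_size */
    uint32_t b_data;                 /* char *b_data */
};

/* Offsets used by the faulting sequence in ext4_finish_bio(). */
static_assert(offsetof(struct bh_arm32_model, b_page) == 8,
              "ldr r1, [r4, #8] loads bh->b_page");
static_assert(offsetof(struct bh_arm32_model, b_data) == 28,
              "ldr r2, [r4, #28] loads bh->b_data");
```

The 4 bytes of padding after b_page (offsets 12 to 15) come from the 8-byte alignment of sector_t, which is what pushes b_data out to +28 rather than +24.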

static inline unsigned long bh_offset(const struct buffer_head *bh)
{
        return (unsigned long)(bh)->b_data & (page_size(bh->b_page) - 1);
}

static inline unsigned long compound_nr(struct page *page)
{
        struct folio *folio = (struct folio *)page;

        if (!test_bit(PG_head, &folio->flags))
                return 1;

where PG_head is bit 6. Thus, bh->b_page was corrupt.

I've booted back into 6.7 on the offending VM, and it's stable, so it
appears to be a regression between 6.7 and 6.10.

-- 
*** please note that I probably will only be occasionally responsive
*** for an unknown period of time due to recent eye surgery making
*** reading quite difficult.

RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!