Message-ID: <20221103115444.m2rjglbkubydidts@quack3>
Date:   Thu, 3 Nov 2022 12:54:44 +0100
From:   Jan Kara <jack@...e.cz>
To:     LKML <linux-kernel@...r.kernel.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Steven Rostedt <rostedt@...dmis.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Mel Gorman <mgorman@...e.de>
Subject: Crash with PREEMPT_RT on aarch64 machine

Hello,

I have been tracking down the following crash on a 6.0 kernel with
patch-6.0.5-rt14.patch applied:

[ T6611] ------------[ cut here ]------------
[ T6611] kernel BUG at fs/inode.c:625!
[ T6611] Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
[ T6611] Modules linked in: xfs(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) mlx5_ib(E) ib_uverbs(E) ib_core(E) arm_spe_pmu(E) mlx5_core(E) sunrpc(E) mlxfw(E) pci_hyperv_intf(E) nls_iso8859_1(E) acpi_ipmi(E) nls_cp437(E) ipmi_ssif(E) vfat(E) ipmi_devintf(E) tls(E) igb(E) psample(E) button(E) arm_cmn(E) arm_dmc620_pmu(E) ipmi_msghandler(E) fat(E) cppc_cpufreq(E) arm_dsu_pmu(E) fuse(E) ip_tables(E) x_tables(E) ast(E) i2c_algo_bit(E) drm_vram_helper(E) aes_ce_blk(E) aes_ce_cipher(E) crct10dif_ce(E) ghash_ce(E) gf128mul(E) nvme(E) drm_kms_helper(E) sha2_ce(E) syscopyarea(E) sha256_arm64(E) sysfillrect(E) xhci_pci(E) sha1_ce(E) sysimgblt(E) nvme_core(E) xhci_pci_renesas(E) fb_sys_fops(E) nvme_common(E) drm_ttm_helper(E) sbsa_gwdt(E) t10_pi(E) ttm(E) xhci_hcd(E) crc64_rocksoft_generic(E) crc64_rocksoft(E) usbcore(E) crc64(E) drm(E) usb_common(E) i2c_designware_platform(E) i2c_designware_core(E) btrfs(E) blake2b_generic(E) libcrc32c(E) xor(E) xor_neon(E)
[ T6611]  raid6_pq(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E)
[ T6611] CPU: 11 PID: 6611 Comm: dbench Tainted: G            E   6.0.0-rt14-rt+ #1 4a18df02c109f1e703cf2ff86b77cf9cd9d5a188
[ T6611] Hardware name: GIGABYTE R272-P30-JG/MP32-AR0-JG, BIOS F16f (SCP: 1.06.20210615) 07/01/2021
[ T6611] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ T6611] pc : clear_inode+0xa0/0xc0
[ T6611] lr : clear_inode+0x38/0xc0
[ T6611] sp : ffff80000f4f3cd0
[ T6611] x29: ffff80000f4f3cd0 x28: ffff07ff92142000 x27: 0000000000000000
[ T6611] x26: ffff08012aef6058 x25: 0000000000000002 x24: ffffb657395e8000
[ T6611] x23: ffffb65739072008 x22: ffffb656e0bed0a8 x21: ffff08012aef6190
[ T6611] x20: ffff08012aef61f8 x19: ffff08012aef6058 x18: 0000000000000014
[ T6611] x17: 00000000f0d86255 x16: ffffb65737dfdb00 x15: 0100000004000000
[ T6611] x14: 644d000008090000 x13: 644d000008090000 x12: ffff80000f4f3b20
[ T6611] x11: 0000000000000002 x10: ffff083f5ffbe1c0 x9 : ffffb657388284a4
[ T6611] x8 : fffffffffffffffe x7 : ffff80000f4f3b20 x6 : ffff80000f4f3b20
[ T6611] x5 : ffff08012aef6210 x4 : ffff08012aef6210 x3 : 0000000000000000
[ T6611] x2 : ffff08012aef62d8 x1 : ffff07ff8fbbf690 x0 : ffff08012aef61a0
[ T6611] Call trace:
[ T6611]  clear_inode+0xa0/0xc0
[ T6611]  evict+0x160/0x180
[ T6611]  iput+0x154/0x240
[ T6611]  do_unlinkat+0x184/0x300
[ T6611]  __arm64_sys_unlinkat+0x48/0xc0
[ T6611]  el0_svc_common.constprop.4+0xe4/0x2c0
[ T6611]  do_el0_svc+0xac/0x100
[ T6611]  el0_svc+0x78/0x200
[ T6611]  el0t_64_sync_handler+0x9c/0xc0
[ T6611]  el0t_64_sync+0x19c/0x1a0
[ T6611] Code: d4210000 d503201f d4210000 d503201f (d4210000) 
[ T6611] ---[ end trace 0000000000000000 ]---

The machine is aarch64; the kernel config is attached. I have also seen
these crashes with a 5.14-rt kernel, so this is not a new problem. The crash
is triggered fairly reliably (on two different aarch64 machines) by our
performance testing framework when running the dbench benchmark against an
XFS filesystem.

Originally I thought this was a problem in XFS or in the writeback code,
but after debugging it for some time I no longer think so.
clear_inode() complains about inode->i_wb_list being non-empty. In fact,
looking at the list_head, I can see it is corrupted. In all occurrences
of the problem, ->prev points back to the list_head itself but ->next points
to some list_head that used to be part of the sb->s_inodes_wb list (or
rather of that list after it was spliced in wait_sb_inodes(), because I have
also seen a pointer to the stack as the ->next pointer).
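
To make the observed state concrete, here is a minimal userspace sketch
(not kernel code). It assumes a stand-in struct list_head and an emptiness
test that, like the kernel's list_empty(), looks only at ->next. The helpers
make_reported_state() and node_self_consistent() are hypothetical names
introduced purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static inline void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Like the kernel's list_empty(): only ->next is inspected. */
static inline bool list_is_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Hypothetical check: do both neighbours point back at this node? */
static inline bool node_self_consistent(const struct list_head *h)
{
    return h->next->prev == h && h->prev->next == h;
}

/*
 * Hypothetical helper that builds the state described above:
 * ->prev points back at the node itself, ->next points at some
 * other list_head ("stale") that does not point back.
 */
static inline void make_reported_state(struct list_head *node,
                                       struct list_head *stale)
{
    init_list_head(node);
    init_list_head(stale);
    node->next = stale;
}
```

In this state list_is_empty() returns false, which is the condition
clear_inode() trips over, while node_self_consistent() also returns false,
so the node is not a valid member of any list either.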

This is not just some memory ordering issue with the check in
clear_inode(). If I add sb->s_inode_wblist_lock locking around the check in
clear_inode(), the problem still reproduces.

If I enable CONFIG_DEBUG_LIST or if I convert sb->s_inode_wblist_lock to
raw_spinlock_t, the problem disappears.
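
For context, CONFIG_DEBUG_LIST makes list add/delete operations verify that
the neighbouring pointers agree before touching them. The following is a
rough userspace sketch of checks of that shape, not the kernel's
implementation; all function names here are hypothetical, and the sketch
returns false where the kernel would report list corruption. It says nothing
about why the problem disappears with the option enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static inline void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* An add between prev and next is sane only if they are adjacent. */
static inline bool checked_add_valid(const struct list_head *new_,
                                     const struct list_head *prev,
                                     const struct list_head *next)
{
    return next->prev == prev && prev->next == next &&
           new_ != prev && new_ != next;
}

/* A delete is sane only if both neighbours point back at the entry. */
static inline bool checked_del_valid(const struct list_head *entry)
{
    return entry->prev->next == entry && entry->next->prev == entry;
}

/* Add at the tail; refuse (and change nothing) if the check fails. */
static inline bool checked_list_add_tail(struct list_head *new_,
                                         struct list_head *head)
{
    struct list_head *prev = head->prev;

    if (!checked_add_valid(new_, prev, head))
        return false;
    new_->next = head;
    new_->prev = prev;
    prev->next = new_;
    head->prev = new_;
    return true;
}

/* Unlink and reinitialise; refuse (and change nothing) on mismatch. */
static inline bool checked_list_del_init(struct list_head *entry)
{
    if (!checked_del_valid(entry))
        return false;
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    init_list_head(entry);
    return true;
}
```

A node whose ->prev points at itself while ->next points elsewhere fails
checked_del_valid(), so a checked delete would flag it instead of silently
propagating the damage.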

Finally, I'd note that the list is modified from only three places, which
makes the audit relatively simple: sb_mark_inode_writeback(),
sb_clear_inode_writeback(), and wait_sb_inodes(). All of these hold
sb->s_inode_wblist_lock when modifying the list. So at this point I'm at a
loss as to what could be causing this. As unlikely as it seems to me, I've
started wondering whether this is some subtle issue with RT spinlocks on
aarch64, possibly in combination with interrupts (because
sb_clear_inode_writeback() may be called from an interrupt).
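
The locking pattern of those three sites can be sketched as a simplified
userspace model. This is an illustration under assumptions, not the kernel
code: a pthread mutex stands in for sb->s_inode_wblist_lock, the model_*
types and functions are hypothetical names, and the wait path holds the lock
for the whole walk, which is a simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static inline void init_list_head(struct list_head *h) { h->next = h->prev = h; }
static inline bool list_is_empty(const struct list_head *h) { return h->next == h; }

static inline void add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static inline void del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    init_list_head(n);
}

/* Move all entries of src onto dst (which must be empty); src ends up empty. */
static inline void splice_all_into_empty(struct list_head *src, struct list_head *dst)
{
    if (list_is_empty(src))
        return;
    dst->next = src->next;
    dst->prev = src->prev;
    src->next->prev = dst;
    src->prev->next = dst;
    init_list_head(src);
}

struct model_sb {
    pthread_mutex_t wblist_lock;  /* stands in for s_inode_wblist_lock */
    struct list_head inodes_wb;   /* stands in for s_inodes_wb */
};

struct model_inode { struct list_head i_wb_list; };

static inline void model_sb_init(struct model_sb *sb)
{
    pthread_mutex_init(&sb->wblist_lock, NULL);
    init_list_head(&sb->inodes_wb);
}

/* Site 1: add the inode to the writeback list if it is not on it yet. */
static inline void model_mark_writeback(struct model_sb *sb, struct model_inode *inode)
{
    pthread_mutex_lock(&sb->wblist_lock);
    if (list_is_empty(&inode->i_wb_list))
        add_tail(&inode->i_wb_list, &sb->inodes_wb);
    pthread_mutex_unlock(&sb->wblist_lock);
}

/* Site 2: remove the inode from the writeback list if it is on it. */
static inline void model_clear_writeback(struct model_sb *sb, struct model_inode *inode)
{
    pthread_mutex_lock(&sb->wblist_lock);
    if (!list_is_empty(&inode->i_wb_list))
        del_init(&inode->i_wb_list);
    pthread_mutex_unlock(&sb->wblist_lock);
}

/* Site 3: splice onto an on-stack list, then move entries back one by one. */
static inline int model_wait_inodes(struct model_sb *sb)
{
    struct list_head sync_list;  /* on the stack, hence stack pointers in nodes */
    int visited = 0;

    init_list_head(&sync_list);
    pthread_mutex_lock(&sb->wblist_lock);
    splice_all_into_empty(&sb->inodes_wb, &sync_list);
    while (!list_is_empty(&sync_list)) {
        struct list_head *n = sync_list.next;

        del_init(n);
        add_tail(n, &sb->inodes_wb);
        visited++;
    }
    pthread_mutex_unlock(&sb->wblist_lock);
    return visited;
}
```

While the entries sit on the on-stack list in the third function, their
neighbour pointers can refer to a stack address, which matches the stack
pointer seen in ->next in some of the crashes.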

Any ideas?

								Honza

-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Download attachment ".config.gz" of type "application/x-gzip" (42126 bytes)