Message-ID: <9b091e22-3599-973f-d740-c804f43c71ca@redhat.com>
Date: Mon, 27 Jan 2025 19:31:20 +0100 (CET)
From: Sebastian Ott <sebott@...hat.com>
To: linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org
cc: Carlos Maiolino <cem@...nel.org>, "Darrick J. Wong" <djwong@...nel.org>
Subject: xfs lockup 

Hi,

I just ran into a deadlock on arm64 - is this a known issue?

watchdog: BUG: soft lockup - CPU#27 stuck for 26s! [git:4023]
Modules linked in: tls qrtr rfkill sunrpc mlx5_ib ib_uverbs cdc_eem usbnet mii acpi_ipmi ib_core ipmi_ssif arm_spe_pmu ipmi_devintf ipmi_msghandler arm_cmn arm_dmc62>
irq event stamp: 1277
hardirqs last  enabled at (1277): [<ffff80007ca5d248>] xfs_buf_get_map+0xe70/0x1108 [xfs]
hardirqs last disabled at (1276): [<ffff80007ca5d1f4>] xfs_buf_get_map+0xe1c/0x1108 [xfs]
softirqs last  enabled at (36): [<ffff800080018f2c>] fpsimd_restore_current_state+0x3c/0xd8
softirqs last disabled at (34): [<ffff800080018efc>] fpsimd_restore_current_state+0xc/0xd8
CPU: 27 UID: 0 PID: 4023 Comm: git Not tainted 6.13.0+ #202
Hardware name: HPE ProLiant RL300 Gen11/ProLiant RL300 Gen11, BIOS 1.50 12/18/2023
pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : queued_spin_lock_slowpath+0x78/0x4f0
lr : do_raw_spin_lock+0xb4/0x128
sp : ffff8000cc133210
x29: ffff8000cc133210 x28: ffff800082d7a478 x27: ffff07ffe074fa60
x26: ffff07ffe074faa0 x25: ffff07ffe074f800 x24: ffff80007ca5db24
x23: ffff07ff92f0c200 x22: 0000000080000001 x21: ffff07ff92f0c288
x20: ffff80007ca5cb8c x19: ffff07ff92f0c288 x18: 00000000fffffffd
x17: 675f6675625f7366 x16: 78203a7461202c7d x15: ffff8000cc132668
x14: 0000000000000000 x13: ffff8000cc13285b x12: ffff087d3e9c0000
x11: 0000000000000001 x10: 0000000000000001 x9 : ffff80008017360c
x8 : c0000000ffff7fff x7 : ffff800083ceddb0 x6 : 00000000002bffa8
x5 : ffff087d3e9bffa8 x4 : ffff8000cc134000 x3 : ffff08011cd92ec0
x2 : 00000000dead4ead x1 : 0000000000000000 x0 : 0000000000000001
Call trace:
  queued_spin_lock_slowpath+0x78/0x4f0 (P)
  do_raw_spin_lock+0xb4/0x128
  _raw_spin_lock+0x58/0x70
  xfs_buf_get_map+0x7b4/0x1108 [xfs]
  xfs_buf_read_map+0x54/0x2f8 [xfs]
  xfs_trans_read_buf_map+0x1cc/0x510 [xfs]
  xfs_imap_to_bp+0x5c/0xc8 [xfs]
  xfs_iget+0x3dc/0x10f8 [xfs]
  xfs_lookup+0xf4/0x208 [xfs]
  xfs_vn_lookup+0x5c/0x98 [xfs]
  __lookup_slow+0xb4/0x168
  walk_component+0xe0/0x1a0
  path_lookupat+0x80/0x1b0
  filename_lookup+0xb0/0x1b8
  vfs_statx+0x74/0xd8
  vfs_fstatat+0x6c/0xe0
  __do_sys_newfstatat+0x48/0x78
  __arm64_sys_newfstatat+0x28/0x38
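
(In case it helps with decoding: the offsets above should map back to
source lines with the kernel's faddr2line script run against a
debug-info build of the module, e.g.

  ./scripts/faddr2line fs/xfs/xfs.ko xfs_buf_get_map+0x7b4/0x1108

- the fs/xfs/xfs.ko path assumes an in-tree module build.)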


Lockdep also complained:

======================================================
WARNING: possible circular locking dependency detected
6.13.0+ #202 Not tainted
------------------------------------------------------
git/4023 is trying to acquire lock:
ffff07ff92f0c2a0 (&bp->b_lock){+.+.}-{3:3}, at: xfs_buf_get_map+0x7b4/0x1108 [xfs]

but task is already holding lock:
ffff07ffe074fa78 (&bch->bc_lock){+.+.}-{3:3}, at: xfs_buf_get_map+0x5c4/0x1108 [xfs]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&bch->bc_lock){+.+.}-{3:3}:
        _raw_spin_lock+0x50/0x70
        xfs_buf_rele+0x140/0xa70 [xfs]
        xfs_trans_brelse+0xc8/0x210 [xfs]
        xfs_imap_lookup+0x15c/0x1b8 [xfs]
        xfs_imap+0x18c/0x330 [xfs]
        xfs_iget+0x3a8/0x10f8 [xfs]
        xfs_mountfs+0x540/0xad0 [xfs]
        xfs_fs_fill_super+0x5b4/0x9f0 [xfs]
        get_tree_bdev_flags+0x13c/0x1e8
        get_tree_bdev+0x1c/0x30
        xfs_fs_get_tree+0x20/0x38 [xfs]
        vfs_get_tree+0x30/0x100
        path_mount+0x414/0xb88
        __arm64_sys_mount+0x258/0x338
        invoke_syscall+0x70/0x100
        el0_svc_common.constprop.0+0xc8/0xf0
        do_el0_svc+0x24/0x38
        el0_svc+0x50/0x1b8
        el0t_64_sync_handler+0x10c/0x138
        el0t_64_sync+0x19c/0x1a0

-> #0 (&bp->b_lock){+.+.}-{3:3}:
        __lock_acquire+0x12ec/0x1ee8
        lock_acquire+0x1ac/0x368
        _raw_spin_lock+0x50/0x70
        xfs_buf_get_map+0x7b4/0x1108 [xfs]
        xfs_buf_read_map+0x54/0x2f8 [xfs]
        xfs_trans_read_buf_map+0x1cc/0x510 [xfs]
        xfs_imap_to_bp+0x5c/0xc8 [xfs]
        xfs_iget+0x3dc/0x10f8 [xfs]
        xfs_lookup+0xf4/0x208 [xfs]
        xfs_vn_lookup+0x5c/0x98 [xfs]
        __lookup_slow+0xb4/0x168
        walk_component+0xe0/0x1a0
        path_lookupat+0x80/0x1b0
        filename_lookup+0xb0/0x1b8
        vfs_statx+0x74/0xd8
        vfs_fstatat+0x6c/0xe0
        __do_sys_newfstatat+0x48/0x78
        __arm64_sys_newfstatat+0x28/0x38
        invoke_syscall+0x70/0x100
        el0_svc_common.constprop.0+0x48/0xf0
        do_el0_svc+0x24/0x38
        el0_svc+0x50/0x1b8
        el0t_64_sync_handler+0x10c/0x138
        el0t_64_sync+0x19c/0x1a0

other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&bch->bc_lock);
                                lock(&bp->b_lock);
                                lock(&bch->bc_lock);
   lock(&bp->b_lock);

  *** DEADLOCK ***
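
For what it's worth, the inversion is a plain AB-BA between the buffer
cache hash lock and the per-buffer lock. A minimal userspace sketch
(pthread mutexes standing in for the two spinlocks, thread bodies only
mirroring the two call chains above - this is not the actual xfs code):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t bc_lock = PTHREAD_MUTEX_INITIALIZER; /* &bch->bc_lock */
static pthread_mutex_t b_lock  = PTHREAD_MUTEX_INITIALIZER; /* &bp->b_lock  */

/* mirrors the #1 chain: xfs_buf_rele() takes b_lock, then bc_lock */
static void *release_path(void *unused)
{
	pthread_mutex_lock(&b_lock);
	usleep(1000);			/* widen the race window */
	pthread_mutex_lock(&bc_lock);	/* blocks while lookup_path holds it */
	pthread_mutex_unlock(&bc_lock);
	pthread_mutex_unlock(&b_lock);
	return NULL;
}

/* mirrors the #0 chain: xfs_buf_get_map() takes bc_lock, then b_lock */
static void *lookup_path(void *unused)
{
	pthread_mutex_lock(&bc_lock);
	usleep(1000);
	pthread_mutex_lock(&b_lock);	/* blocks while release_path holds it */
	pthread_mutex_unlock(&b_lock);
	pthread_mutex_unlock(&bc_lock);
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t0, NULL, release_path, NULL);
	pthread_create(&t1, NULL, lookup_path, NULL);
	/* once each thread holds its first lock, neither join returns */
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	puts("lucky interleaving, no deadlock this run");
	return 0;
}

(Build with gcc -pthread; it hangs on most runs, which is exactly the
scenario lockdep is warning about.)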


It seems to be reproducible. The kernel was at commit
9c5968db9e625019a0ee5226c7eebef5519d366a plus some unrelated patches.

Sebastian

