Message-ID: <200e5b49-5c51-bbe5-de93-c6bd6339bb7f@nvidia.com>
Date: Tue, 11 Aug 2020 04:17:03 -0700
From: John Hubbard <jhubbard@...dia.com>
To: Chris Mason <clm@fb.com>, Josef Bacik <josef@...icpanda.com>,
David Sterba <dsterba@...e.com>, <linux-btrfs@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
CC: linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: btrfs crash in kobject_del while running xfstest
Hi,
Here's an early warning of a possible problem.
I'm seeing a new btrfs crash when running xfstests, as of
00e4db51259a5f936fec1424b884f029479d3981 ("Merge tag
'perf-tools-2020-08-10' of
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux") in linux.git.
This doesn't crash in v5.8, so I attempted to bisect, but the bisect
landed on the net-next merge commit:
47ec5303d73ea344e84f46660fff693c57641386 ("Merge
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next"), which
doesn't narrow things down, because that merge touches 2088 files.
I'm attaching the .config that I used.
This is easily reproducible via something like (change to match your setup,
of course):
sudo TEST_DEV=/dev/nvme0n1p8 TEST_DIR=/xfstest_btrfs \
SCRATCH_DEV=/dev/nvme0n1p9 SCRATCH_MNT=/xfstest_scratch ./check \
btrfs/002
which leads to:
[ 586.097360] BTRFS info (device nvme0n1p8): disk space caching is enabled
[ 586.103232] BTRFS info (device nvme0n1p8): has skinny extents
[ 586.115169] BTRFS info (device nvme0n1p8): enabling ssd optimizations
[ 586.308264] BTRFS: device fsid 5dfff89d-8f8d-42ac-8538-acb95164d0be devid 1 transid 5 /dev/nvme0n1p9 scanned by mkfs.btrfs (6374)
[ 586.342776] BTRFS info (device nvme0n1p9): disk space caching is enabled
[ 586.348585] BTRFS info (device nvme0n1p9): has skinny extents
[ 586.353413] BTRFS info (device nvme0n1p9): flagging fs with big metadata feature
[ 586.368129] BTRFS info (device nvme0n1p9): enabling ssd optimizations
[ 586.373996] BTRFS info (device nvme0n1p9): checking UUID tree
[ 586.387449] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 586.393485] #PF: supervisor read access in kernel mode
[ 586.397623] #PF: error_code(0x0000) - not-present page
[ 586.401763] PGD 0 P4D 0
[ 586.403219] Oops: 0000 [#1] SMP PTI
[ 586.405650] CPU: 1 PID: 6405 Comm: umount Not tainted 5.8.0-hubbard-github+ #171
[ 586.412118] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./X99-UD3P-CF, BIOS F1 02/10/2015
[ 586.421360] RIP: 0010:kobject_del+0x1/0x20
[ 586.424427] Code: 48 c7 43 18 00 00 00 00 5b 5d c3 c3 be 01 00 00 00 48 89 df e8 60 1b 00 00 eb c9 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 <48> 8b 6f 18 e8 86 ff ff ff 48 89 ef 5d e9 cd fe ff ff 66 66 2e 0f
[ 586.442644] RSP: 0018:ffffc90009ef7e08 EFLAGS: 00010246
[ 586.446914] RAX: 0000000000000000 RBX: ffff888896080000 RCX: 0000000000000006
[ 586.453149] RDX: ffff88888ee4b000 RSI: ffffffff82669a00 RDI: 0000000000000000
[ 586.459390] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[ 586.465631] R10: 0000000000000001 R11: 0000000000000000 R12: ffff888896080000
[ 586.471866] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 586.478106] FS: 00007f5595739c80(0000) GS:ffff88889fc40000(0000) knlGS:0000000000000000
[ 586.485325] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 586.490129] CR2: 0000000000000018 CR3: 0000000896d5a006 CR4: 00000000001706e0
[ 586.496372] Call Trace:
[ 586.497807] btrfs_sysfs_del_qgroups+0xa5/0xe0 [btrfs]
[ 586.502017] close_ctree+0x1c5/0x2b6 [btrfs]
[ 586.505307] ? fsnotify_destroy_marks+0x24/0x124
[ 586.508948] generic_shutdown_super+0x67/0x100
[ 586.512408] kill_anon_super+0x14/0x30
[ 586.515159] btrfs_kill_super+0x12/0x20 [btrfs]
[ 586.518704] deactivate_locked_super+0x36/0x90
[ 586.522159] cleanup_mnt+0x12d/0x190
[ 586.524720] task_work_run+0x5c/0xa0
[ 586.527285] exit_to_user_mode_loop+0xb9/0xc0
[ 586.530648] exit_to_user_mode_prepare+0xab/0xe0
[ 586.534276] syscall_exit_to_user_mode+0x17/0x50
[ 586.537908] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 586.541984] RIP: 0033:0x7f55959896fb
[ 586.544531] Code: 07 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d 07 0c 00 f7 d8 64 89 01 48
[ 586.562775] RSP: 002b:00007fffcc431228 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 586.569485] RAX: 0000000000000000 RBX: 00007f5595ab31e4 RCX: 00007f55959896fb
[ 586.575753] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005601fb16bb80
[ 586.582020] RBP: 00005601fb16b970 R08: 0000000000000000 R09: 00007fffcc42ffa0
[ 586.588278] R10: 00005601fb16c930 R11: 0000000000000246 R12: 00005601fb16bb80
[ 586.594534] R13: 0000000000000000 R14: 00005601fb16ba68 R15: 0000000000000000
[ 586.600805] Modules linked in: xfs rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache bpfilter dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul btrfs ghash_clmulni_intel aesni_intel blake2b_generic crypto_simd xor cryptd zstd_compress glue_helper input_leds raid6_pq libcrc32c lpc_ich i2c_i801 mfd_core mei_me i2c_smbus mei rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi ib_srpt target_core_mod ib_srp ib_ipoib rdma_ucm ib_uverbs ib_umad sr_mod cdrom sd_mod nouveau ahci libahci nvme crc32c_intel video e1000e led_class nvme_core libata t10_pi ttm mxm_wmi wmi fuse
[ 586.661098] CR2: 0000000000000018
[ 586.663455] ---[ end trace 158f42d646f4715d ]---
A quick peek shows that this is crashing here:
void kobject_del(struct kobject *kobj)
{
	struct kobject *parent = kobj->parent; /* <---- CRASHES HERE with NULL kobj */
	__kobject_del(kobj);
	kobject_put(parent);
}
EXPORT_SYMBOL(kobject_del);
The faulting address of 0x18 is consistent with a NULL kobj being passed in, because
0x18 is the offset of ->parent, and the disassembly confirms that this offset is applied
to %rdi right at kobject_del+0x1:
Dump of assembler code for function kobject_del:
0xffffffff81534ec0 <+0>: push %rbp
0xffffffff81534ec1 <+1>: mov 0x18(%rdi),%rbp
0xffffffff81534ec5 <+5>: callq 0xffffffff81534e50 <__kobject_del>
0xffffffff81534eca <+10>: mov %rbp,%rdi
0xffffffff81534ecd <+13>: pop %rbp
0xffffffff81534ece <+14>: jmpq 0xffffffff81534da0 <kobject_put>
End of assembler dump.
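As a rough cross-check of the 0x18 offset, here is a minimal user-space sketch. The struct below is a mock that only assumes the leading fields of struct kobject are a name pointer, a two-pointer list node, and then the parent pointer, with 8-byte pointers as on x86_64; it is not the kernel header, and the names mock_list_head, mock_kobject_layout and mock_parent_fault_address are made up for illustration.

```c
#include <stddef.h>

/* Mock list node: two pointers, the same shape as a doubly linked list head. */
struct mock_list_head {
    struct mock_list_head *next, *prev;
};

/* Assumed leading fields of struct kobject (illustrative mock, not the real header). */
struct mock_kobject_layout {
    const char *name;                    /* offset 0x00 with 8-byte pointers */
    struct mock_list_head entry;         /* offset 0x08, 16 bytes */
    struct mock_kobject_layout *parent;  /* offset 0x18 */
};

/* Address touched by a load of kobj->parent when kobj itself sits at `base`. */
static size_t mock_parent_fault_address(size_t base)
{
    return base + offsetof(struct mock_kobject_layout, parent);
}
```

Under those assumptions, mock_parent_fault_address(0) evaluates to 0x18, which lines up with both CR2 in the oops and the `mov 0x18(%rdi),%rbp` instruction with RDI equal to zero.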
But as for how we ended up with a null kobj here, that's actually hard to see, at least
for a non-btrfs person, which is why I hoped git bisect would help more than it did here.
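For illustration only, the sketch below shows what a NULL-tolerant teardown helper looks like in user space. It uses mock types (mock_kobj, mock_kobj_del) rather than the kernel's, and it does not claim to be the right fix: whether the check belongs in kobject_del() or in the btrfs caller is not established here.

```c
#include <stddef.h>

/* Mock object: just enough state to observe what the teardown did. */
struct mock_kobj {
    struct mock_kobj *parent;
    int deleted;   /* set once the mock delete has run */
    int refs;      /* stand-in for a real reference count */
};

/* Hypothetical NULL-tolerant variant of the function quoted above. */
static void mock_kobj_del(struct mock_kobj *kobj)
{
    struct mock_kobj *parent;

    if (!kobj)              /* nothing was passed in, so nothing to tear down */
        return;

    parent = kobj->parent;  /* safe: kobj is known to be non-NULL here */
    kobj->deleted = 1;      /* stands in for __kobject_del(kobj) */
    if (parent)
        parent->refs--;     /* stands in for kobject_put(parent) */
}
```

With this shape, calling mock_kobj_del(NULL) returns immediately instead of reading through a NULL pointer, while a non-NULL object is torn down as before.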
thanks,
--
John Hubbard
NVIDIA
View attachment "btrfs_crash.config" of type "text/plain" (140389 bytes)