Message-ID: <2a3eb48d-6ca1-61c6-20cf-ba2fbda21f45@nvidia.com>
Date: Tue, 11 Aug 2020 04:19:47 -0700
From: John Hubbard <jhubbard@...dia.com>
To: Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>,
David Sterba <dsterba@...e.com>, <linux-btrfs@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
CC: linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: btrfs crash in kobject_del while running xfstest
Somehow the copy-paste of Chris Mason's name failed (user error
on my end). Sorry about that, Chris!
On 8/11/20 4:17 AM, John Hubbard wrote:
> Hi,
>
> Here's an early warning of a possible problem.
>
> I'm seeing a new btrfs crash when running xfstests, as of
> 00e4db51259a5f936fec1424b884f029479d3981 ("Merge tag
> 'perf-tools-2020-08-10' of
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux") in linux.git.
>
> This doesn't crash in v5.8, so I attempted to bisect, but ended up with
> the net-next merge commit as the offending one: commit
> 47ec5303d73ea344e84f46660fff693c57641386 ("Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next"), which
> doesn't really help because it's 2088 files changed, of course.
>
> I'm attaching the .config that I used.
>
> This is easily reproducible via something like (change to match your setup,
> of course):
>
> sudo TEST_DEV=/dev/nvme0n1p8 TEST_DIR=/xfstest_btrfs \
> SCRATCH_DEV=/dev/nvme0n1p9 SCRATCH_MNT=/xfstest_scratch ./check \
> btrfs/002
>
> which leads to:
>
> [ 586.097360] BTRFS info (device nvme0n1p8): disk space caching is enabled
> [ 586.103232] BTRFS info (device nvme0n1p8): has skinny extents
> [ 586.115169] BTRFS info (device nvme0n1p8): enabling ssd optimizations
> [ 586.308264] BTRFS: device fsid 5dfff89d-8f8d-42ac-8538-acb95164d0be devid 1 transid 5 /dev/nvme0n1p9 scanned by mkfs.btrfs (6374)
> [ 586.342776] BTRFS info (device nvme0n1p9): disk space caching is enabled
> [ 586.348585] BTRFS info (device nvme0n1p9): has skinny extents
> [ 586.353413] BTRFS info (device nvme0n1p9): flagging fs with big metadata feature
> [ 586.368129] BTRFS info (device nvme0n1p9): enabling ssd optimizations
> [ 586.373996] BTRFS info (device nvme0n1p9): checking UUID tree
> [ 586.387449] BUG: kernel NULL pointer dereference, address: 0000000000000018
> [ 586.393485] #PF: supervisor read access in kernel mode
> [ 586.397623] #PF: error_code(0x0000) - not-present page
> [ 586.401763] PGD 0 P4D 0
> [ 586.403219] Oops: 0000 [#1] SMP PTI
> [ 586.405650] CPU: 1 PID: 6405 Comm: umount Not tainted 5.8.0-hubbard-github+ #171
> [ 586.412118] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./X99-UD3P-CF, BIOS F1 02/10/2015
> [ 586.421360] RIP: 0010:kobject_del+0x1/0x20
> [ 586.424427] Code: 48 c7 43 18 00 00 00 00 5b 5d c3 c3 be 01 00 00 00 48 89 df e8 60 1b 00 00 eb c9 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 55 <48> 8b 6f 18 e8 86 ff ff ff 48 89 ef 5d e9 cd fe ff ff 66 66 2e 0f
> [ 586.442644] RSP: 0018:ffffc90009ef7e08 EFLAGS: 00010246
> [ 586.446914] RAX: 0000000000000000 RBX: ffff888896080000 RCX: 0000000000000006
> [ 586.453149] RDX: ffff88888ee4b000 RSI: ffffffff82669a00 RDI: 0000000000000000
> [ 586.459390] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
> [ 586.465631] R10: 0000000000000001 R11: 0000000000000000 R12: ffff888896080000
> [ 586.471866] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 586.478106] FS: 00007f5595739c80(0000) GS:ffff88889fc40000(0000) knlGS:0000000000000000
> [ 586.485325] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 586.490129] CR2: 0000000000000018 CR3: 0000000896d5a006 CR4: 00000000001706e0
> [ 586.496372] Call Trace:
> [ 586.497807] btrfs_sysfs_del_qgroups+0xa5/0xe0 [btrfs]
> [ 586.502017] close_ctree+0x1c5/0x2b6 [btrfs]
> [ 586.505307] ? fsnotify_destroy_marks+0x24/0x124
> [ 586.508948] generic_shutdown_super+0x67/0x100
> [ 586.512408] kill_anon_super+0x14/0x30
> [ 586.515159] btrfs_kill_super+0x12/0x20 [btrfs]
> [ 586.518704] deactivate_locked_super+0x36/0x90
> [ 586.522159] cleanup_mnt+0x12d/0x190
> [ 586.524720] task_work_run+0x5c/0xa0
> [ 586.527285] exit_to_user_mode_loop+0xb9/0xc0
> [ 586.530648] exit_to_user_mode_prepare+0xab/0xe0
> [ 586.534276] syscall_exit_to_user_mode+0x17/0x50
> [ 586.537908] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 586.541984] RIP: 0033:0x7f55959896fb
> [ 586.544531] Code: 07 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d 07 0c 00 f7 d8 64 89 01 48
> [ 586.562775] RSP: 002b:00007fffcc431228 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> [ 586.569485] RAX: 0000000000000000 RBX: 00007f5595ab31e4 RCX: 00007f55959896fb
> [ 586.575753] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005601fb16bb80
> [ 586.582020] RBP: 00005601fb16b970 R08: 0000000000000000 R09: 00007fffcc42ffa0
> [ 586.588278] R10: 00005601fb16c930 R11: 0000000000000246 R12: 00005601fb16bb80
> [ 586.594534] R13: 0000000000000000 R14: 00005601fb16ba68 R15: 0000000000000000
> [ 586.600805] Modules linked in: xfs rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache bpfilter dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul btrfs ghash_clmulni_intel aesni_intel blake2b_generic crypto_simd xor cryptd zstd_compress glue_helper input_leds raid6_pq libcrc32c lpc_ich i2c_i801 mfd_core mei_me i2c_smbus mei rpcrdma sunrpc ib_isert iscsi_target_mod ib_iser libiscsi ib_srpt target_core_mod ib_srp ib_ipoib rdma_ucm ib_uverbs ib_umad sr_mod cdrom sd_mod nouveau ahci libahci nvme crc32c_intel video e1000e led_class nvme_core libata t10_pi ttm mxm_wmi wmi fuse
> [ 586.661098] CR2: 0000000000000018
> [ 586.663455] ---[ end trace 158f42d646f4715d ]---
>
> A quick peek shows that this is crashing here:
>
> void kobject_del(struct kobject *kobj)
> {
>         struct kobject *parent = kobj->parent; <---- CRASHES HERE with NULL kobj
>
>         __kobject_del(kobj);
>         kobject_put(parent);
> }
> EXPORT_SYMBOL(kobject_del);
>
> The faulting address of 0x18 is consistent with a NULL kobj being passed in, because
> 0x18 is the offset of ->parent, and the disassembly confirms that the load from offset
> 0x18 happens right at kobject_del+0x1:
>
> Dump of assembler code for function kobject_del:
> 0xffffffff81534ec0 <+0>: push %rbp
> 0xffffffff81534ec1 <+1>: mov 0x18(%rdi),%rbp
> 0xffffffff81534ec5 <+5>: callq 0xffffffff81534e50 <__kobject_del>
> 0xffffffff81534eca <+10>: mov %rbp,%rdi
> 0xffffffff81534ecd <+13>: pop %rbp
> 0xffffffff81534ece <+14>: jmpq 0xffffffff81534da0 <kobject_put>
> End of assembler dump.
>
> But as for how we ended up with a null kobj here, that's actually hard to see, at least
> for a non-btrfs person, which is why I hoped git bisect would help more than it did here.
>
>
> thanks,
thanks,
--
John Hubbard
NVIDIA