Message-ID: <CAN-5tyEtZbD8TDMbyxutf_LCT1-aoG_BUF2gjBiMJ0HG9eLMMg@mail.gmail.com>
Date: Tue, 5 Nov 2024 17:08:35 -0500
From: Olga Kornievskaia <aglo@...ch.edu>
To: cel@...nel.org, Jeff Layton <jlayton@...nel.org>
Cc: Neil Brown <neilb@...e.de>, Dai Ngo <Dai.Ngo@...cle.com>, Tom Talpey <tom@...pey.com>,
Chuck Lever <chuck.lever@...cle.com>, Olga Kornievskaia <okorniev@...hat.com>, linux-nfs@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 0/2] nfsd: allow the use of multiple backchannel slots
Hi Jeff/Chuck,
I'm hitting the following soft lockup when running with the nfsd-next code.
The test is the same as before: open a bunch of files, get delegations, then
perform a local conflicting operation. A network trace shows a few CB_RECALLs
completing successfully before the soft lockup, so I can confirm that more
than one slot was used. However, I also see that the server isn't trying to
use the lowest available slot; it just bumps the slot number and uses the
next one. That is, after slot 0 was used and a reply came back, the next
callback went out on slot 1 instead of reusing slot 0.
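To illustrate what I mean by "lowest available slot", here is a minimal
userspace sketch (toy code with made-up names, not the nfsd implementation):
the caller always claims the lowest free bit in a busy mask, so once slot 0's
reply has been processed, the next callback reuses slot 0 rather than moving
on to slot 1.

/*
 * Toy sketch only, hypothetical names: pick the lowest free callback slot
 * from a 32-slot table tracked as a bitmask.
 */
#include <stdio.h>
#include <stdint.h>

#define CB_MAX_SLOTS 32

/* Find and claim the lowest free slot; return -1 if all are busy. */
static int cb_claim_lowest_slot(uint32_t *busy_mask)
{
	for (int i = 0; i < CB_MAX_SLOTS; i++) {
		if (!(*busy_mask & (1u << i))) {
			*busy_mask |= (1u << i);	/* mark slot busy */
			return i;
		}
	}
	return -1;
}

/* Release a slot once its callback reply has been processed. */
static void cb_release_slot(uint32_t *busy_mask, int slot)
{
	*busy_mask &= ~(1u << slot);
}

int main(void)
{
	uint32_t busy = 0;

	int a = cb_claim_lowest_slot(&busy);	/* -> slot 0 */
	cb_release_slot(&busy, a);		/* reply came back */
	int b = cb_claim_lowest_slot(&busy);	/* reuses slot 0, not slot 1 */

	printf("first=%d second=%d\n", a, b);
	return 0;
}

With the behaviour I'm seeing on the wire, the second callback lands on
slot 1 even though slot 0 is free again.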
[ 344.045843] watchdog: BUG: soft lockup - CPU#0 stuck for 26s!
[kworker/u24:28:205]
[ 344.047669] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core
nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy
snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr
rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo
snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg
videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core
snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc
snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme
drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64
drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg
fuse
[ 344.050421] CPU: 0 UID: 0 PID: 205 Comm: kworker/u24:28 Kdump:
loaded Not tainted 6.12.0-rc4+ #42
[ 344.050821] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
VMW201.00V.21805430.BA64.2305221830 05/22/2023
[ 344.051248] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ 344.051513] pstate: 11400005 (nzcV daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 344.051821] pc : kasan_check_range+0x0/0x188
[ 344.052011] lr : __kasan_check_write+0x1c/0x28
[ 344.052208] sp : ffff800087027920
[ 344.052352] x29: ffff800087027920 x28: 0000000000040000 x27: ffff0000a520f170
[ 344.052710] x26: 0000000000000000 x25: 1fffe00014a41e2e x24: ffff0002841692c0
[ 344.053159] x23: ffff0002841692c8 x22: 0000000000000000 x21: 1ffff00010e04f2a
[ 344.053612] x20: ffff0002841692c0 x19: ffff80008318c2c0 x18: 0000000000000000
[ 344.054054] x17: 0000006800000000 x16: 1fffe0000010fd60 x15: 0a0d37303736205d
[ 344.054501] x14: 3136335b0a0d3630 x13: 1ffff000104751c9 x12: ffff600014a41e2f
[ 344.054952] x11: 1fffe00014a41e2e x10: ffff600014a41e2e x9 : dfff800000000000
[ 344.055402] x8 : 00009fffeb5be1d2 x7 : ffff0000a520f173 x6 : 0000000000000001
[ 344.055735] x5 : ffff0000a520f170 x4 : 0000000000000000 x3 : ffff8000823129fc
[ 344.056058] x2 : 0000000000000001 x1 : 0000000000000002 x0 : ffff0000a520f172
[ 344.056479] Call trace:
[ 344.056636] kasan_check_range+0x0/0x188
[ 344.056886] queued_spin_lock_slowpath+0x5f4/0xaa0
[ 344.057192] _raw_spin_lock+0x180/0x1a8
[ 344.057436] rpc_sleep_on+0x78/0xe8 [sunrpc]
[ 344.057700] nfsd4_cb_prepare+0x15c/0x468 [nfsd]
[ 344.057935] rpc_prepare_task+0x70/0xa0 [sunrpc]
[ 344.058165] __rpc_execute+0x1e8/0xa48 [sunrpc]
[ 344.058388] rpc_async_schedule+0x90/0x100 [sunrpc]
[ 344.058623] process_one_work+0x598/0x1100
[ 344.058818] worker_thread+0x6c0/0xa58
[ 344.058992] kthread+0x288/0x310
[ 344.059145] ret_from_fork+0x10/0x20
[ 344.075846] watchdog: BUG: soft lockup - CPU#1 stuck for 26s!
[kworker/u24:27:204]
[ 344.076295] Modules linked in: rpcrdma rdma_cm iw_cm ib_cm ib_core
nfsd auth_rpcgss nfs_acl lockd grace uinput isofs snd_seq_dummy
snd_hrtimer vsock_loopback vmw_vsock_virtio_transport_common qrtr
rfkill vmw_vsock_vmci_transport vsock sunrpc vfat fat uvcvideo
snd_hda_codec_generic snd_hda_intel videobuf2_vmalloc snd_intel_dspcfg
videobuf2_memops uvc snd_hda_codec videobuf2_v4l2 snd_hda_core
snd_hwdep videodev snd_seq snd_seq_device videobuf2_common snd_pcm mc
snd_timer snd vmw_vmci soundcore xfs libcrc32c vmwgfx nvme
drm_ttm_helper ttm crct10dif_ce ghash_ce sha2_ce sha256_arm64
drm_kms_helper nvme_core sha1_ce sr_mod e1000e nvme_auth cdrom drm sg
fuse
[ 344.079648] CPU: 1 UID: 0 PID: 204 Comm: kworker/u24:27 Kdump:
loaded Tainted: G L 6.12.0-rc4+ #42
[ 344.080290] Tainted: [L]=SOFTLOCKUP
[ 344.080495] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
VMW201.00V.21805430.BA64.2305221830 05/22/2023
[ 344.080930] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ 344.081212] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 344.081630] pc : _raw_spin_lock+0x108/0x1a8
[ 344.081815] lr : _raw_spin_lock+0xf4/0x1a8
[ 344.081998] sp : ffff800087017a30
[ 344.082146] x29: ffff800087017a90 x28: ffff0000a520f170 x27: ffff6000148a1081
[ 344.082467] x26: 1fffe000148a1081 x25: ffff0000a450840c x24: ffff0000a520ed40
[ 344.082892] x23: ffff0000a4508404 x22: ffff0002e9028000 x21: ffff800087017a50
[ 344.083338] x20: 1ffff00010e02f46 x19: ffff0000a520f170 x18: 0000000000000000
[ 344.083775] x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaab024bdd10
[ 344.084217] x14: 0000000000000000 x13: 0000000000000000 x12: ffff700010e02f4b
[ 344.084625] x11: 1ffff00010e02f4a x10: ffff700010e02f4a x9 : dfff800000000000
[ 344.084945] x8 : 0000000000000004 x7 : 0000000000000003 x6 : 0000000000000001
[ 344.085264] x5 : ffff800087017a50 x4 : ffff700010e02f4a x3 : ffff800082311154
[ 344.085587] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
[ 344.085915] Call trace:
[ 344.086028] _raw_spin_lock+0x108/0x1a8
[ 344.086210] rpc_wake_up_queued_task+0x5c/0xf8 [sunrpc]
[ 344.086465] nfsd4_cb_prepare+0x168/0x468 [nfsd]
[ 344.086694] rpc_prepare_task+0x70/0xa0 [sunrpc]
[ 344.086922] __rpc_execute+0x1e8/0xa48 [sunrpc]
[ 344.087148] rpc_async_schedule+0x90/0x100 [sunrpc]
[ 344.087389] process_one_work+0x598/0x1100
[ 344.087584] worker_thread+0x6c0/0xa58
[ 344.087758] kthread+0x288/0x310
[ 344.087909] ret_from_fork+0x10/0x20
On Wed, Oct 30, 2024 at 4:30 PM <cel@...nel.org> wrote:
>
> From: Chuck Lever <chuck.lever@...cle.com>
>
> On Wed, 30 Oct 2024 10:48:45 -0400, Jeff Layton wrote:
> > A few more minor updates to the set to fix some small-ish bugs, and do a
> > bit of cleanup. This seems to test OK for me so far.
> >
> >
>
> Applied to nfsd-next for v6.13, thanks! Still open for comments and
> test results.
>
> [1/2] nfsd: make nfsd4_session->se_flags a bool
> commit: d10f8b7deb4e8a3a0c75855fdad7aae9c1943816
> [2/2] nfsd: allow for up to 32 callback session slots
> commit: 6c8910ac1cd360ea01136d707158690b5159a1d0
>
> --
> Chuck Lever
>
>