Message-ID: <20241202-solemn-beneficial-dinosaur-7e3e4e@leitao>
Date: Mon, 2 Dec 2024 12:16:01 -0800
From: Breno Leitao <leitao@...ian.org>
To: Herbert Xu <herbert@...dor.apana.org.au>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Thomas Graf <tgraf@...g.ch>,
Tejun Heo <tj@...nel.org>, Hao Luo <haoluo@...gle.com>,
Josh Don <joshdon@...gle.com>, Barret Rhoden <brho@...gle.com>,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rhashtable: Fix potential deadlock by moving
schedule_work outside lock
Hello Herbert,
On Fri, Nov 29, 2024 at 01:47:42PM +0800, Herbert Xu wrote:
> On Thu, Nov 28, 2024 at 04:16:25AM -0800, Breno Leitao wrote:
> > Move the hash table growth check and work scheduling outside the
> > rht lock to prevent a possible circular locking dependency.
> >
> > The original implementation could trigger a lockdep warning due to
> > a potential deadlock scenario involving nested locks between
> > rhashtable bucket, rq lock, and dsq lock. By relocating the
growth check and work scheduling after releasing the rht lock, we break
> > this potential deadlock chain.
> >
> > This change expands the flexibility of rhashtable by removing
> > restrictive locking that previously limited its use in scheduler
> > and workqueue contexts.
>
> Could you please explain the deadlock?
I understand that there is a lock ordering inversion here:
One code path can hold the rhashtable_bucket lock and then end up taking
the dsq->lock, as shown in the following chain:
  Chain exists of:
    rhashtable_bucket --> &rq->__lock --> &dsq->lock
The opposite order also exists: sched_ext holds the dsq->lock and then
tries to take the rhashtable bucket lock.
This can be seen in the following snippet:
rht_lock+0x69/0xd0
destroy_dsq+0x22d/0x790
scx_ops_disable_workfn+0x9d2/0xaf0
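To make the two orders concrete, here is a minimal userspace model of the
inversion, with pthread mutexes standing in for the kernel locks. The thread
names and the idea of running it under a lock-order checker such as
ThreadSanitizer are only mine for illustration; none of this is the real
kernel code, and depending on interleaving it can also genuinely deadlock,
which is exactly the point.

/*
 * Illustrative model only: bucket_lock, rq_lock and dsq_lock stand in
 * for rhashtable_bucket, &rq->__lock and &dsq->lock.  insert_path()
 * mirrors the rhashtable insert that kicks the workqueue while holding
 * the bucket lock; teardown_path() mirrors destroy_dsq().  Built with
 * -fsanitize=thread this should report the same lock-order cycle that
 * lockdep flags.
 */
#include <pthread.h>

static pthread_mutex_t bucket_lock = PTHREAD_MUTEX_INITIALIZER; /* rhashtable_bucket */
static pthread_mutex_t rq_lock     = PTHREAD_MUTEX_INITIALIZER; /* &rq->__lock       */
static pthread_mutex_t dsq_lock    = PTHREAD_MUTEX_INITIALIZER; /* &dsq->lock        */

/* Insert side: bucket lock held while the deferred grow work is
 * scheduled, which via the scheduler ends up taking rq->__lock and
 * then dsq->lock (entries #1..#4 in the splat). */
static void *insert_path(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&bucket_lock);	/* rht_lock()        */
	pthread_mutex_lock(&rq_lock);		/* worker wakeup     */
	pthread_mutex_lock(&dsq_lock);		/* sched_ext enqueue */
	pthread_mutex_unlock(&dsq_lock);
	pthread_mutex_unlock(&rq_lock);
	pthread_mutex_unlock(&bucket_lock);
	return NULL;
}

/* Teardown side: destroy_dsq() holds dsq->lock and then takes the
 * bucket lock to drop the dsq from the hash table (entry #0). */
static void *teardown_path(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&dsq_lock);		/* destroy_dsq()     */
	pthread_mutex_lock(&bucket_lock);	/* rht_lock()        */
	pthread_mutex_unlock(&bucket_lock);
	pthread_mutex_unlock(&dsq_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, insert_path, NULL);
	pthread_create(&b, NULL, teardown_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}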
> Is the workqueue system actually using rhashtable?
It seems so when using the sched_ext scheduler class. For instance, lockdep
caught it in scx_ops_disable_workfn():
rht_lock+0x69/0xd0
destroy_dsq+0x22d/0x790
scx_ops_disable_workfn+0x9d2/0xaf0
kthread_worker_fn+0x137/0x350
Here is the full lockdep splat, in case it helps. Sorry it is not decoded,
but it should still give you the code flow.
======================================================
WARNING: possible circular locking dependency detected
------------------------------------------------------
sched_ext_ops_h/10451 is trying to acquire lock:
ffff888288059038 (rhashtable_bucket){....}-{0:0}, at: rht_lock+0x51/0xd0

but task is already holding lock:
ffff888470645698 (&dsq->lock){-.-.}-{2:2}, at: destroy_dsq+0xaf/0x790

hardirqs last enabled at (2088145): [<ffffffff822ab674>] _raw_spin_unlock_irq+0x24/0x50
hardirqs last disabled at (2088144): [<ffffffff822ab4bf>] _raw_spin_lock_irq+0x2f/0x80
softirqs last enabled at (2088116): [<ffffffff810f7294>] __irq_exit_rcu+0x74/0x100
softirqs last disabled at (2088111): [<ffffffff810f7294>] __irq_exit_rcu+0x74/0x100
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #4 (&dsq->lock){-.-.}-{2:2}:
_raw_spin_lock+0x2f/0x40
dispatch_enqueue+0x7c/0x3e0
enqueue_task_scx.llvm.3416789782249720787+0x1ae/0x250
sched_enq_and_set_task+0x5f/0xb0
bpf_scx_reg+0xf21/0x11b0
bpf_struct_ops_link_create+0xec/0x160
__sys_bpf+0x34d/0x3b0
__x64_sys_bpf+0x18/0x20
do_syscall_64+0x7e/0x150
entry_SYSCALL_64_after_hwframe+0x4b/0x53
-> #3 (&rq->__lock){-.-.}-{2:2}:
_raw_spin_lock_nested+0x32/0x40
raw_spin_rq_lock_nested+0x20/0x30
task_fork_fair.llvm.5382994275699419189+0x3b/0x110
sched_cgroup_fork+0xe3/0x100
copy_process+0xc3c/0x14a0
kernel_clone+0x90/0x360
user_mode_thread+0xbc/0xe0
rest_init+0x1f/0x1f0
start_kernel+0x41b/0x470
x86_64_start_reservations+0x26/0x30
x86_64_start_kernel+0x9b/0xa0
common_startup_64+0x13e/0x140
-> #2 (&p->pi_lock){-.-.}-{2:2}:
_raw_spin_lock_irqsave+0x5a/0x90
try_to_wake_up+0x58/0x730
create_worker+0x1d6/0x240
workqueue_init+0x2c0/0x390
kernel_init_freeable+0x147/0x200
kernel_init+0x16/0x1c0
ret_from_fork+0x2f/0x40
ret_from_fork_asm+0x11/0x20
-> #1 (&pool->lock){-.-.}-{2:2}:
_raw_spin_lock+0x2f/0x40
__queue_work+0x24b/0x610
queue_work_on+0xa5/0xf0
rhashtable_insert_slow+0x524/0x970
__xdp_reg_mem_model+0x181/0x240
xdp_rxq_info_reg_mem_model+0x19/0xf0
bnxt_alloc_mem+0x1178/0x1c80
__bnxt_open_nic+0x1bb/0xe20
bnxt_open_nic+0x26/0x60
ethtool_set_channels+0x1b7/0x1f0
dev_ethtool+0x555/0x740
dev_ioctl+0x2ac/0x3f0
sock_do_ioctl+0x111/0x180
sock_ioctl+0x1fb/0x2e0
__se_sys_ioctl+0x72/0xc0
do_syscall_64+0x7e/0x150
entry_SYSCALL_64_after_hwframe+0x4b/0x53
-> #0 (rhashtable_bucket){....}-{0:0}:
__lock_acquire+0x1742/0x3470
lock_acquire+0xf0/0x290
rht_lock+0x69/0xd0
destroy_dsq+0x22d/0x790
scx_ops_disable_workfn+0x9d2/0xaf0
kthread_worker_fn+0x137/0x350
kthread+0x102/0x120
ret_from_fork+0x2f/0x40
ret_from_fork_asm+0x11/0x20
other info that might help us debug this:
Chain exists of:
rhashtable_bucket --> &rq->__lock --> &dsq->lock
Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&dsq->lock);
                               lock(&rq->__lock);
                               lock(&dsq->lock);
  lock(rhashtable_bucket);

 *** DEADLOCK ***
5 locks held by sched_ext_ops_h/10451:
#0: ffffffff83695ad0 (scx_ops_enable_mutex){+.+.}-{3:3}, at: scx_ops_disable_workfn+0x111/0xaf0
#1: ffffffff83da13d8 (rcu_read_lock){....}-{1:2}, at: rhashtable_walk_start_check+0x1f/0x3e0
#2: ffffffff83da13d8 (rcu_read_lock){....}-{1:2}, at: destroy_dsq+0x30/0x790
#3: ffff888470645698 (&dsq->lock){-.-.}-{2:2}, at: destroy_dsq+0xaf/0x790
#4: ffffffff83da13d8 (rcu_read_lock){....}-{1:2}, at: destroy_dsq+0x30/0x790
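For completeness, the shape of the change the patch makes (deciding about
growth while the bucket lock is held, but calling schedule_work() only after
the lock is released) can be modeled in userspace like this. All names below
are invented stand-ins, not the real rhashtable code.

/*
 * Userspace sketch of the pattern only; struct and function names are
 * stand-ins for the rhashtable internals.
 */
#include <pthread.h>
#include <stdbool.h>

struct toy_table {
	pthread_mutex_t bucket_lock;	/* stands in for the rht bucket lock */
	unsigned int nelems;
	unsigned int size;
};

/* Stand-in for schedule_work(&ht->run_work); may take workqueue and
 * scheduler locks, so it must not run under bucket_lock. */
static void schedule_grow(struct toy_table *t)
{
	(void)t;
}

static bool grow_above_75(const struct toy_table *t)
{
	return t->nelems > t->size / 4 * 3;
}

/* Old shape: the deferred work is kicked while the bucket lock is held,
 * creating bucket_lock -> workqueue/scheduler lock dependencies. */
static void insert_old(struct toy_table *t)
{
	pthread_mutex_lock(&t->bucket_lock);
	t->nelems++;
	if (grow_above_75(t))
		schedule_grow(t);		/* under the lock */
	pthread_mutex_unlock(&t->bucket_lock);
}

/* New shape: only record the decision under the lock and act afterwards,
 * so no other lock class is ever taken while the bucket lock is held. */
static void insert_new(struct toy_table *t)
{
	bool need_grow;

	pthread_mutex_lock(&t->bucket_lock);
	t->nelems++;
	need_grow = grow_above_75(t);
	pthread_mutex_unlock(&t->bucket_lock);

	if (need_grow)
		schedule_grow(t);		/* bucket lock already dropped */
}

int main(void)
{
	static struct toy_table t = {
		.bucket_lock = PTHREAD_MUTEX_INITIALIZER,
		.size = 4,
	};

	insert_old(&t);
	insert_new(&t);
	return 0;
}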