Message-ID: <CAADnVQKPEPAt4rZiZ33WPpcZA9qiM5hogC4AyCN1r5d_B4Debw@mail.gmail.com>
Date: Tue, 14 Jan 2025 17:36:44 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Hou Tao <houtao@...weicloud.com>
Cc: bpf <bpf@...r.kernel.org>, Network Development <netdev@...r.kernel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Andrii Nakryiko <andrii@...nel.org>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>, Hao Luo <haoluo@...gle.com>,
Yonghong Song <yonghong.song@...ux.dev>, Daniel Borkmann <daniel@...earbox.net>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Jiri Olsa <jolsa@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Hou Tao <houtao1@...wei.com>,
Xu Kuohai <xukuohai@...wei.com>
Subject: Re: [PATCH bpf-next v2 4/5] bpf: Cancel the running bpf_timer through kworker
On Wed, Jan 8, 2025 at 10:07 PM Hou Tao <houtao@...weicloud.com> wrote:
>
> From: Hou Tao <houtao1@...wei.com>
>
> During the update procedure, when overwriting an element in a
> pre-allocated htab, the freeing of the old_element is protected by the
> bucket lock. The bucket lock is necessary because the old_element has
> already been stashed in htab->extra_elems when alloc_htab_elem()
> returns. If the old_element were freed after the bucket lock is
> unlocked, the stashed element could be reused by a concurrent update
> procedure, and the freeing of the old_element would run concurrently
> with its reuse. However, the invocation of check_and_free_fields() may
> acquire a spin-lock, which violates the lockdep rules because its
> caller already holds a raw spin-lock (the bucket lock). The following
> warning is reported when such a race happens:
>
> BUG: scheduling while atomic: test_progs/676/0x00000003
> 3 locks held by test_progs/676:
> #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830
> #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500
> #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0
> Modules linked in: bpf_testmod(O)
> Preemption disabled at:
> [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500
> CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11
> Tainted: [W]=WARN, [O]=OOT_MODULE
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)...
> Call Trace:
> <TASK>
> dump_stack_lvl+0x57/0x70
> dump_stack+0x10/0x20
> __schedule_bug+0x120/0x170
> __schedule+0x300c/0x4800
> schedule_rtlock+0x37/0x60
> rtlock_slowlock_locked+0x6d9/0x54c0
> rt_spin_lock+0x168/0x230
> hrtimer_cancel_wait_running+0xe9/0x1b0
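For context, the failing chain on RT is roughly the following (my
reading of the trace and the commit log; the intermediate frames are
reconstructed, not taken from the splat):

htab_map_update_elem()                  /* holds the raw bucket lock */
  check_and_free_fields()
    bpf_timer_cancel_and_free()
      hrtimer_cancel()
        hrtimer_cancel_wait_running()
          spin_lock(&base->softirq_expiry_lock); /* sleeping lock on RT */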
Since this issue is limited to RT
#ifdef CONFIG_PREEMPT_RT
void hrtimer_cancel_wait_running(const struct hrtimer *timer);
#else
static inline void hrtimer_cancel_wait_running(struct hrtimer *timer)
{
cpu_relax();
}
#endif
let's avoid overloading system_unbound_wq in !RT.
> - if (this_cpu_read(hrtimer_running))
> - queue_work(system_unbound_wq, &t->cb.delete_work);
> - else
> - bpf_timer_delete_work(&t->cb.delete_work);
Let's keep this sync call on !RT.
> + if (!this_cpu_read(hrtimer_running) && hrtimer_try_to_cancel(&t->timer) >= 0) {
> + kfree_rcu(t, cb.rcu);
> + return;
> + }
> +
> + /* The timer is running on current or other CPU. Use a kworker to wait
> + * for the completion of the timer instead of spinning on current CPU
> + * or trying to acquire a sleepable lock to wait for its completion.
> + */
> + queue_work(system_unbound_wq, &t->cb.delete_work);
Overloading system_unbound_wq even in !RT when hrtimer_running is set
is an issue worth solving as well, but let's not double down on the
problem and make things worse.
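Something along these lines is what I mean (untested sketch; assumes
the hrtimer_running per-cpu flag and bpf_timer_delete_work() from this
series):

        if (this_cpu_read(hrtimer_running)) {
                /* Callback is running on this CPU: cannot wait for it
                 * here without deadlocking, so defer to a kworker.
                 */
                queue_work(system_unbound_wq, &t->cb.delete_work);
                return;
        }

        if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
                /* !RT: hrtimer_cancel_wait_running() only does
                 * cpu_relax(), so waiting synchronously is fine here.
                 */
                bpf_timer_delete_work(&t->cb.delete_work);
                return;
        }

        if (hrtimer_try_to_cancel(&t->timer) >= 0) {
                /* RT: timer was idle or successfully cancelled. */
                kfree_rcu(t, cb.rcu);
                return;
        }

        /* RT and the callback is running on another CPU: waiting for it
         * would take a sleeping lock, so defer to a kworker.
         */
        queue_work(system_unbound_wq, &t->cb.delete_work);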
The rest looks good.
pw-bot: cr