Message-ID: <CAADnVQKPEPAt4rZiZ33WPpcZA9qiM5hogC4AyCN1r5d_B4Debw@mail.gmail.com>
Date: Tue, 14 Jan 2025 17:36:44 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Hou Tao <houtao@...weicloud.com>
Cc: bpf <bpf@...r.kernel.org>, Network Development <netdev@...r.kernel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Andrii Nakryiko <andrii@...nel.org>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>, Hao Luo <haoluo@...gle.com>,
Yonghong Song <yonghong.song@...ux.dev>, Daniel Borkmann <daniel@...earbox.net>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Jiri Olsa <jolsa@...nel.org>,
John Fastabend <john.fastabend@...il.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Hou Tao <houtao1@...wei.com>,
Xu Kuohai <xukuohai@...wei.com>
Subject: Re: [PATCH bpf-next v2 4/5] bpf: Cancel the running bpf_timer through kworker
On Wed, Jan 8, 2025 at 10:07 PM Hou Tao <houtao@...weicloud.com> wrote:
>
> From: Hou Tao <houtao1@...wei.com>
>
> During the update procedure, when overwriting an element in a
> pre-allocated htab, the freeing of the old_element is protected by the
> bucket lock. The bucket lock is necessary because the old_element has
> already been stashed in htab->extra_elems when alloc_htab_elem()
> returns. If the old_element were freed after the bucket lock is
> unlocked, the stashed element could be reused by a concurrent update
> procedure, and the freeing of the old_element would run concurrently
> with its reuse. However, the invocation of check_and_free_fields() may
> acquire a spin-lock, which violates the lockdep rules because its
> caller already holds a raw spin-lock (the bucket lock). The following
> warning is reported when such a race happens:
>
> BUG: scheduling while atomic: test_progs/676/0x00000003
> 3 locks held by test_progs/676:
> #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830
> #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500
> #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0
> Modules linked in: bpf_testmod(O)
> Preemption disabled at:
> [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500
> CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11
> Tainted: [W]=WARN, [O]=OOT_MODULE
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)...
> Call Trace:
> <TASK>
> dump_stack_lvl+0x57/0x70
> dump_stack+0x10/0x20
> __schedule_bug+0x120/0x170
> __schedule+0x300c/0x4800
> schedule_rtlock+0x37/0x60
> rtlock_slowlock_locked+0x6d9/0x54c0
> rt_spin_lock+0x168/0x230
> hrtimer_cancel_wait_running+0xe9/0x1b0
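For context, the failing chain on RT is roughly the following (my
reading of the trace and the commit log; the intermediate frames are
reconstructed, not taken from the splat):

htab_map_update_elem()                  /* holds the raw bucket lock */
  check_and_free_fields()
    bpf_timer_cancel_and_free()
      hrtimer_cancel()
        hrtimer_cancel_wait_running()
          spin_lock(&base->softirq_expiry_lock); /* sleeping lock on RT */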
Since this issue is limited to RT
#ifdef CONFIG_PREEMPT_RT
void hrtimer_cancel_wait_running(const struct hrtimer *timer);
#else
static inline void hrtimer_cancel_wait_running(struct hrtimer *timer)
{
cpu_relax();
}
#endif
let's avoid overloading system_unbound_wq in !RT.
> - if (this_cpu_read(hrtimer_running))
> - queue_work(system_unbound_wq, &t->cb.delete_work);
> - else
> - bpf_timer_delete_work(&t->cb.delete_work);
Let's keep this sync call on !RT.
> + if (!this_cpu_read(hrtimer_running) && hrtimer_try_to_cancel(&t->timer) >= 0) {
> + kfree_rcu(t, cb.rcu);
> + return;
> + }
> +
> + /* The timer is running on current or other CPU. Use a kworker to wait
> + * for the completion of the timer instead of spinning on current CPU
> + * or trying to acquire a sleepable lock to wait for its completion.
> + */
> + queue_work(system_unbound_wq, &t->cb.delete_work);
Overloading system_unbound_wq even in !RT when hrtimer_running is set
is an issue worth solving as well, but let's not double down on the
problem and make things worse.
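Something along these lines is what I mean (untested sketch; assumes
the hrtimer_running per-cpu flag and bpf_timer_delete_work() from this
series):

        if (this_cpu_read(hrtimer_running)) {
                /* Callback is running on this CPU: cannot wait for it
                 * here without deadlocking, so defer to a kworker.
                 */
                queue_work(system_unbound_wq, &t->cb.delete_work);
                return;
        }

        if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
                /* !RT: hrtimer_cancel_wait_running() only does
                 * cpu_relax(), so waiting synchronously is fine here.
                 */
                bpf_timer_delete_work(&t->cb.delete_work);
                return;
        }

        if (hrtimer_try_to_cancel(&t->timer) >= 0) {
                /* RT: timer was idle or successfully cancelled. */
                kfree_rcu(t, cb.rcu);
                return;
        }

        /* RT and the callback is running on another CPU: waiting for it
         * would take a sleeping lock, so defer to a kworker.
         */
        queue_work(system_unbound_wq, &t->cb.delete_work);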
The rest looks good.
pw-bot: cr