Message-Id: <0bb955784824863853d84e4b2fa96c81e3a0dd034b1705bbbbc452f8e4dd27b2@mail.kernel.org>
Date: Mon, 27 Oct 2025 23:48:07 +0000 (UTC)
From: bot+bpf-ci@...nel.org
To: roman.gushchin@...ux.dev,akpm@...ux-foundation.org
Cc: linux-kernel@...r.kernel.org,ast@...nel.org,surenb@...gle.com,mhocko@...nel.org,shakeel.butt@...ux.dev,hannes@...xchg.org,andrii@...nel.org,inwardvessel@...il.com,linux-mm@...ck.org,cgroups@...r.kernel.org,bpf@...r.kernel.org,martin.lau@...nel.org,song@...nel.org,memxor@...il.com,tj@...nel.org,roman.gushchin@...ux.dev,ast@...nel.org,andrii@...nel.org,daniel@...earbox.net,martin.lau@...nel.org,eddyz87@...il.com,yonghong.song@...ux.dev,clm@...a.com,ihor.solodrai@...ux.dev
Subject: Re: [PATCH v2 20/23] sched: psi: implement bpf_psi struct ops
```
commit 264db34cd79e7c07b4b1e58e2efae3d632df9cb2
Author: Roman Gushchin <roman.gushchin@...ux.dev>

sched: psi: implement bpf_psi struct ops

This patch implements a BPF struct ops-based mechanism to create PSI
triggers, attach them to cgroups or system wide and handle PSI events
in BPF.

> diff --git a/kernel/sched/bpf_psi.c b/kernel/sched/bpf_psi.c
> new file mode 100644
> index 000000000..c383a2011
> --- /dev/null
> +++ b/kernel/sched/bpf_psi.c
[ ... ]
> +void bpf_psi_cgroup_offline(struct cgroup *cgroup)
> +{
> + struct psi_group *group = cgroup->psi;
> + u64 cgrp_id = cgroup_id(cgroup);
> + struct psi_trigger *t, *p;
> + struct bpf_psi *bpf_psi;
> + LIST_HEAD(to_destroy);
> + int idx;
> +
> + if (!group)
> + return;
> +
> + spin_lock(&group->bpf_triggers_lock);
> + list_for_each_entry_safe(t, p, &group->bpf_triggers, bpf_group_node) {
> + if (!t->pinned) {
> + t->pinned = true;
> + list_move(&t->bpf_group_node, &to_destroy);
> + }
> + }
> + spin_unlock(&group->bpf_triggers_lock);
> +
> + list_for_each_entry_safe(t, p, &to_destroy, bpf_group_node) {
> + bpf_psi = READ_ONCE(t->bpf_psi);
> +
> + idx = srcu_read_lock(&bpf_psi->srcu);
> + if (bpf_psi->ops->handle_cgroup_offline)
> + bpf_psi->ops->handle_cgroup_offline(bpf_psi, cgrp_id);
> + srcu_read_unlock(&bpf_psi->srcu, idx);
> +
> + spin_lock(&bpf_psi->lock);
> + list_del(&t->bpf_psi_node);
> + spin_unlock(&bpf_psi->lock);
> +
> + WRITE_ONCE(t->bpf_psi, NULL);
> + flush_workqueue(bpf_psi_wq);
> + synchronize_srcu(&bpf_psi->srcu);
> + psi_trigger_destroy(t);
> + }
> +}

Can bpf_psi_cgroup_offline() race with bpf_psi_ops_unreg() and access
freed memory?

Consider this sequence:

Thread A (bpf_psi_cgroup_offline):
- Pins the cgroup's triggers under group->bpf_triggers_lock
- Reads bpf_psi = READ_ONCE(t->bpf_psi) after dropping that lock

Thread B (bpf_psi_ops_unreg):
- Iterates bpf_psi->triggers and skips the pinned triggers
- Calls synchronize_srcu(&bpf_psi->srcu)
- Calls kfree(bpf_psi)

Thread A:
- Calls srcu_read_lock(&bpf_psi->srcu) on freed memory

The pinning mechanism prevents Thread B from NULLing t->bpf_psi once
Thread A has pinned the trigger. However, Thread B frees the bpf_psi
object unconditionally, even if it skipped some or all of the triggers
because Thread A had already pinned them.

After Thread A pins its triggers and releases group->bpf_triggers_lock,
Thread B can complete bpf_psi_ops_unreg(): its synchronize_srcu() does
not wait for Thread A, which has not entered an SRCU read-side section
yet, and it then frees bpf_psi. Thread A subsequently dereferences the
freed bpf_psi in srcu_read_lock(), spin_lock(&bpf_psi->lock) and
synchronize_srcu().

> +static void bpf_psi_ops_unreg(void *kdata, struct bpf_link *link)
> +{
> + struct bpf_psi_ops *ops = kdata;
> + struct bpf_psi *bpf_psi = ops->bpf_psi;
> + struct psi_trigger *t, *p;
> + LIST_HEAD(to_destroy);
> +
> + spin_lock(&bpf_psi->lock);
> + list_for_each_entry_safe(t, p, &bpf_psi->triggers, bpf_psi_node) {
> + spin_lock(&t->group->bpf_triggers_lock);
> + if (!t->pinned) {
> + t->pinned = true;
> + list_move(&t->bpf_group_node, &to_destroy);
> + list_del(&t->bpf_psi_node);
> +
> + WRITE_ONCE(t->bpf_psi, NULL);
> + }
> + spin_unlock(&t->group->bpf_triggers_lock);
> + }
> + spin_unlock(&bpf_psi->lock);
> +
> + flush_workqueue(bpf_psi_wq);
> + synchronize_srcu(&bpf_psi->srcu);
> +
> + list_for_each_entry_safe(t, p, &to_destroy, bpf_group_node)
> + psi_trigger_destroy(t);
> +
> + if (!list_empty(&bpf_psi->node)) {
> + mutex_lock(&bpf_psi_lock);
> + list_del(&bpf_psi->node);
> + mutex_unlock(&bpf_psi_lock);
> + static_branch_dec(&bpf_psi_notify_key);
> + }
> +
> + cleanup_srcu_struct(&bpf_psi->srcu);
> + kfree(bpf_psi);
> +}

When bpf_psi_ops_unreg() skips triggers because bpf_psi_cgroup_offline()
has already pinned them, does it still free bpf_psi too early?

If every trigger was pinned, the to_destroy list is empty, yet the
function still proceeds to cleanup_srcu_struct() and kfree(bpf_psi)
while bpf_psi_cgroup_offline() may still be using that bpf_psi pointer.
```
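One way to close this window, offered only as a hedged sketch and not as the patch author's fix, is to reference-count bpf_psi: the offline path takes a reference in the same group->bpf_triggers_lock critical section in which it pins a trigger (reading t->bpf_psi there rather than after the unlock), and bpf_psi_ops_unreg() drops only the registration's own reference, so whichever side drops the last reference performs cleanup_srcu_struct() and kfree(). The refcount itself, the names owner, offline_pin(), offline_finish() and unreg(), and the assumption that every trigger on group->bpf_triggers is also linked on its bpf_psi->triggers list are all hypothetical. The model below is single-threaded userspace C11 that replays the interleaving described above; it is not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model, not kernel code: "struct owner" stands in
 * for struct bpf_psi, and the refcount is an assumed addition that the
 * patch under review does not have.
 */
struct owner {
	atomic_int refcnt;	/* starts at 1: the struct_ops registration */
	int nr_pinned;		/* triggers pinned by the cgroup-offline path */
};

/* Counts frees so the ordering can be checked (single-threaded replay). */
static int owners_freed;

static struct owner *owner_alloc(void)
{
	struct owner *o = calloc(1, sizeof(*o));

	if (o)
		atomic_init(&o->refcnt, 1);
	return o;
}

/*
 * Only valid while the owner is known to be alive: in the kernel sketch,
 * while holding group->bpf_triggers_lock and seeing !t->pinned, which
 * means unregistration has not processed this trigger and so cannot
 * have reached kfree() yet.
 */
static void owner_get(struct owner *o)
{
	int old = atomic_fetch_add(&o->refcnt, 1);

	assert(old > 0);	/* taking a reference on a dead object is a bug */
}

/* Drop a reference; whoever drops the last one frees the object. */
static void owner_put(struct owner *o)
{
	if (atomic_fetch_sub(&o->refcnt, 1) == 1) {
		owners_freed++;
		free(o);
	}
}

/* Offline path, phase 1: pin the trigger and its owner together. */
static void offline_pin(struct owner *o)
{
	o->nr_pinned++;
	owner_get(o);
}

/* Offline path, phase 2: finish with the owner, then release it. */
static void offline_finish(struct owner *o)
{
	o->nr_pinned--;
	owner_put(o);	/* may free o; it must not be touched afterwards */
}

/* Unregistration path: drop only the registration's own reference. */
static void unreg(struct owner *o)
{
	owner_put(o);
}
```

Replaying the race: offline_pin(o), then unreg(o), leaves the owner allocated because Thread A still holds a reference; the later offline_finish(o) drops the last reference and frees it. In the kernel, the analogous primitives would be refcount_t with refcount_inc() and refcount_dec_and_test(). An alternative design is for bpf_psi_ops_unreg() to wait until all pinned triggers have left bpf_psi->triggers before freeing; the refcount avoids making unregistration wait on the offline path, at the cost of deferring the final cleanup to whichever path finishes last.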
---
AI reviewed your patch. Please fix the bug, or reply by email explaining why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
In-Reply-To-Subject: `sched: psi: implement bpf_psi struct ops`
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/18859027430