Message-ID: <20250609225611.3967338-3-shakeel.butt@linux.dev>
Date: Mon, 9 Jun 2025 15:56:10 -0700
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Tejun Heo <tj@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>,
Vlastimil Babka <vbabka@...e.cz>,
Alexei Starovoitov <ast@...nel.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Michal Koutný <mkoutny@...e.com>,
Harry Yoo <harry.yoo@...cle.com>,
Yosry Ahmed <yosry.ahmed@...ux.dev>,
bpf@...r.kernel.org,
linux-mm@...ck.org,
cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org,
Meta kernel team <kernel-team@...a.com>
Subject: [PATCH 2/3] cgroup: make css_rstat_updated nmi safe
To make css_rstat_updated() safe to run in nmi context, let's move the
rstat update tree creation to the flush side and use per-cpu lockless
lists in struct cgroup_subsys to track the csses whose stats are updated
on that cpu.
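
As a rough sketch of the per-cpu bookkeeping this relies on (lnode, owner
and the per-cpu list head come from the earlier patches in this series and
are only reached through ss_lhead_cpu() below; the lhead field name here is
an assumption, not the exact definition):

        /* Illustrative layout only -- see the earlier patches for the real one. */
        struct css_rstat_cpu {
                struct llist_node lnode;                /* points to itself while not queued */
                struct cgroup_subsys_state *owner;      /* css this per-cpu state belongs to */
                /* existing updated_next/updated_children fields omitted */
        };

        struct cgroup_subsys {
                struct llist_head __percpu *lhead;      /* assumed: per-cpu list of updated csses */
                /* other fields omitted */
        };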
Each struct cgroup_subsys_state now has a per-cpu lnode which needs to
be inserted into the corresponding per-cpu lhead of its struct
cgroup_subsys. Since we want the insertion to be nmi safe, there can be
multiple inserters on the same cpu racing for the same lnode. The
current llist API does not provide a way to protect against multiple
inserters using the same lnode, so llist_add() out of the box is not
safe for this scenario.
However, we can protect against multiple inserters using the same lnode
by exploiting the fact that an llist node points to itself when it is
not on a list: atomically reset that pointer and let the winner of the
cmpxchg become the single inserter.
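
Purely for illustration, here is a minimal userspace sketch of that
single-inserter selection, with the GCC/Clang __atomic_compare_exchange_n()
builtin standing in for the kernel's try_cmpxchg() and with made-up node and
function names; it is not the kernel code itself:

        /* build with: cc -O2 toy_single_inserter.c */
        #include <stdbool.h>
        #include <stdio.h>

        struct node {
                struct node *next;      /* points to itself while off-list */
        };

        /* Whoever flips next from &node to NULL wins and becomes the single
         * inserter; every other racing context (irq, nmi) backs off. */
        static bool claim_for_insert(struct node *n)
        {
                struct node *expected = n;      /* off-list sentinel */

                return __atomic_compare_exchange_n(&n->next, &expected, NULL,
                                                   false, __ATOMIC_SEQ_CST,
                                                   __ATOMIC_SEQ_CST);
        }

        int main(void)
        {
                struct node n = { .next = &n }; /* initialised as off-list */

                printf("first claim:  %d\n", claim_for_insert(&n));  /* 1: wins */
                printf("second claim: %d\n", claim_for_insert(&n));  /* 0: loses */
                /* In the patch, the winner then does llist_add() and the flush
                 * side's llist_del_first_init() makes next point to itself
                 * again, re-arming the node for the next update. */
                return 0;
        }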
Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
---
kernel/cgroup/rstat.c | 57 ++++++++++++++++++++++++++++++++++---------
1 file changed, 45 insertions(+), 12 deletions(-)
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index a5608ae2be27..4fabd7973067 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -138,13 +138,15 @@ void _css_rstat_cpu_unlock(struct cgroup_subsys_state *css, int cpu,
* @css: target cgroup subsystem state
* @cpu: cpu on which rstat_cpu was updated
*
- * @css's rstat_cpu on @cpu was updated. Put it on the parent's matching
- * rstat_cpu->updated_children list. See the comment on top of
- * css_rstat_cpu definition for details.
+ * Atomically inserts the css into the ss's llist for the given cpu. This is
+ * nmi safe. The ss's llist will be processed at flush time to create the
+ * update tree.
*/
__bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
{
- unsigned long flags;
+ struct llist_head *lhead = ss_lhead_cpu(css->ss, cpu);
+ struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
+ struct llist_node *self;
/*
* Since bpf programs can call this function, prevent access to
@@ -153,19 +155,37 @@ __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
if (!css_uses_rstat(css))
return;
+ lockdep_assert_preemption_disabled();
+
+ /*
+ * For archs that do not support nmi safe cmpxchg, we ignore the
+ * requests from nmi context for rstat update llist additions.
+ */
+ if (!IS_ENABLED(CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG) && in_nmi())
+ return;
+
+ /* If already on list return. */
+ if (llist_on_list(&rstatc->lnode))
+ return;
+
/*
- * Speculative already-on-list test. This may race leading to
- * temporary inaccuracies, which is fine.
+ * Make sure only one insert request can proceed on this cpu for this
+ * specific lnode; the selection needs to be safe against irqs and nmis.
*
- * Because @parent's updated_children is terminated with @parent
- * instead of NULL, we can tell whether @css is on the list by
- * testing the next pointer for NULL.
+ * Please note that llist_add() does not protect against multiple
+ * inserters for the same lnode. We use the fact that lnode points to
+ * itself when not on a list and then atomically set it to NULL to
+ * select the single inserter.
*/
- if (data_race(css_rstat_cpu(css, cpu)->updated_next))
+ self = &rstatc->lnode;
+ if (!try_cmpxchg(&(rstatc->lnode.next), &self, NULL))
return;
- flags = _css_rstat_cpu_lock(css, cpu, true);
+ llist_add(&rstatc->lnode, lhead);
+}
+static void __css_process_update_tree(struct cgroup_subsys_state *css, int cpu)
+{
/* put @css and all ancestors on the corresponding updated lists */
while (true) {
struct css_rstat_cpu *rstatc = css_rstat_cpu(css, cpu);
@@ -191,8 +211,19 @@ __bpf_kfunc void css_rstat_updated(struct cgroup_subsys_state *css, int cpu)
css = parent;
}
+}
+
+static void css_process_update_tree(struct cgroup_subsys *ss, int cpu)
+{
+ struct llist_head *lhead = ss_lhead_cpu(ss, cpu);
+ struct llist_node *lnode;
+
+ while ((lnode = llist_del_first_init(lhead))) {
+ struct css_rstat_cpu *rstatc;
- _css_rstat_cpu_unlock(css, cpu, flags, true);
+ rstatc = container_of(lnode, struct css_rstat_cpu, lnode);
+ __css_process_update_tree(rstatc->owner, cpu);
+ }
}
/**
@@ -300,6 +331,8 @@ static struct cgroup_subsys_state *css_rstat_updated_list(
flags = _css_rstat_cpu_lock(root, cpu, false);
+ css_process_update_tree(root->ss, cpu);
+
/* Return NULL if this subtree is not on-list */
if (!rstatc->updated_next)
goto unlock_ret;
--
2.47.1