Message-Id: <20220826010119.1265764-4-longman@redhat.com>
Date: Thu, 25 Aug 2022 21:01:17 -0400
From: Waiman Long <longman@...hat.com>
To: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Tejun Heo <tj@...nel.org>, Zefan Li <lizefan.x@...edance.com>,
Johannes Weiner <hannes@...xchg.org>,
Will Deacon <will@...nel.org>
Cc: linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
Waiman Long <longman@...hat.com>
Subject: [PATCH v6 3/5] sched: Enforce user requested affinity

The affinity a user requests via sched_setaffinity() can easily be
overwritten by other kernel subsystems, with no easy way to reset it
back to what the user requested. For example, any change to the current
cpuset hierarchy may reset the cpumask of the tasks in the affected
cpusets to the default cpuset value, even if those tasks have a
pre-existing user-requested affinity. This is especially easy to
trigger in a cgroup v2 environment, where writing "+cpuset" to the
root cgroup's cgroup.subtree_control file resets the CPU affinity
of all the processes in the system.

This is problematic in a nohz_full environment, where the tasks running
on the nohz_full CPUs usually have their CPU affinity explicitly set
and will behave incorrectly if that affinity changes.

Fix this problem by looking at user_cpus_ptr in __set_cpus_allowed_ptr()
and using it to restrict the given cpumask unless there is no overlap.
In that case, it will fall back to the given one.
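
The masking rule described above can be sketched in plain userspace C.
This is only an illustration, not the kernel code: toy_cpumask_t and
effective_mask() are hypothetical names, and a 64-bit integer stands in
for a real struct cpumask (bit n set means CPU n is allowed).

```c
#include <stdint.h>

/* Hypothetical stand-in for a kernel cpumask: bit n set = CPU n allowed. */
typedef uint64_t toy_cpumask_t;

/*
 * Return the mask that would actually be applied to the task.
 *
 * new_mask      - mask passed in by the caller (e.g. from cpuset)
 * user_mask     - mask the user requested earlier
 * has_user_mask - nonzero if the user ever requested an affinity
 */
static toy_cpumask_t effective_mask(toy_cpumask_t new_mask,
				    toy_cpumask_t user_mask,
				    int has_user_mask)
{
	toy_cpumask_t both;

	/* No user request on record: use the given mask unchanged. */
	if (!has_user_mask)
		return new_mask;

	/* Restrict the given mask to the CPUs the user asked for. */
	both = new_mask & user_mask;

	/* No overlap: ignore the user request and fall back. */
	return both ? both : new_mask;
}
```

With new_mask covering CPUs 0-3 (0x0F) and a user request for CPUs 2-3
(0x0C), the sketch yields 0x0C. A user request for CPUs 4-5 (0x30) does
not overlap, so the result falls back to 0x0F.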

All callers of set_cpus_allowed_ptr() will be affected by this change.
A scratch cpumask is added to the per-CPU runqueue structure for doing
additional masking when user_cpus_ptr is set. The scratch cpumask should
get allocated during CPU activation. A fallback atomic memory allocation
in __set_cpus_allowed_ptr() is also added in case set_cpus_allowed_ptr()
is called before the scratch cpumask is properly allocated.
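
The allocation scheme for the scratch buffer can be sketched the same
way. This is a userspace approximation under stated assumptions:
struct toy_rq, toy_cpu_activate() and toy_get_scratch() are hypothetical
names, malloc() stands in for kmalloc(), and no locking is modelled.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-CPU runqueue. */
struct toy_rq {
	/* NULL until first allocated; never freed afterwards. */
	unsigned long *scratch_mask;
};

/* Normal path: preallocate the buffer when the CPU comes up. */
static void toy_cpu_activate(struct toy_rq *rq)
{
	if (!rq->scratch_mask)
		rq->scratch_mask = malloc(sizeof(*rq->scratch_mask));
}

/*
 * Fallback path: allocate on first use if activation has not run yet.
 * Returns NULL if the allocation fails, in which case the caller is
 * expected to skip the extra masking step.
 */
static unsigned long *toy_get_scratch(struct toy_rq *rq)
{
	if (!rq->scratch_mask)
		rq->scratch_mask = malloc(sizeof(*rq->scratch_mask));
	return rq->scratch_mask;
}
```

Either path leaves at most one buffer per runqueue, so a later call to
the other function reuses the existing buffer instead of allocating
again.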
Signed-off-by: Waiman Long <longman@...hat.com>
---
kernel/sched/core.c | 36 +++++++++++++++++++++++++++++++++++-
kernel/sched/sched.h | 3 +++
2 files changed, 38 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ac2b103d69dc..1c2f548e5369 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2928,11 +2928,40 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
static int __set_cpus_allowed_ptr(struct task_struct *p,
const struct cpumask *new_mask, u32 flags)
{
+ struct cpumask *alloc_mask = NULL;
struct rq_flags rf;
struct rq *rq;
+ int ret;
rq = task_rq_lock(p, &rf);
- return __set_cpus_allowed_ptr_locked(p, new_mask, flags, rq, &rf);
+ if (p->user_cpus_ptr) {
+
+ /*
+ * A scratch cpumask is allocated on the percpu runqueues
+ * to enable additional masking with user_cpus_ptr. This
+ * cpumask, once allocated, will not be freed.
+ */
+ if (unlikely(!rq->scratch_mask)) {
+ alloc_mask = kmalloc(cpumask_size(), GFP_ATOMIC);
+ if (!rq->scratch_mask && alloc_mask) {
+ rq->scratch_mask = alloc_mask;
+ alloc_mask = NULL;
+ }
+ }
+ /*
+ * Ignore user_cpus_ptr if atomic memory allocation fails
+ * or it doesn't intersect new_mask.
+ */
+ if (rq->scratch_mask &&
+ cpumask_and(rq->scratch_mask, new_mask, p->user_cpus_ptr))
+ new_mask = rq->scratch_mask;
+ }
+
+
+ ret = __set_cpus_allowed_ptr_locked(p, new_mask, flags, rq, &rf);
+ if (unlikely(alloc_mask))
+ kfree(alloc_mask);
+ return ret;
}
int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
@@ -9352,6 +9381,11 @@ int sched_cpu_activate(unsigned int cpu)
sched_update_numa(cpu, true);
sched_domains_numa_masks_set(cpu);
cpuset_cpu_active();
+ /*
+ * Preallocated scratch cpumask
+ */
+ if (!rq->scratch_mask)
+ rq->scratch_mask = kmalloc(cpumask_size(), GFP_KERNEL);
}
/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a49c17e1c7ea..66a6bfddd716 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1159,6 +1159,9 @@ struct rq {
unsigned int core_forceidle_occupation;
u64 core_forceidle_start;
#endif
+
+ /* Scratch cpumask to be temporarily used under rq_lock */
+ struct cpumask *scratch_mask;
};
#ifdef CONFIG_FAIR_GROUP_SCHED
--
2.31.1