Message-ID: <c10e4f69-9951-6c38-6e28-fafcaec00d89@redhat.com>
Date: Tue, 16 Aug 2022 18:11:03 -0400
From: Waiman Long <longman@...hat.com>
To: Tejun Heo <tj@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Zefan Li <lizefan.x@...edance.com>,
Johannes Weiner <hannes@...xchg.org>,
Will Deacon <will@...nel.org>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v5 3/3] cgroup/cpuset: Keep user set cpus affinity
On 8/16/22 16:15, Tejun Heo wrote:
> On Tue, Aug 16, 2022 at 03:27:34PM -0400, Waiman Long wrote:
>> +static int cpuset_set_cpus_allowed_ptr(struct task_struct *p,
>> +				       const struct cpumask *mask)
>> +{
>> +	cpumask_var_t new_mask;
>> +	int ret;
>> +
>> +	if (!READ_ONCE(p->user_cpus_ptr)) {
>> +		ret = set_cpus_allowed_ptr(p, mask);
>> +		/*
>> +		 * If user_cpus_ptr becomes set now, we are racing with
>> +		 * a concurrent sched_setaffinity(). So use the newly
>> +		 * set user_cpus_ptr and retry again.
>> +		 *
>> +		 * TODO: We cannot detect change in the cpumask pointed to
>> +		 * by user_cpus_ptr. We will have to add a sequence number
>> +		 * if such a race needs to be addressed.
>> +		 */
> This is too ugly and obviously broken. Let's please do it properly.
Actually, there is a similar construct in __sched_setaffinity():
again:
	retval = __set_cpus_allowed_ptr(p, new_mask, SCA_CHECK);
	if (retval)
		goto out_free_new_mask;

	cpuset_cpus_allowed(p, cpus_allowed);
	if (!cpumask_subset(new_mask, cpus_allowed)) {
		/*
		 * We must have raced with a concurrent cpuset update.
		 * Just reset the cpumask to the cpuset's cpus_allowed.
		 */
		cpumask_copy(new_mask, cpus_allowed);
		goto again;
	}
It is hard to synchronize different subsystems atomically without
running into locking issues. Let me think about what can be done in
this case.
Is using a sequence number to detect the race, with a retry, good enough?
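
For concreteness, something like the following is what I have in mind.
This is only a rough sketch: it assumes a hypothetical seqcount_t field
(call it user_cpus_seq) in task_struct, which sched_setaffinity() would
bump with write_seqcount_begin()/write_seqcount_end() under pi_lock
around every update of user_cpus_ptr. It also glosses over the lifetime
of the old cpumask pointed to by user_cpus_ptr, which would still need
pi_lock or RCU protection:

static int cpuset_set_cpus_allowed_ptr(struct task_struct *p,
				       const struct cpumask *mask)
{
	cpumask_var_t new_mask;
	unsigned int seq;
	int ret;

	if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
		return -ENOMEM;

retry:
	seq = read_seqcount_begin(&p->user_cpus_seq);	/* hypothetical field */

	if (!READ_ONCE(p->user_cpus_ptr)) {
		ret = set_cpus_allowed_ptr(p, mask);
	} else {
		/* Restrict the cpuset mask by the user-set affinity. */
		cpumask_and(new_mask, mask, p->user_cpus_ptr);
		ret = set_cpus_allowed_ptr(p, new_mask);
	}

	/*
	 * If the sequence count changed, a concurrent
	 * sched_setaffinity() has installed or updated
	 * user_cpus_ptr; redo the update with the new value.
	 */
	if (read_seqcount_retry(&p->user_cpus_seq, seq))
		goto retry;

	free_cpumask_var(new_mask);
	return ret;
}

That would catch both a newly installed user_cpus_ptr and a change to
the cpumask it points to, which the READ_ONCE() check alone cannot.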
Cheers,
Longman