[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220712125702.yg4eqbaakvj56k6m@wubuntu>
Date: Tue, 12 Jul 2022 13:57:02 +0100
From: Qais Yousef <qais.yousef@....com>
To: Tejun Heo <tj@...nel.org>
Cc: Xuewen Yan <xuewen.yan@...soc.com>, rafael@...nel.org,
viresh.kumar@...aro.org, mingo@...hat.com, peterz@...radead.org,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, linux-kernel@...r.kernel.org,
ke.wang@...soc.com, xuewyan@...mail.com, linux-pm@...r.kernel.org,
Waiman Long <longman@...hat.com>
Subject: Re: [PATCH] sched/schedutil: Fix deadlock between cpuset and cpu
hotplug when using schedutil
On 07/11/22 10:58, Tejun Heo wrote:
> (cc'ing Waiman)
>
> On Mon, Jul 11, 2022 at 06:46:29PM +0100, Qais Yousef wrote:
> > Have you tried running with PROVE_LOCKDEP enabled? It'll help print a useful
> > output about the DEADLOCK. But your explanation was good and clear to me.
>
> I don't think lockdep would be able to track CPU1 -> CPU2 dependency here
> unfortunately.
I see. It seems I got it wrong and CPU1 and CPU2 below run from the same
context, IIU Xuewen's response correctly this time that is.
>
> > AFAIU:
> >
> >
> > CPU0 CPU1 CPU2
> >
> > // attach task to a different
> > // cpuset cgroup via sysfs
> > __acquire(cgroup_threadgroup_rwsem)
> >
> > // pring up CPU2 online
> > __acquire(cpu_hotplug_lock)
> > // wait for CPU2 to come online
> > // bringup cpu online
> > // call cpufreq_online() which tries to create sugov kthread
> > __acquire(cpu_hotplug_lock) copy_process()
> > cgroup_can_fork()
> > cgroup_css_set_fork()
> > __acquire(cgroup_threadgroup_rwsem)
> > // blocks forever // blocks forever // blocks forever
> >
> >
> > Is this a correct summary of the problem?
> >
> > The locks are held in reverse order and we end up with a DEADLOCK.
> >
> > I believe the same happens on offline it's just the path to hold the
> > cgroup_threadgroup_rwsem on CPU2 is different.
> >
> > This will be a tricky one. Your proposed patch might fix it for this case, but
> > if there's anything else that creates a kthread when a cpu goes online/offline
> > then we'll hit the same problem again.
> >
> > I haven't reviewed your patch to be honest, but I think worth seeing first if
> > there's something that can be done at the 'right level' first.
> >
> > Needs head scratching from my side at least. This is the not the first type of
> > locking issue between hotplug and cpuset :-/
>
> Well, the only thing I can think of is always grabbing cpus_read_lock()
> before grabbing threadgroup_rwsem. Waiman, what do you think?
Is there a lot of subsystems beside cpuset that needs the cpus_read_lock()?
A quick grep tells me it's the only one.
Can't we instead use cpus_read_trylock() in cpuset_can_attach() so that we
either hold the lock successfully then before we go ahead and call
cpuset_attach(), or bail out and cancel the whole attach operation which should
unlock the threadgroup_rwsem() lock?
Sounds simple in theory, but as always the devil is in the details which
I don't have all inside my head. AFAICS cgroup_migration_execute() will call
ss->can_attach() for each subsystem. And it's the only caller of can_attach().
Not sure if it'd be safe to do cpus_read_unlock() at the end of cpuset_attach()
or we need to introduce an additional callback to 'cpuset_attach_finish()'
where we'd release the lock.
ie something like the below untested-and-for-demo-purposes-only patch:
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 71a418858a5e..86c7c6aa0f12 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2226,6 +2226,12 @@ static int cpuset_can_attach(struct cgroup_taskset *tset)
struct task_struct *task;
int ret;
+ ret = cpus_read_trylock();
+ if (!ret) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
/* used later by cpuset_attach() */
cpuset_attach_old_cs = task_cs(cgroup_taskset_first(tset, &css));
cs = css_cs(css);
@@ -2289,7 +2295,8 @@ static void cpuset_attach(struct cgroup_taskset *tset)
cgroup_taskset_first(tset, &css);
cs = css_cs(css);
- cpus_read_lock();
+ // Requires cpus_read_lock() to be held already
+ // releases cpus_read_lock() when done
percpu_down_write(&cpuset_rwsem);
guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
Cheers
--
Qais Yousef
Powered by blists - more mailing lists