Message-ID: <YsyO9GM9mCydaybo@slm.duckdns.org>
Date: Mon, 11 Jul 2022 10:58:28 -1000
From: Tejun Heo <tj@...nel.org>
To: Qais Yousef <qais.yousef@....com>
Cc: Xuewen Yan <xuewen.yan@...soc.com>, rafael@...nel.org,
viresh.kumar@...aro.org, mingo@...hat.com, peterz@...radead.org,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, linux-kernel@...r.kernel.org,
ke.wang@...soc.com, xuewyan@...mail.com, linux-pm@...r.kernel.org,
Waiman Long <longman@...hat.com>
Subject: Re: [PATCH] sched/schedutil: Fix deadlock between cpuset and cpu
hotplug when using schedutil

(cc'ing Waiman)

On Mon, Jul 11, 2022 at 06:46:29PM +0100, Qais Yousef wrote:
> Have you tried running with PROVE_LOCKING enabled? It'll help print useful
> output about the DEADLOCK. But your explanation was good and clear to me.

I don't think lockdep would be able to track the CPU1 -> CPU2 dependency here,
unfortunately.

> AFAIU:
>
>
>   CPU0                                  CPU1                              CPU2
>
>   // attach task to a different
>   // cpuset cgroup via sysfs
>   __acquire(cgroup_threadgroup_rwsem)
>
>                                         // bring up CPU2 online
>                                         __acquire(cpu_hotplug_lock)
>                                         // wait for CPU2 to come online
>                                                                           // bringup cpu online
>                                                                           // call cpufreq_online() which tries to create sugov kthread
>   __acquire(cpu_hotplug_lock)                                             copy_process()
>                                                                             cgroup_can_fork()
>                                                                               cgroup_css_set_fork()
>                                                                                 __acquire(cgroup_threadgroup_rwsem)
>   // blocks forever                     // blocks forever                 // blocks forever
>
>
> Is this a correct summary of the problem?
>
> The locks are held in reverse order and we end up with a DEADLOCK.
>
> I believe the same happens on offline; only the path that takes
> cgroup_threadgroup_rwsem on CPU2 is different.
>
> This will be a tricky one. Your proposed patch might fix it for this case, but
> if there's anything else that creates a kthread when a cpu goes online/offline
> then we'll hit the same problem again.
>
> I haven't reviewed your patch, to be honest, but I think it's worth seeing
> first whether something can be done at the 'right level'.
>
> Needs head scratching from my side at least. This is not the first
> locking issue of this type between hotplug and cpuset :-/

Well, the only thing I can think of is always grabbing cpus_read_lock()
before grabbing threadgroup_rwsem. Waiman, what do you think?

Thanks.

--
tejun