[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87im7yc2bu.fsf@oracle.com>
Date: Thu, 14 Jan 2021 22:21:25 -0500
From: Daniel Jordan <daniel.m.jordan@...cle.com>
To: Peter Zijlstra <peterz@...radead.org>,
Alexey Klimov <aklimov@...hat.com>
Cc: linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
yury.norov@...il.com, tglx@...utronix.de, jobaker@...hat.com,
audralmitchel@...il.com, arnd@...db.de, gregkh@...uxfoundation.org,
rafael@...nel.org, tj@...nel.org, lizefan.x@...edance.com,
qais.yousef@....com, hannes@...xchg.org, klimov.linux@...il.com
Subject: Re: [RFC][PATCH] cpu/hotplug: wait for cpuset_hotplug_work to
finish on cpu online
Daniel Jordan <daniel.m.jordan@...cle.com> writes:
> Peter Zijlstra <peterz@...radead.org> writes:
>>> The nature of this bug is also described here (with different consequences):
>>> https://lore.kernel.org/lkml/20200211141554.24181-1-qais.yousef@arm.com/
>>
>> Yeah, pesky deadlocks.. someone was going to try again.
>
> I dug up the synchronous patch
>
> https://lore.kernel.org/lkml/1579878449-10164-1-git-send-email-prsood@codeaurora.org/
>
> but surprisingly wasn't able to reproduce the lockdep splat from
>
> https://lore.kernel.org/lkml/F0388D99-84D7-453B-9B6B-EEFF0E7BE4CC@lca.pw/
>
> even though I could hit it a few weeks ago.
oh okay, you need to mount a legacy cpuset hierarchy.
So as the above splat shows, making cpuset_hotplug_workfn() synchronous
means cpu_hotplug_lock (and "cpuhp_state-down") can be acquired before
cgroup_mutex.
But there are at least four cgroup paths that take the locks in the
opposite order. They're all the same, they take cgroup_mutex and then
cpu_hotplug_lock later on to modify one or more static keys.
cpu_hotplug_lock should probably be ahead of cgroup_mutex because the
latter is taken in a hotplug callback, and we should keep the static
branches in cgroup, so the only way out I can think of is moving
cpu_hotplug_lock to just before cgroup_mutex is taken and switching to
_cpuslocked flavors of the static key calls.
lockdep quiets down with that change everywhere, but it puts another big
lock around a lot of cgroup paths. Seems less heavyhanded to go with
this RFC. What do you all think?
Absent further discussion, Alexey, do you plan to post another version?
Powered by blists - more mailing lists