[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJFUiJhG_ijoPXocCH4noRiNFRhnweGFCG7m9UN+2F8kEn88pg@mail.gmail.com>
Date: Mon, 6 Jun 2016 22:38:38 -0700
From: Lianwei Wang <lianwei.wang@...il.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Peter Zijlstra <peterz@...radead.org>,
Oleg Nesterov <oleg@...hat.com>,
Ingo Molnar <mingo@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-pm@...r.kernel.org
Subject: Re: [PATCH] cpu/hotplug: handle unbalanced hotplug enable/disable
On Thu, May 12, 2016 at 1:06 AM, Lianwei Wang <lianwei.wang@...il.com> wrote:
> I have come up a patch to make the pm notifier called symmetrically
> and currently being tested. I will send it out after pass the test.
>
> On Fri, May 6, 2016 at 12:18 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
>> On Fri, 6 May 2016, Lianwei Wang wrote:
>>> On Thu, May 5, 2016 at 5:13 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
>>> > Can you eventually come up with a coherent explanation of the problem down to
>>> > the root cause or are we going to play this "move the workaround one step
>>> > down" game for another 10 rounds?
>>> >
>>> Do you agree that any driver can abort the suspend process by
>>> returning an error or NOTIFY_BAD if it is not ready to suspend?
>>> I have explain it and I also copied the example code that abort
>>> suspend by returning an error or NOTIFY_BAD in the pm notifier
>>> callback function.
>>
>> I don't need copied example code which does not tell me what the real problem
>> is.
>>
>>> The cpu_hotplug_disable and cpu_hotplug_enable are called in one of
>>> the PM notifier callback. And they are called from two difference
>>> place.
>>> Below is how it happened:
>>> pm_suspend
>>> |--enter_state
>>> |--suspend_prepare
>>> |--pm_notifier_call_chain(PM_SUSPEND_PREPARE)
>>> | |--call_back_1
>>> | |--call_back_..
>>> | |--call_back_n ===> return NOTIFY_BAD to abort call chain and
>>> | | suspend process here
>>> | |--cpu_hotplug_pm_callback()
>>> | | |--cpu_hotplug_disable =====> remember it is not
>>> called yet
>>> | |--call_back_..
>>> |
>>> |--pm_notifier_call_chain(PM_POST_SUSPEND)
>>> | |--call_back_1
>>> | |--call_back_..
>>> | |--call_back_n
>>> | |--cpu_hotplug_pm_callback()
>>> | | |--cpu_hotplug_enable =====> Here it is unbalanced called
>>> | |--call_back_..
>>> |
>>> So, keep in mind that for pm notifier call chain, the
>>> PM_SUSPEND_PREPARE notifier and PM_POST_SUSPEND notifier is not always
>>> paired called. Sometimes for a driver's pm notifier callback, the
>>> PM_POST_SUSPEND is called without PM_SUSPEND_PREPARE.
>>
>> So that is the real problem: cpu_hotplug_pm_callback(PM_POST_SUSPEND) can be
>> called w/o a previous call to cpu_hotplug_pm_callback(PM_SUSPEND_PREPARE).
>>
>>> > It cannot prevent any unbalanced calls. It mitigates the issue, but that's a
>>> > different problem.
>>> It did not migrate the issue. It give a warning message to log the
>>> unbalanced issue and it also make sure the cpu hotplug continue to
>>> work well even someone do an unbalanced call. It is a good checking as
>>> the enable_irq/disable_irq do. There are some other unbalanced
>>> checking in kernel too. All make sure the kernel has a better
>>> stability.
>>
>> I'm not opposed to do that and I said so several times. But I said as well,
>> that we do not add this without fixing the problem which made you write that
>> patch in the first place.
>>
>> So we have a proper explanation for the real problem now, but we have no
>> fix.
>>
>> And again: Your patch is NOT a fix. Simply because it will emit a warning
>> everytime the above happens. And that's wrong because the abort is a
>> legitimate scenario.
>>
>> So please come up with a sensible fix for the suspend abort issue and then we
>> can add the balance check/fixup to the hotplug_disable/enable() code.
>>
>> Thanks,
>>
>> tglx
>>
>>
I think this is the best solution to fix this unbalanced issue for now.
Powered by blists - more mailing lists