lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 6 May 2016 00:06:44 -0700
From:	Lianwei Wang <lianwei.wang@...il.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Peter Zijlstra <peterz@...radead.org>, oleg@...hat.com,
	Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
	linux-pm@...r.kernel.org
Subject: Re: [PATCH] cpu/hotplug: handle unbalanced hotplug enable/disable

On Thu, May 5, 2016 at 5:13 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> On Wed, 4 May 2016, Lianwei Wang wrote:
>> In this example, the unbalanced count is caused by the
>> cpu_hotplug_pm_callback pm notifier callback function.
>
> I doubt that.
>
>> We can add a variable to avoid the unbalanced call of cpu_hotplug_enable
>> ,e.g.
>
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index 3e3f6e49eabb..aa6694f0e9d3 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -1140,16 +1140,21 @@ static int
>>  cpu_hotplug_pm_callback(struct notifier_block *nb,
>>                         unsigned long action, void *ptr)
>>  {
>> +       static int disabled;
>> +
>>         switch (action) {
>>
>>         case PM_SUSPEND_PREPARE:
>>         case PM_HIBERNATION_PREPARE:
>>                 cpu_hotplug_disable();
>> +               disabled = 1;
>>                 break;
>>
>>         case PM_POST_SUSPEND:
>>         case PM_POST_HIBERNATION:
>> -               cpu_hotplug_enable();
>> +               if (disabled)
>> +                       cpu_hotplug_enable();
>> +               disabled = 0;
>>                 break;
>>
>>         default:
>>
>> Please let me know if you like to fix it in this way.
>
> So you are moving the work around one step down w/o providing any reasonable
> explanation how this asymetric call of that callback can happen.
>
> Can you eventually come up with a coherent explanation of the problem down to
> the root cause or are we going to play this "move the workaround one step
> down" game for another 10 rounds?
>
Do you agree that any driver can abort the suspend process by
returning an error or NOTIFY_BAD if it is not ready to suspend?
I have explain it and I also copied the example code that abort
suspend by returning an error or NOTIFY_BAD in the pm notifier
callback function.
The cpu_hotplug_disable and cpu_hotplug_enable are called in one of
the PM notifier callback. And they are called from two difference
place.
Below is how it happened:
  pm_suspend
    |--enter_state
        |--suspend_prepare
            |--pm_notifier_call_chain(PM_SUSPEND_PREPARE)
            |    |--call_back_1
            |    |--call_back_..
            |    |--call_back_n ===> return NOTIFY_BAD to abort call chain and
            |    |                                suspend process here
            |    |--cpu_hotplug_pm_callback()
            |    |   |--cpu_hotplug_disable  =====> remember it is not
called yet
            |    |--call_back_..
            |
            |--pm_notifier_call_chain(PM_POST_SUSPEND)
            |    |--call_back_1
            |    |--call_back_..
            |    |--call_back_n
            |    |--cpu_hotplug_pm_callback()
            |    |   |--cpu_hotplug_enable  =====> Here it is unbalanced called
            |    |--call_back_..
            |
So, keep in mind that for pm notifier call chain, the
PM_SUSPEND_PREPARE notifier and PM_POST_SUSPEND notifier is not always
paired called. Sometimes for a driver's pm notifier callback, the
PM_POST_SUSPEND is called without PM_SUSPEND_PREPARE.

>> +static void _cpu_hotplug_enable(void)
>> +{
>> +       if (WARN(!cpu_hotplug_disabled, "Unbalanced cpu hotplug enable\n"))
>> +               return;
>> +
>> +       cpu_hotplug_disabled--;
>> +}
>>
>> I like to fix it in the cpu_hotplug_enable because it is a public
>
> You CANNOT fix it there. The problem is the call site and NOT
> cpu_hotplug_enable(). Can you finally accept this?
I know what you mean. But why let the driver unconditionally do
"--cpu_hotplug_disabled" without any checking. It should do nothing if
it detect a unbalanced enable. A good example for unbalanced checking
from enable_irq is here:
http://lxr.free-electrons.com/source/kernel/irq/manage.c#L512

It's the cpu hotplug driver's responsibility to guarantee that the cpu
hotplug always working well even others failed do something to it. And
the driver can check and handle it itself, why not let the driver to
handle it and make the cpu hotplug driver more strong?
>
>> kernel API and fix in it can prevent any other unbalanced calling.
>
> It cannot prevent any unbalanced calls. It mitigates the issue, but that's a
> different problem.
It did not migrate the issue. It give a warning message to log the
unbalanced issue and it also make sure the cpu hotplug continue to
work well even someone do an unbalanced call. It is a good checking as
the enable_irq/disable_irq do. There are some other unbalanced
checking in kernel too. All make sure the kernel has a better
stability.
>
> We can discuss that seperately after fixing the offending call site.
>
> Thanks,
>
>         tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ