Message-ID: <4F460D7B.1020703@linux.vnet.ibm.com>
Date:	Thu, 23 Feb 2012 15:27:15 +0530
From:	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
CC:	"Rafael J. Wysocki" <rjw@...k.pl>,
	Alan Stern <stern@...land.harvard.edu>,
	paulmck@...ux.vnet.ibm.com, Ingo Molnar <mingo@...e.hu>,
	paul@...lmenage.org, tj@...nel.org, frank.rowand@...sony.com,
	pjt@...gle.com, tglx@...utronix.de, lizf@...fujitsu.com,
	prashanth@...ux.vnet.ibm.com, vatsa@...ux.vnet.ibm.com,
	linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>
Subject: Re: [PATCH 0/4] CPU hotplug, cpusets: Fix CPU online handling related
 to cpusets

On 02/20/2012 06:29 PM, Srivatsa S. Bhat wrote:

> Hi Peter,
> 
> On 02/20/2012 06:19 PM, Peter Zijlstra wrote:
> 
>> On Fri, 2012-02-17 at 17:45 +0530, Srivatsa S. Bhat wrote:
>>
>>>> Trivially removing CPU_TASKS_FROZEN as shown below doesn't look right to me:
>>>>
>>>> ---
>>>>
>>>>  kernel/sched/core.c |    4 ++--
>>>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>>
>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>> index 5255c9d..43a166e 100644
>>>> --- a/kernel/sched/core.c
>>>> +++ b/kernel/sched/core.c
>>>> @@ -6729,7 +6729,7 @@ int __init sched_create_sysfs_power_savings_entries(struct device *dev)
>>>>  static int cpuset_cpu_active(struct notifier_block *nfb, unsigned long action,
>>>>  			     void *hcpu)
>>>>  {
>>>> -	switch (action & ~CPU_TASKS_FROZEN) {
>>>> +	switch (action) {
>>>>  	case CPU_ONLINE:
>>>>  	case CPU_DOWN_FAILED:
>>>>  		cpuset_update_active_cpus();
>>>> @@ -6742,7 +6742,7 @@ static int cpuset_cpu_active(struct notifier_block *nfb, unsigned long action,
>>>>  static int cpuset_cpu_inactive(struct notifier_block *nfb, unsigned long action,
>>>>  			       void *hcpu)
>>>>  {
>>>> -	switch (action & ~CPU_TASKS_FROZEN) {
>>>> +	switch (action) {
>>>>  	case CPU_DOWN_PREPARE:
>>>>  		cpuset_update_active_cpus();
>>>>  		return NOTIFY_OK;
>>>>
>>>>
>>>> IMO, irrespective of whether we keep cpusets unaware of all CPU Hotplug or
>>>> only unaware of the CPU hotplug in the suspend/resume path, I feel the
>>>> scheduler should always know the true state of the system, ie., offline CPUs
>>>> must not be part of any sched domain, at any point in time.
>>
>> That's really not a problem as long as they're not in the active mask.
>>


[...]

So, based on what you said above, I guess we can go with that simple patch.
(See below, for the patch with changelog).

I thought about what Ingo suggested (i.e., not touching cpusets during CPU
hotplug, irrespective of whether it is part of suspend or not). We could
implement that with a scheme along these lines:

o Currently, if a cpuset's cpus_allowed mask becomes empty due to CPU offline,
  all tasks in that cpuset are moved to a parent cpuset whose cpus_allowed mask
  is non-empty.
  Here, instead of *moving* the tasks to another cpuset, we could just change
  the cpus_allowed mask of each task in that cpuset to reflect the non-empty
  parent cpuset's cpus_allowed mask. IOW, during a CPU offline, we never touch
  a cpuset's cpus_allowed mask, we only modify the cpus_allowed mask of the
  *tasks* in that cpuset. Also, we never move a task from one cpuset to another
  due to CPU offline.

o Since we never modify a cpuset's cpus_allowed mask due to CPU offline, it is
  trivial to get back to the original state when that CPU comes back online:
  just compare the cpuset's cpus_allowed mask with cpu_active_mask and update
  the cpus_allowed masks of all the tasks in that cpuset.
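
The two points above can be modelled in userspace. This is only a sketch under
stated assumptions, not kernel code: the 64-bit mask type, struct cpuset_model,
effective_task_mask() and check_cpuset_model() are hypothetical names, and
taking the intersection with the active mask is one possible reading of
"compare the cpuset's cpus_allowed mask with cpu_active_mask".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, standing in for a kernel cpumask. */
typedef uint64_t cpumask_model_t;

struct cpuset_model {
	struct cpuset_model *parent;   /* NULL for the root cpuset */
	cpumask_model_t cpus_allowed;  /* configured mask; hotplug never edits it */
};

/*
 * Mask that the tasks of @cs would get: the active CPUs of the cpuset
 * itself, or, if none are active, those of the nearest ancestor that still
 * has an active CPU. The cpuset's own cpus_allowed is only read.
 */
static cpumask_model_t effective_task_mask(const struct cpuset_model *cs,
					   cpumask_model_t active)
{
	assert(cs != NULL);

	for (; cs; cs = cs->parent) {
		cpumask_model_t m = cs->cpus_allowed & active;

		if (m)
			return m;
	}
	return 0; /* only reachable if no CPU at all is active */
}

/* Walk through an offline/online cycle on a 4-CPU model system. */
void check_cpuset_model(void)
{
	struct cpuset_model root = { NULL, 0xF };   /* CPUs 0-3 */
	struct cpuset_model child = { &root, 0xC }; /* CPUs 2-3 */

	/* All CPUs active: tasks use the cpuset's own CPUs. */
	assert(effective_task_mask(&child, 0xF) == 0xC);
	/* CPUs 2 and 3 offline: tasks fall back to the parent's active CPUs. */
	assert(effective_task_mask(&child, 0x3) == 0x3);
	/* CPU 2 back online: tasks return to it, no cpuset state was lost. */
	assert(effective_task_mask(&child, 0x7) == 0x4);
	/* The configured mask is unchanged throughout. */
	assert(child.cpus_allowed == 0xC);
}
```

The point of the model is that the configured mask is never written, so the
online path needs no saved state. It also shows the cost mentioned below: the
mask handed to the tasks can differ from the mask stored in their cpuset.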

We can definitely do all this, but I am not quite sure if this complexity is
justified (i.e., complexity in the sense that the cpus_allowed mask of the tasks
in a cpuset might not always be the same as the cpus_allowed mask of that
cpuset).

However, if somebody feels that the above-mentioned approach looks good and
the complexity is justified, please let me know. Until then, the following
simple fix for the suspend/resume bug should suffice.

----

From: Srivatsa S. Bhat <srivatsa.bhat@...ux.vnet.ibm.com>
Subject: CPU hotplug, cpusets, suspend: Don't touch cpusets during suspend/resume

Currently, during CPU hotplug, the cpuset callbacks modify the cpusets
to reflect the state of the system, and this handling is asymmetric.
That is, upon CPU offline, that CPU is removed from all cpusets. However,
when it comes back online, it is added back only to the root cpuset.

This gives rise to a significant problem during suspend/resume. During
suspend, we offline all non-boot CPUs, and during resume we bring them back
online. This means that, after a resume, all cpusets (except the root cpuset)
will be restricted to a single CPU (the boot CPU). But the whole point of
suspend/resume is to restore the system to a state which is as close as
possible to how it was before suspend.

So to fix this, don't touch cpusets during suspend/resume. That is, modify
the cpuset-related CPU hotplug callbacks to just ignore CPU hotplug when it
is initiated as part of the suspend/resume sequence.

Reported-by: Prashanth Nageshappa <prashanth@...ux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@...ux.vnet.ibm.com>
Cc: stable@...r.kernel.org
---

 kernel/sched/core.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1169246..49ba9d4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6728,7 +6728,7 @@ int __init sched_create_sysfs_power_savings_entries(struct device *dev)
 static int cpuset_cpu_active(struct notifier_block *nfb, unsigned long action,
 			     void *hcpu)
 {
-	switch (action & ~CPU_TASKS_FROZEN) {
+	switch (action) {
 	case CPU_ONLINE:
 	case CPU_DOWN_FAILED:
 		cpuset_update_active_cpus();
@@ -6741,7 +6741,7 @@ static int cpuset_cpu_active(struct notifier_block *nfb, unsigned long action,
 static int cpuset_cpu_inactive(struct notifier_block *nfb, unsigned long action,
 			       void *hcpu)
 {
-	switch (action & ~CPU_TASKS_FROZEN) {
+	switch (action) {
 	case CPU_DOWN_PREPARE:
 		cpuset_update_active_cpus();
 		return NOTIFY_OK;
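
The effect of the two hunks above can be checked with a small userspace model.
This is a sketch, not the kernel code: the action values are illustrative
stand-ins for the definitions in include/linux/cpu.h (only the fact that
CPU_TASKS_FROZEN is a separate flag bit matters here), and the helper names
are hypothetical. Each helper returns true where the real callback would call
cpuset_update_active_cpus().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the hotplug action codes (assumed values). */
#define CPU_ONLINE        0x0002
#define CPU_DOWN_PREPARE  0x0005
#define CPU_DOWN_FAILED   0x0006
#define CPU_TASKS_FROZEN  0x0010  /* set when hotplug is part of suspend/resume */

/* Before the patch: the frozen flag is stripped, so suspend/resume matches. */
static bool old_cpu_active_updates(unsigned long action)
{
	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_ONLINE:
	case CPU_DOWN_FAILED:
		return true;
	default:
		return false;
	}
}

/* After the patch: only the plain (non-frozen) action codes match. */
static bool new_cpu_active_updates(unsigned long action)
{
	switch (action) {
	case CPU_ONLINE:
	case CPU_DOWN_FAILED:
		return true;
	default:
		return false;
	}
}

/* Same change, modelled for the cpuset_cpu_inactive() side. */
static bool new_cpu_inactive_updates(unsigned long action)
{
	return action == CPU_DOWN_PREPARE;
}

void check_frozen_handling(void)
{
	unsigned long resume_online = CPU_ONLINE | CPU_TASKS_FROZEN;
	unsigned long suspend_down = CPU_DOWN_PREPARE | CPU_TASKS_FROZEN;

	/* Regular hotplug still updates cpusets after the patch. */
	assert(new_cpu_active_updates(CPU_ONLINE));
	assert(new_cpu_inactive_updates(CPU_DOWN_PREPARE));

	/* Suspend/resume hotplug used to update cpusets and no longer does. */
	assert(old_cpu_active_updates(resume_online));
	assert(!new_cpu_active_updates(resume_online));
	assert(!new_cpu_inactive_updates(suspend_down));
}
```

Because the frozen variants are the base code with one extra bit set, dropping
the mask is enough to make both callbacks fall through for every hotplug event
generated by suspend/resume, while regular hotplug behaves as before.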


