Message-ID: <1365501570.3597.64.camel@ThinkPad-T5421.cn.ibm.com>
Date: Tue, 09 Apr 2013 17:59:30 +0800
From: Li Zhong <zhong@...ux.vnet.ibm.com>
To: Li Zefan <lizefan@...wei.com>
Cc: linux-kernel@...r.kernel.org, tj@...nel.org
Subject: [RFC PATCH v2 cpuset] Don't pass offlined cpus to partition_sched_domains()
Hi Zefan,
I did some testing today, onlining/offlining cpus with cpuset enabled. It
caused a NULL address dereference in get_group(). After adding some
debugging code, it appears the cause is that some offlined cpus are
passed in the cpumask to partition_sched_domains(). More details below:
===================
following is printed in build_sched_domain():
The following is printed from build_sched_domain():

@@ -6476,6 +6486,14 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
 		return child;
 
 	cpumask_and(sched_domain_span(sd), cpu_map, tl->mask(cpu));
+	if (cpumask_empty(sched_domain_span(sd)))
+		printk("lz empty domain %p, mask %pS\n", sd, tl->mask);
+	char cpulist[128];
+	cpumask_scnprintf(cpulist, sizeof(cpulist), cpu_map);
+	printk("lz cpu_map %s\n", cpulist);
+	cpumask_scnprintf(cpulist, sizeof(cpulist), tl->mask(cpu));
+	printk("lz tl->mask(%d) %pS %s\n", cpu, tl->mask, cpulist);
+
 	if (child) {
 		sd->level = child->level + 1;
 		sched_domain_level_max = max(sched_domain_level_max, sd->level);
[  232.714502] lz empty domain c0000001f73c0000, mask cpu_smt_mask+0x0/0x10
[ 232.714515] lz cpu_map 1-2,7,13-15
[ 232.714521] lz tl->mask(1) cpu_smt_mask+0x0/0x10
[ 232.714529] lz cpu_map 1-2,7,13-15
[ 232.714536] lz tl->mask(1) cpu_cpu_mask+0x0/0x10 2,7,13-15
[ 232.714544] lz cpu_map 1-2,7,13-15
[ 232.714551] lz tl->mask(1) sd_numa_mask+0x0/0x10 2,7,13-15
[ 232.714558] lz cpu_map 1-2,7,13-15
[ 232.714565] lz tl->mask(2) cpu_smt_mask+0x0/0x10 2
===================
It seems that one cpu (#1 above) could be taken offline during
cpuset_hotplug_workfn(), after top_cpuset's allowed cpus have been
updated and the propagated work functions have finished. But
generate_sched_domains(), which uses the cpuset information, still
considers this cpu (#1) valid and passes it down in the cpumask to
partition_sched_domains(). Then we end up with an empty sd as above.
With the empty domain, in get_group():

	if (child)
		cpu = cpumask_first(sched_domain_span(child));

cpumask_first() on an empty span returns a value >= nr_cpu_ids, and
using that bogus cpu value in the later code causes the bad dereference.
The following patch tries to fix the issue by ANDing cpu_active_mask
into each partition at the end of generate_sched_domains(). The
resulting partition (set of domains) might not be the best one, but we
know another rebuild (caused by the cpu offline in the middle of
cpuset_hotplug_workfn()) is on the way.
Also, it seems that generate_sched_domains() needs to be called together
with partition_sched_domains(), under the hotplug lock. Otherwise a cpu
might similarly be taken offline while generate_sched_domains() is
running, causing the same issue.
Or maybe we could put the whole cpuset_hotplug_workfn() under the
hotplug lock?
Thanks, Zhong
---
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 4f9dfe4..aed8436 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -754,6 +754,12 @@ done:
 	 */
 	if (doms == NULL)
 		ndoms = 1;
+	else {
+		for (nslot = 0; nslot < ndoms; nslot++) {
+			struct cpumask *dp = doms[nslot];
+			cpumask_and(dp, dp, cpu_active_mask);
+		}
+	}
 
 	*domains    = doms;
 	*attributes = dattr;
@@ -2222,17 +2228,8 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	flush_workqueue(cpuset_propagate_hotplug_wq);
 
 	/* rebuild sched domains if cpus_allowed has changed */
-	if (cpus_updated) {
-		struct sched_domain_attr *attr;
-		cpumask_var_t *doms;
-		int ndoms;
-
-		mutex_lock(&cpuset_mutex);
-		ndoms = generate_sched_domains(&doms, &attr);
-		mutex_unlock(&cpuset_mutex);
-
-		partition_sched_domains(ndoms, doms, attr);
-	}
+	if (cpus_updated)
+		rebuild_sched_domains();
 }
void cpuset_update_active_cpus(bool cpu_online)
--