Date: Thu, 26 Jun 2008 13:34:11 -0700
From: "Paul Menage" <menage@...gle.com>
To: "Max Krasnyansky" <maxk@...lcomm.com>
Cc: "Vegard Nossum" <vegard.nossum@...il.com>, "Paul Jackson" <pj@....com>,
	a.p.zijlstra@...llo.nl, linux-kernel@...r.kernel.org,
	"Gautham shenoy" <ego@...ibm.com>
Subject: Re: [RFC][PATCH] CPUSets: Move most calls to rebuild_sched_domains() to the workqueue

On Thu, Jun 26, 2008 at 11:49 AM, Max Krasnyansky <maxk@...lcomm.com> wrote:
>>
>> Does that mean that you can't ever call get_online_cpus() from a
>> workqueue thread?
>
> In general it should be ok (no different from user-space task calling it).

No, I think it is a problem. When a CPU goes down, the hotplug code holds
cpu_hotplug.lock and calls cleanup_workqueue_thread(), which waits for any
running work on that thread to finish. So if the workqueue thread running
on that CPU calls get_online_cpus() while the hotplug thread is waiting
for it, it will deadlock.

(It's not clear to me how lockdep figured this out - I guess it's
something to do with the *_acquire() annotations that tell lockdep to
treat the workqueue structure as a pseudo-lock?)

I guess the fix for that would be to have a non-workqueue thread handle
the async domain rebuilding, one that isn't tied to a particular CPU the
way workqueue threads are.

> But there is still circular dependency because we're calling into
> domain partitioning code.

OK, so the problem is that since cpu_hotplug contains a hand-rolled rwsem,
lockdep is going to spot false deadlocks.
Is there any reason not to replace cpu_hotplug.lock and
cpu_hotplug.refcount with an rw_semaphore, and have the following:

void get_online_cpus(void)
{
	might_sleep();
	if (cpu_hotplug.active_writer == current)
		return;
	down_read(&cpu_hotplug.lock);
}

void put_online_cpus(void)
{
	if (cpu_hotplug.active_writer == current)
		return;
	up_read(&cpu_hotplug.lock);
}

static void cpu_hotplug_begin(void)
{
	down_write(&cpu_hotplug.lock);
	cpu_hotplug.active_writer = current;
}

static void cpu_hotplug_done(void)
{
	cpu_hotplug.active_writer = NULL;
	up_write(&cpu_hotplug.lock);
}

I think that combined with moving the async rebuild_sched_domains to a
separate thread should solve the problem, but I'm wondering why
cpu_hotplug.lock was implemented this way in the first place.

Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/