Date: Fri, 20 Oct 2017 18:37:46 +0100
From: Tvrtko Ursulin <tursulin@...ulin.net>
To: linux-kernel@...r.kernel.org
Cc: Thomas Gleixner <tglx@...utronix.de>,
    Ingo Molnar <mingo@...nel.org>,
    Peter Zijlstra <peterz@...radead.org>,
    Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
    "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
    Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Possible bug in CPU hotplug handling (4.14.0-rc5)

Hi all,

I think I've found a bug in the CPU hotplug handling when multi-instance
states are used, in the 4.14.0-rc5 kernel. I have not attempted to get to
the bottom of the issue to propose an actual fix, since the logic there
looks somewhat complex, but thought to first seek the opinion of people in
the know regarding this area.

What I think is happening is that when a multi-instance state is registered
last, the per-cpu st->node field remains "sticky", i.e. it stays set to the
address of the node belonging to the client which registered last. The
following hotplug event will then call all the callbacks, passing that same
st->node to all the clients. Obviously bad things happen then, ranging from
oopses to silent memory corruption.
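To illustrate what "the node belonging to a client" means, here is a rough
sketch of what a multi-instance client looks like from the driver side. It
is not the actual i915 or pcrypt code, and all the names in it (my_instance,
my_cpu_online, "mydrv:online") are made up; the point is only that each
instance hands its own hlist_node to the hotplug core, and the callback
recovers its per-instance data from that node with container_of().

#include <linux/cpuhotplug.h>
#include <linux/kernel.h>

/* Hypothetical per-device instance; private data hangs off the node. */
struct my_instance {
	struct hlist_node node;
	int id;
};

static enum cpuhp_state my_online_state;

static int my_cpu_online(unsigned int cpu, struct hlist_node *node)
{
	/* Recover our instance from the node the core hands back to us. */
	struct my_instance *inst = container_of(node, struct my_instance, node);

	/* If node actually belongs to another client, inst is garbage here. */
	pr_info("instance %d: cpu %u online\n", inst->id, cpu);
	return 0;
}

static int my_setup(void)
{
	/* Done once: allocates a dynamic state slot (e.g. 176 in the log below). */
	int ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "mydrv:online",
					  my_cpu_online, NULL);
	if (ret < 0)
		return ret;
	my_online_state = ret;
	return 0;
}

static int my_add_instance(struct my_instance *inst)
{
	/*
	 * Done per instance: this node pointer is what the core stores in the
	 * per-cpu st->node while the startup callback runs on each CPU.
	 */
	return cpuhp_state_add_instance(my_online_state, &inst->node);
}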
I have added some custom log messages to catch this and, if you can follow
through them, this is what happens.

First a multi-instance state client registers:

cpuhp store callbacks state=176 name=perf/x86/intel/i915:online multi=1
i915 cpuhp slot 176
cpuhp_issue_call state=176 node=ffff88021db38390
cpuhp_thread_fun cpu=0 state=176 bringup=1 node=ffff88021db38390
cpuhp_invoke_callback multi-single cpu=0 state=176 node=ffff88021db38390
cpuhp_thread_fun result=0
cpuhp_issue_call state=176 node=ffff88021db38390
cpuhp_thread_fun cpu=1 state=176 bringup=1 node=ffff88021db38390
cpuhp_invoke_callback multi-single cpu=1 state=176 node=ffff88021db38390
cpuhp_thread_fun result=0
cpuhp_issue_call state=176 node=ffff88021db38390
cpuhp_thread_fun cpu=2 state=176 bringup=1 node=ffff88021db38390
cpuhp_invoke_callback multi-single cpu=2 state=176 node=ffff88021db38390
cpuhp_thread_fun result=0
cpuhp_issue_call state=176 node=ffff88021db38390
cpuhp_thread_fun cpu=3 state=176 bringup=1 node=ffff88021db38390
cpuhp_invoke_callback multi-single cpu=3 state=176 node=ffff88021db38390
cpuhp_thread_fun result=0
cpuhp_issue_call state=176 node=ffff88021db38390
cpuhp_thread_fun cpu=4 state=176 bringup=1 node=ffff88021db38390
cpuhp_invoke_callback multi-single cpu=4 state=176 node=ffff88021db38390
cpuhp_thread_fun result=0
cpuhp_issue_call state=176 node=ffff88021db38390
cpuhp_thread_fun cpu=5 state=176 bringup=1 node=ffff88021db38390
cpuhp_invoke_callback multi-single cpu=5 state=176 node=ffff88021db38390
cpuhp_thread_fun result=0
cpuhp_issue_call state=176 node=ffff88021db38390
cpuhp_thread_fun cpu=6 state=176 bringup=1 node=ffff88021db38390
cpuhp_invoke_callback multi-single cpu=6 state=176 node=ffff88021db38390
cpuhp_thread_fun result=0
cpuhp_issue_call state=176 node=ffff88021db38390
cpuhp_thread_fun cpu=7 state=176 bringup=1 node=ffff88021db38390
cpuhp_invoke_callback multi-single cpu=7 state=176 node=ffff88021db38390
cpuhp_thread_fun result=0
cpuhp added state=176 node=ffff88021db38390

Then a hotplug event happens:

... etc ..

As you can see, the node belonging to the i915 client is passed to all the
other registered callbacks. (This is due to how cpuhp_thread_fun calls
cpuhp_invoke_callback with st->node set, which then takes a different path
in the latter.) When another multi-instance client, like for example padata
(via pcrypt), is present, it will use the wrong pointer when dereferencing
its internal data structure.

In this particular case I have ensured our client is the last to register
by re-loading it after boot, but I guess the same could happen depending on
the order of registrations during boot.

I can work around this in i915 by registering, and immediately
unregistering, a dummy state after the real one has been registered. This
action ensures the st->node field gets cleared (a rough sketch is in the
P.S. below).

If desired, I could try to cook up a simple reproducer module? Or perhaps
it is immediately obvious to people experienced in this area what is
happening?

Kind regards,

Tvrtko
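P.S. The dummy-state workaround mentioned above would look roughly like the
sketch below. The names (dummy_cpu_cb, clear_stale_hotplug_node,
"dummy:clear-node") are made up, and it assumes the dummy state needs a
real, if trivial, startup callback so that the per-cpu bringup path actually
runs and overwrites the stale st->node.

#include <linux/cpuhotplug.h>

/* Trivial callback; we only care about the side effect of registering. */
static int dummy_cpu_cb(unsigned int cpu)
{
	return 0;
}

/* Call right after cpuhp_state_add_instance() for the real state. */
static void clear_stale_hotplug_node(void)
{
	int state;

	/*
	 * Registering a throw-away dynamic state issues the callback on each
	 * online CPU with node == NULL, which on 4.14-rc5 appears to overwrite
	 * the sticky per-cpu st->node left behind by the instance add.
	 */
	state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "dummy:clear-node",
				  dummy_cpu_cb, NULL);
	if (state > 0)
		cpuhp_remove_state(state);
}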