Date:   Fri, 20 Oct 2017 18:37:46 +0100
From:   Tvrtko Ursulin <tursulin@...ulin.net>
To:     linux-kernel@...r.kernel.org
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Possible bug in CPU hotplug handling (4.14.0-rc5)


Hi all,

I think I've found a bug in the CPU hotplug handling when multi-instance
states are used. This is on the 4.14.0-rc5 kernel.

I have not attempted to get to the bottom of the issue to propose an actual
fix, since the logic there looks somewhat complex, but thought to first
seek the opinion of people in the know regarding this area.

What I think is happening is that when a multi-instance state is registered
last, the per-cpu st->node field remains "sticky", i.e. it stays set to the
address of the node belonging to the client which registered last.

The next hotplug event will then run all the callbacks with that same
st->node, for every client. Obviously bad things happen then, ranging
from oopses to silent memory corruption.

I have added some custom log messages to catch this; if you follow them
through, this is what happens. First, a multi-instance state client
registers:

 cpuhp store callbacks state=176 name=perf/x86/intel/i915:online multi=1
 i915 cpuhp slot 176
 cpuhp_issue_call state=176 node=ffff88021db38390
 cpuhp_thread_fun cpu=0 state=176 bringup=1 node=ffff88021db38390
 cpuhp_invoke_callback multi-single cpu=0 state=176 node=ffff88021db38390
 cpuhp_thread_fun result=0
 cpuhp_issue_call state=176 node=ffff88021db38390
 cpuhp_thread_fun cpu=1 state=176 bringup=1 node=ffff88021db38390
 cpuhp_invoke_callback multi-single cpu=1 state=176 node=ffff88021db38390
 cpuhp_thread_fun result=0
 cpuhp_issue_call state=176 node=ffff88021db38390
 cpuhp_thread_fun cpu=2 state=176 bringup=1 node=ffff88021db38390
 cpuhp_invoke_callback multi-single cpu=2 state=176 node=ffff88021db38390
 cpuhp_thread_fun result=0
 cpuhp_issue_call state=176 node=ffff88021db38390
 cpuhp_thread_fun cpu=3 state=176 bringup=1 node=ffff88021db38390
 cpuhp_invoke_callback multi-single cpu=3 state=176 node=ffff88021db38390
 cpuhp_thread_fun result=0
 cpuhp_issue_call state=176 node=ffff88021db38390
 cpuhp_thread_fun cpu=4 state=176 bringup=1 node=ffff88021db38390
 cpuhp_invoke_callback multi-single cpu=4 state=176 node=ffff88021db38390
 cpuhp_thread_fun result=0
 cpuhp_issue_call state=176 node=ffff88021db38390
 cpuhp_thread_fun cpu=5 state=176 bringup=1 node=ffff88021db38390
 cpuhp_invoke_callback multi-single cpu=5 state=176 node=ffff88021db38390
 cpuhp_thread_fun result=0
 cpuhp_issue_call state=176 node=ffff88021db38390
 cpuhp_thread_fun cpu=6 state=176 bringup=1 node=ffff88021db38390
 cpuhp_invoke_callback multi-single cpu=6 state=176 node=ffff88021db38390
 cpuhp_thread_fun result=0
 cpuhp_issue_call state=176 node=ffff88021db38390
 cpuhp_thread_fun cpu=7 state=176 bringup=1 node=ffff88021db38390
 cpuhp_invoke_callback multi-single cpu=7 state=176 node=ffff88021db38390
 cpuhp_thread_fun result=0
 cpuhp added state=176 node=ffff88021db38390

When a hotplug event then happens, the node belonging to the i915 client is
passed to all the other registered multi-instance callbacks. (This is due
to how cpuhp_thread_fun calls cpuhp_invoke_callback with st->node set,
which then takes a different path in the latter.)
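For reference, this is roughly the shape of the dispatch in kernel/cpu.c as
I read it in 4.14; a heavily simplified paraphrase, not the verbatim
source, showing only the multi-instance branch (single-instance states
ignore the node argument):

/* Heavily simplified paraphrase of the 4.14 kernel/cpu.c dispatch. */
static int cpuhp_invoke_callback(unsigned int cpu, enum cpuhp_state state,
                                 bool bringup, struct hlist_node *node)
{
        struct cpuhp_step *step = cpuhp_get_step(state);
        int (*cbm)(unsigned int cpu, struct hlist_node *node);
        int ret;

        cbm = bringup ? step->startup.multi : step->teardown.multi;

        /* "multi-single": invoke one specific instance only */
        if (node)
                return cbm(cpu, node);

        /* Full state transition: invoke the callback for all instances */
        hlist_for_each(node, &step->list) {
                ret = cbm(cpu, node);
                if (ret)
                        return ret;
        }
        return 0;
}

Since cpuhp_thread_fun passes st->node unconditionally, a stale non-NULL
st->node sends every multi-instance state down the one-specific-instance
path with a node that belongs to somebody else.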

When another multi-instance client, like for example padata (via pcrypt),
is present, it will therefore dereference the wrong pointer when looking
up its internal data structure.

In this particular case I have ensured our client is the last to register
by reloading it after boot. But I guess the same could happen during boot
as well, depending on the order of registrations.

I can work around this in i915 by registering, and immediately
unregistering, a dummy state after the real one has been registered. This
ensures the st->node field gets cleared.
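Concretely, something along these lines; a sketch of what I mean, with the
foo_* names made up, and assuming (as far as I can tell from the code) that
registering a state with a startup callback makes the core issue calls with
node == NULL on each CPU, which overwrites the sticky per-cpu st->node:

/* Sketch of the workaround: register a throw-away dynamic state with a
 * no-op startup callback and remove it again straight away. */
static int foo_dummy_online(unsigned int cpu)
{
        return 0;
}

static void foo_clear_sticky_cpuhp_node(void)
{
        int state;

        state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "foo:dummy",
                                  foo_dummy_online, NULL);
        if (state >= 0)
                cpuhp_remove_state(state);
}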

If desired I could try to cook up a simple reproducer module? A rough,
untested sketch of what I have in mind is below. Or perhaps what is
happening is immediately obvious to people experienced in this area?
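
Something like this (untested; all repro_* names made up, error handling
trimmed):

/* Rough sketch of a reproducer: register two multi-instance states with
 * one instance each, so that state B registers last, then offline/online
 * a CPU and check that each callback still receives its own node. */
#include <linux/module.h>
#include <linux/cpuhotplug.h>

static struct hlist_node repro_node_a;
static struct hlist_node repro_node_b;
static enum cpuhp_state repro_state_a;
static enum cpuhp_state repro_state_b;

static int repro_online_a(unsigned int cpu, struct hlist_node *node)
{
        WARN(node != &repro_node_a,
             "state A got foreign node %p on cpu %u\n", node, cpu);
        return 0;
}

static int repro_online_b(unsigned int cpu, struct hlist_node *node)
{
        WARN(node != &repro_node_b,
             "state B got foreign node %p on cpu %u\n", node, cpu);
        return 0;
}

static int __init repro_init(void)
{
        int ret;

        ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "repro:a",
                                      repro_online_a, NULL);
        if (ret < 0)
                return ret;
        repro_state_a = ret;

        ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "repro:b",
                                      repro_online_b, NULL);
        if (ret < 0) {
                cpuhp_remove_multi_state(repro_state_a);
                return ret;
        }
        repro_state_b = ret;

        cpuhp_state_add_instance(repro_state_a, &repro_node_a);
        cpuhp_state_add_instance(repro_state_b, &repro_node_b);
        return 0;
}

static void __exit repro_exit(void)
{
        cpuhp_state_remove_instance(repro_state_a, &repro_node_a);
        cpuhp_state_remove_instance(repro_state_b, &repro_node_b);
        cpuhp_remove_multi_state(repro_state_a);
        cpuhp_remove_multi_state(repro_state_b);
}

module_init(repro_init);
module_exit(repro_exit);
MODULE_LICENSE("GPL");

With the module loaded, "echo 0 > /sys/devices/system/cpu/cpu1/online"
followed by "echo 1 > /sys/devices/system/cpu/cpu1/online" should make the
WARN in repro_online_a fire, since both states would be invoked with
repro_node_b.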

Kind regards,

Tvrtko
