[<prev] [next>] [day] [month] [year] [list]
Message-ID: <CO1PR04MB396863B41DF34676D97D53794D40@CO1PR04MB396.namprd04.prod.outlook.com>
Date:   Wed, 5 Jul 2017 21:07:18 +0000
From:   Ethan Barnes <Ethan.Barnes@....com>
To:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: PROBLEM: Page fault from first CPU Hotplug dynamic state callback not
 being removed
The CPUHP_BP_PREPARE_DYN state callback is not removed if it is the only 
callback in the DYN state array. This may result in a page fault when a 
driver is unloaded and reloaded and a cpu is again offlined.
Keywords: cpu hotplug, cpuhp
Kernel version 4.9.x, 4.10.x, 4.11.x, 4.12-rcx
To get a cpu offline hotplug callback once the cpu is actually offline, 
generic drivers can use CPUHP_BP_PREPARE_DYN cpuhp_state, which inserts 
the online and offline callbacks into a struct array.
However, it appears that when there is only one entry in the 
cpuhp_bp_state[] array, and cpuhp_remove_state() is called, the code 
fails to remove the callback.
This is demonstrated by inserting a callback using cpuhp_setup_state() 
for the CPUHP_BP_PREPARE_DYN state in a kernel driver, then monitoring 
which enum state is returned from cpuhp_setup_state() while offlining 
and onlining a cpu several times.
If the first call to cpuhp_setup_state() returns 
CPUHP_BP_PREPARE_DYN--indicating that it is the only state in the DYN 
array with callbacks--and then cpuhp_remove_state() is called with 
CPUHP_BP_PREPARE_DYN, then the second call to cpuhp_setup_state() 
returns CPUHP_BP_PREPARE_DYN + 1. This shows that the 
CPUHP_BP_PREPARE_DYN state was _not_ removed.
An examination of the code in kernel/cpu.c shows why: 
__cpuph_remove_state() calls cpuhp_store_callbacks() with NULLs to 
indicate state removal. However in the case of state being 
CPUHP_BP_PREPARE_STATE, cpuhp_store_callbacks() erroneously calls 
cpuhp_reserve_state() which returns the next available DYN state value. 
That next state is used to store the NULLs, which leaves the first DYN 
state with the callbacks still stored.
Note that the CPUHP_AP_ONLINE_DYN has the same problem but I didn't see 
it in my case because my CPU_AP_ONLINE_DYN callback was not the first AP 
DYN callback registered. Whatever subsystem registered that first AP DYN 
state would have the same removal problem.
One possible fix would be to add a check in cpuhp_store_callbacks() for 
name == NULL. For example, in kernel v4.10.17:
if (NULL != name &&	// ADDED
    (state == CPUHP_AP_ONLINE_DYN || state == CPUHP_BP_PREPARE_DYN)) {
		ret = cpuhp_reserve_state(state);
		if (ret < 0)
			return ret;
		state = ret;
	}
...but there are other ways to fix it.
Thoughts?
Thanks,
Ethan
Powered by blists - more mailing lists
 
