[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <550020F6.6020105@imgtec.com>
Date: Wed, 11 Mar 2015 11:03:18 +0000
From: James Hogan <james.hogan@...tec.com>
To: <paulmck@...ux.vnet.ibm.com>
CC: <linux-kernel@...r.kernel.org>, <mingo@...nel.org>,
<laijs@...fujitsu.com>, <dipankar@...ibm.com>,
<akpm@...ux-foundation.org>, <mathieu.desnoyers@...icios.com>,
<josh@...htriplett.org>, <tglx@...utronix.de>,
<peterz@...radead.org>, <rostedt@...dmis.org>,
<dhowells@...hat.com>, <edumazet@...gle.com>,
<dvhart@...ux.intel.com>, <fweisbec@...il.com>, <oleg@...hat.com>,
<bobby.prani@...il.com>, <linux-metag@...r.kernel.org>
Subject: Re: [PATCH tip/core/rcu 04/20] metag: Use common outgoing-CPU-notification
code
On 10/03/15 16:59, Paul E. McKenney wrote:
> On Tue, Mar 10, 2015 at 03:30:42PM +0000, James Hogan wrote:
>> Hi Paul,
>>
>> On 03/03/15 17:42, Paul E. McKenney wrote:
>>> From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
>>>
>>> This commit removes the open-coded CPU-offline notification with new
>>> common code. This change avoids calling scheduler code using RCU from
>>> an offline CPU that RCU is ignoring. This commit is compatible with
>>> the existing code in not checking for timeout during a prior offline
>>> for a given CPU.
>>>
>>> Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
>>> Cc: James Hogan <james.hogan@...tec.com>
>>> Cc: <linux-metag@...r.kernel.org>
>>
>> I gave this a try via linux-next, but unfortunately it causes the
>> following warning every time a CPU goes down:
>> META213-Thread0 DSP [LogF] CPU1: unable to kill
>
> That is certainly not what I had in mind, thank you for finding this!
>
>> If I add printks, I see that the state on entry to both cpu_wait_death
>> and cpu_report_death is already CPU_POST_DEAD, suggesting that it hasn't
>> changed from its initial value.
>>
>> Should arches other than x86 now be calling cpu_set_state_online()? The
>> patchlet below seems to resolve it for Meta (not sure if that is the
>> best place in the startup sequence to do it, perhaps it doesn't matter).
>>
>> diff --git a/arch/metag/kernel/smp.c b/arch/metag/kernel/smp.c
>> index ac3a199e33e7..430e379ec71f 100644
>> --- a/arch/metag/kernel/smp.c
>> +++ b/arch/metag/kernel/smp.c
>> @@ -383,6 +383,7 @@ asmlinkage void secondary_start_kernel(void)
>> * OK, now it's safe to let the boot CPU continue
>> */
>> set_cpu_online(cpu, true);
>> + cpu_set_state_online(cpu);
>> complete(&cpu_running);
>>
>> /*
>>
>> Looking at the comment before cpu_set_state_online:
>>> /*
>>> * Mark the specified CPU online.
>>> *
>>> * Note that it is permissible to omit this call entirely, as is
>>> * done in architectures that do no CPU-hotplug error checking.
>>> */
>>
>> Which suggests it wasn't wrong to omit it before your patches came
>> along.
>
> And that suggestion is quite correct. The idea was indeed to accommodate
> architectures that do not do error checking.
>
> Does the following patch (on top of current -next) remove the need for
> your addition of cpu_set_state_online() above?
Don't forget the "oldstate == ", otherwise it'll work for the wrong
reason :-/
Checking for CPU_POST_DEAD does seem to fix the immediate problem,
however this still leaves open the possibility of a single timeout
propagating to all further offlines after CPU_DEAD_FROZEN gets set. I've
confirmed that by adding a delay loop only on the second
cpu_report_death() call, and sure enough the 2nd and further offlines
all fail even though the CPU stops immediately after the 2nd one.
If this check is primarily so that CPU_DEAD_FROZEN is set if
cpu_wait_death timed out, would it be better to instead check explicitly
for CPU_BROKEN?
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index 18688e0b0422..c697f73d82d6 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -460,7 +460,7 @@ bool cpu_report_death(void)
do {
oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
- if (oldstate == CPU_ONLINE)
+ if (oldstate != CPU_BROKEN)
newstate = CPU_DEAD;
else
newstate = CPU_DEAD_FROZEN;
Cheers
James
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index 18688e0b0422..80400e019c86 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -460,7 +460,7 @@ bool cpu_report_death(void)
>
> do {
> oldstate = atomic_read(&per_cpu(cpu_hotplug_state, cpu));
> - if (oldstate == CPU_ONLINE)
> + if (oldstate == CPU_ONLINE || CPU_POST_DEAD)
> newstate = CPU_DEAD;
> else
> newstate = CPU_DEAD_FROZEN;
>
Download attachment "signature.asc" of type "application/pgp-signature" (820 bytes)
Powered by blists - more mailing lists