Message-ID: <1260958592.17860.25.camel@laptop>
Date: Wed, 16 Dec 2009 11:16:32 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Xiaotian Feng <dfeng@...hat.com>
Cc: linux-kernel@...r.kernel.org,
Rusty Russell <rusty@...tcorp.com.au>,
Ingo Molnar <mingo@...e.hu>, "H. Peter Anvin" <hpa@...or.com>,
Heiko Carstens <heiko.carstens@...ibm.com>
Subject: Re: [PATCH] fix cpu hotplug test failures on powerpc
On Wed, 2009-12-16 at 17:15 +0800, Xiaotian Feng wrote:
> Sachin found cpu hotplug test failures on powerpc, which made the kernel
> hang on his POWER box. This is addressed in
> http://marc.info/?l=linux-kernel&m=126052886204649&w=2
>
> Commit 6ad4c18 (sched: Fix balance vs hotplug race) switches to
> cpu_active_mask, but in some specific situations the kernel may leave
> a cpu inactive but online.
>
> On some powerpc machines, hotplugging cpu0 is allowed. If cpu0 is the
> last online cpu and we try to offline it, cpu_down() marks cpu0
> inactive, then _cpu_down() finds that num_online_cpus() is 1 and
> returns -EBUSY without setting cpu0 back to active. So cpu0 ends up
> inactive but online.
>
> The fix is to set the cpu inactive only when we are actually going to bring
> it down, in _cpu_down().

Good spotting, thanks! Some comments below.

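The failure sequence described above can be modelled in user space. What follows is only a toy sketch under stated assumptions, not kernel code: `struct cpu_state`, `num_online()`, `cpu_down_old()` and `cpu_down_new()` are hypothetical names standing in for the online/active masks and for the two orderings of the "last online cpu" check and the `set_cpu_active(cpu, false)` call. Locking, notifiers and the rest of the hotplug path are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Toy stand-ins for cpu_online_mask and cpu_active_mask (hypothetical). */
struct cpu_state {
	bool online[NR_CPUS];
	bool active[NR_CPUS];
};

/* Counts the cpus currently marked online in the model. */
static int num_online(const struct cpu_state *s)
{
	int n = 0;

	for (int i = 0; i < NR_CPUS; i++)
		n += s->online[i];
	return n;
}

/* Old ordering: the cpu is marked inactive before the -EBUSY check. */
static int cpu_down_old(struct cpu_state *s, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	s->active[cpu] = false;
	if (num_online(s) == 1)
		return -EBUSY;	/* early exit: cpu stays online but inactive */
	s->online[cpu] = false;
	return 0;
}

/* Patched ordering: the cpu is marked inactive only after the check passes. */
static int cpu_down_new(struct cpu_state *s, unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	if (num_online(s) == 1)
		return -EBUSY;	/* nothing was changed, cpu stays active */
	s->active[cpu] = false;
	s->online[cpu] = false;
	return 0;
}
```

With only cpu0 online and active, `cpu_down_old(&s, 0)` returns -EBUSY and leaves cpu0 online but inactive, while `cpu_down_new(&s, 0)` returns -EBUSY and leaves both flags untouched.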
> Reported-by: Sachin Sant <sachinp@...ibm.com>
> Signed-off-by: Xiaotian Feng <dfeng@...hat.com>
> Tested-by: Sachin Sant <sachinp@...ibm.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Rusty Russell <rusty@...tcorp.com.au>
> Cc: Ingo Molnar <mingo@...e.hu>
> Cc: H. Peter Anvin <hpa@...or.com>
> Cc: Heiko Carstens <heiko.carstens@...ibm.com>
> ---
> kernel/cpu.c | 8 ++++++--
> 1 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 291ac58..a1e7165 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -209,6 +209,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
>  		return -ENOMEM;
>
>  	cpu_hotplug_begin();
> +	set_cpu_active(cpu, false);
>  	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
>  					hcpu, -1, &nr_calls);
>  	if (err == NOTIFY_BAD) {
> @@ -280,8 +281,6 @@ int __ref cpu_down(unsigned int cpu)
>  		goto out;
>  	}
>
> -	set_cpu_active(cpu, false);
> -
>  	/*
>  	 * Make sure the all cpus did the reschedule and are not
>  	 * using stale version of the cpu_active_mask.

That renders the synchronize_sched() call down there useless, so might
as well remove it then.

> @@ -387,12 +386,6 @@ int disable_nonboot_cpus(void)
>  	 */
>  	cpumask_clear(frozen_cpus);
>
> -	for_each_online_cpu(cpu) {
> -		if (cpu == first_cpu)
> -			continue;
> -		set_cpu_active(cpu, false);
> -	}
> -
>  	synchronize_sched();

And here too.

>  	printk("Disabling non-boot CPUs ...\n");