Date:	Fri, 16 Oct 2015 10:22:12 +0200
From:	Heiko Carstens <heiko.carstens@...ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	linux-kernel@...r.kernel.org, Tejun Heo <tj@...nel.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Ingo Molnar <mingo@...nel.org>, Rik van Riel <riel@...hat.com>,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	Michael Holzheu <holzheu@...ux.vnet.ibm.com>,
	Tobias Orlamuende <orlam@...ibm.com>
Subject: Re: [RFC][PATCH] sched: Start stopper early

On Wed, Oct 07, 2015 at 10:41:10AM +0200, Peter Zijlstra wrote:
> Hi,
> 
> So Heiko reported some 'interesting' fail where stop_two_cpus() got
> stuck in multi_cpu_stop() with one cpu waiting for another that never
> shows up.
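
[ Background: multi_cpu_stop() is a state machine in which each
  participating stopper spins until all others have acked the current
  state; roughly, from memory of kernel/stop_machine.c in that era, not
  a verbatim copy:

	static void ack_state(struct multi_stop_data *msdata)
	{
		/* the last cpu to ack moves everyone to the next state */
		if (atomic_dec_and_test(&msdata->thread_ack))
			set_state(msdata, msdata->state + 1);
	}

	static int multi_cpu_stop(void *data)
	{
		struct multi_stop_data *msdata = data;
		enum multi_stop_state curstate = MULTI_STOP_NONE;

		do {
			/* chill out and re-read msdata->state */
			cpu_relax();
			if (msdata->state != curstate) {
				curstate = msdata->state;
				/* ... disable irqs / run msdata->fn() ... */
				ack_state(msdata);
			}
		} while (curstate != MULTI_STOP_EXIT);

		return 0;
	}

  So if one of the two stoppers never executes its work, thread_ack never
  drops to zero and the other cpu spins in that loop forever. ]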
> 
> It _looks_ like the 'other' cpu isn't running and the current best
> theory is that we race on cpu-up and get the stop_two_cpus() call in
> before the stopper task is running.
> 
> This _is_ possible because we set 'online && active' _before_ we do the
> smpboot_unpark thing because of ONLINE notifier order.
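
[ I.e., the suspected window looks roughly like this, assuming the
  stopper's ->enabled flag only gets set once its thread is unparked:

	hotplug path (new cpu)                 stop_two_cpus() caller
	----------------------                 ----------------------
	cpu marked online
	cpu marked active (CPU_ONLINE)
	                                       cpu looks usable; queues a
	                                       cpu_stop_work for it while
	                                       stopper->enabled is still
	                                       false
	smpboot unpark -> stopper thread
	starts, stopper->enabled = true        (too late) ]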
> 
> The below test patch manually starts the stopper task early.
> 
> It boots and hotplugs a cpu on my test box, so it's not instantly broken.
> 
> ---
>  kernel/sched/core.c   |    7 ++++++-
>  kernel/stop_machine.c |    5 +++++
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1764a0f..9a56ef7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5542,14 +5542,19 @@ static void set_cpu_rq_start_time(void)
>  	rq->age_stamp = sched_clock_cpu(cpu);
>  }
> 
> +extern void cpu_stopper_unpark(unsigned int cpu);
> +
>  static int sched_cpu_active(struct notifier_block *nfb,
>  				      unsigned long action, void *hcpu)
>  {
> +	int cpu = (long)hcpu;
> +
>  	switch (action & ~CPU_TASKS_FROZEN) {
>  	case CPU_STARTING:
>  		set_cpu_rq_start_time();
>  		return NOTIFY_OK;
>  	case CPU_ONLINE:
> +		cpu_stopper_unpark(cpu);
>  		/*
>  		 * At this point a starting CPU has marked itself as online via
>  		 * set_cpu_online(). But it might not yet have marked itself
> @@ -5558,7 +5563,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
>  		 * Thus, fall-through and help the starting CPU along.
>  		 */
>  	case CPU_DOWN_FAILED:
> -		set_cpu_active((long)hcpu, true);
> +		set_cpu_active(cpu, true);
>  		return NOTIFY_OK;
>  	default:
>  		return NOTIFY_DONE;
> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
> index 12484e5..c674371 100644
> --- a/kernel/stop_machine.c
> +++ b/kernel/stop_machine.c
> @@ -496,6 +496,11 @@ static struct smp_hotplug_thread cpu_stop_threads = {
>  	.selfparking		= true,
>  };
> 
> +void cpu_stopper_unpark(unsigned int cpu)
> +{
> +	kthread_unpark(per_cpu(cpu_stopper.thread, cpu));
> +}
> +

So, actually this doesn't fix the bug, and it still _seems_ to be reproducible.

[ FWIW, I will be offline for the next two weeks ]

The bug was reproduced with your patch applied to 4.2.0 (+ couple of
unrelated internal patches).

In addition I cherry-picked these two upstream commits:
dd9d3843755d "sched: Fix cpu_active_mask/cpu_online_mask race"
02cb7aa923ec "stop_machine: Move 'cpu_stopper_task' and
              'stop_cpus_work' into 'struct cpu_stopper'"

The new dump again shows one cpu looping in multi_cpu_stop(), triggered by
stop_two_cpus(), while the second cpu will never enter multi_cpu_stop(),
since the corresponding cpu_stop_work was never enqueued.

The two cpu_stop_work structures on the stack of the process that invoked
stop_two_cpus() look like this:

crash> struct cpu_stop_work 0x8ad8afa78
struct cpu_stop_work {
  list = {
    next = 0x8ad8afa78, 
    prev = 0x8ad8afa78
  }, 
  fn = 0x2091b0 <multi_cpu_stop>, 
  arg = 0x8ad8afac8, 
  done = 0x8ad8afaf0
}

crash> struct cpu_stop_work 0x8ad8afaa0
struct cpu_stop_work {
  list = {
    next = 0x0, <---- NULL indicates it was never enqueued
    prev = 0x0
  }, 
  fn = 0x2091b0 <multi_cpu_stop>, 
  arg = 0x8ad8afac8, 
  done = 0x8ad8afaf0
}

The corresponding struct cpu_stop_done below indicates that for exactly one
of the two works cpu_stop_signal_done() was called (nr_todo == 1, down from
the initial 2). So the theory is still that this happened because
cpu_stop_queue_work() was called while the corresponding stopper was not
yet enabled; see the code sketch after the dump.

crash> struct -x cpu_stop_done 00000008ad8afaf0
struct cpu_stop_done {
  nr_todo = {
    counter = 0x1
  },
  executed = 0x0,
  ret = 0x0,
  completion = {
    done = 0x0,
    wait = {
      lock = {
        {
          rlock = {
            raw_lock = {
              lock = 0x0
            },
            break_lock = 0x0,
            magic = 0xdead4ead,
            owner_cpu = 0xffffffff,
            owner = 0xffffffffffffffff,
            dep_map = {
              key = 0x1e901e0 <__key.5629>,
              class_cache = {0x188fec0 <lock_classes+298096>, 0x0},
              name = 0xb40d0c "&x->wait",
              cpu = 0xb,
              ip = 0x94e5b2
            }
          },
          {
            __padding = "\000\000\000\000\000\000\000\000 ޭN\255\377\377\377\377\377\377\377\377\377\377\377\377",
            dep_map = {
              key = 0x1e901e0 <__key.5629>,
              class_cache = {0x188fec0 <lock_classes+298096>, 0x0},
              name = 0xb40d0c "&x->wait",
              cpu = 0xb,
              ip = 0x94e5b2
            }
          }
        }
      },
      task_list = {
        next = 0x8ad8afa20,
        prev = 0x8ad8afa20
      }
    }
  }
}
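
That also matches the queueing code, which (roughly, from memory of the
4.2-era kernel/stop_machine.c, not a verbatim copy) completes a work
immediately, with executed == false, instead of enqueueing it when the
stopper is disabled:

	static void cpu_stop_signal_done(struct cpu_stop_done *done, bool executed)
	{
		if (done) {
			if (executed)
				done->executed = true;
			/* last work to finish completes the waiter */
			if (atomic_dec_and_test(&done->nr_todo))
				complete(&done->completion);
		}
	}

	static void cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work)
	{
		struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
		unsigned long flags;

		spin_lock_irqsave(&stopper->lock, flags);
		if (stopper->enabled) {
			/* normal path: enqueue and kick the stopper thread */
			list_add_tail(&work->list, &stopper->works);
			wake_up_process(stopper->thread);
		} else {
			/* stopper not (yet) enabled: report back immediately */
			cpu_stop_signal_done(work->done, false);
		}
		spin_unlock_irqrestore(&stopper->lock, flags);
	}

The disabled path would produce exactly what the dump shows: nr_todo
decremented from 2 to 1, executed still false, and work->list never
touched.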
