[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180501101845.GE12217@hirez.programming.kicks-ass.net>
Date: Tue, 1 May 2018 12:18:45 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: "Kohli, Gaurav" <gkohli@...eaurora.org>
Cc: tglx@...utronix.de, mpe@...erman.id.au, mingo@...nel.org,
bigeasy@...utronix.de, linux-kernel@...r.kernel.org,
linux-arm-msm@...r.kernel.org,
Neeraj Upadhyay <neeraju@...eaurora.org>,
Will Deacon <will.deacon@....com>,
Oleg Nesterov <oleg@...hat.com>
Subject: Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against
wakeup
On Tue, May 01, 2018 at 01:20:26PM +0530, Kohli, Gaurav wrote:
> But In our older case, where we have seen failure below is the wake up path
> and ftraces, Wakeup occured and completed before schedule call only.
>
> So final state of CPUHP is running not parked. I have also pasted debug
> ftraces that we got during issue reproduction.
>
> Here wakeup for cpuhp is below:
>
> takedown_cpu-> kthread_park-> wake_up_process
>
>
> 39,034,311,742,395 apps (10240) Trace Printk cpuhp/0 (16) [000]
> 39015.625000: <debug> __kthread_parkme state=512 task=ffffffcc7458e680
> flags: 0x5 -> state 5 -> state is parked inside parkme function
>
> 39,034,311,846,510 apps (10240) Trace Printk cpuhp/0 (16) [000]
> 39015.625000: <debug> before schedule __kthread_parkme state=0
> task=ffffffcc7458e680 flags: 0xd -> just before schedule call, state is
> running
>
> tatic void __kthread_parkme(struct kthread *self)
>
> {
>
> __set_current_state(TASK_PARKED);
>
> while (test_bit(KTHREAD_SHOULD_PARK, &self->flags)) {
>
> if (!test_and_set_bit(KTHREAD_IS_PARKED, &self->flags))
>
> complete(&self->parked);
>
> schedule();
>
> __set_current_state(TASK_PARKED);
>
> }
>
> clear_bit(KTHREAD_IS_PARKED, &self->flags);
>
> __set_current_state(TASK_RUNNING);
>
> }
>
> So my point is here also, if it is reschedule then it can set TASK_PARKED,
> but it seems after takedown_cpu call this thread never get a chance to run,
> So final state is TASK_RUNNING.
>
> In our current fix also can't we observe same scenario where final state is
> TASK_RUNNING.
I'm not sure I understand your concern. Loosing the TASK_PARKED store
with the above code is obviously bad. But with the loop as proposed I
don't see a problem.
takedown_cpu() can proceed beyond smpboot_park_threads() and kill the
CPU before any of the threads are parked -- per having the complete()
before hitting schedule().
And, afaict, that is harmless. When we go offline, sched_cpu_dying() ->
migrate_tasks() will migrate any still runnable threads off the cpu.
But because at this point the thread must be in the PARKED wait-loop, it
will hit schedule() and go to sleep eventually.
Also note that kthread_unpark() does __kthread_bind() to rebind the
threads.
Aaaah... I think I've spotted a problem there. We clear SHOULD_PARK
before we rebind, so if the thread lost the first PARKED store,
does the completion, gets migrated, cycles through the loop and now
observes !SHOULD_PARK and bails the wait-loop, then __kthread_bind()
will forever wait.
Is that what you had in mind?
Powered by blists - more mailing lists