Message-ID: <cdd5d434-9c11-7c19-2895-0f7f3811eb11@codeaurora.org>
Date: Mon, 7 May 2018 16:39:28 +0530
From: "Kohli, Gaurav" <gkohli@...eaurora.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: tglx@...utronix.de, mpe@...erman.id.au, mingo@...nel.org,
bigeasy@...utronix.de, linux-kernel@...r.kernel.org,
linux-arm-msm@...r.kernel.org,
Neeraj Upadhyay <neeraju@...eaurora.org>,
Will Deacon <will.deacon@....com>,
Oleg Nesterov <oleg@...hat.com>
Subject: Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against
wakeup
On 5/2/2018 3:43 PM, Kohli, Gaurav wrote:
>
>
> On 5/2/2018 1:50 PM, Peter Zijlstra wrote:
>> On Wed, May 02, 2018 at 10:45:52AM +0530, Kohli, Gaurav wrote:
>>> On 5/1/2018 6:49 PM, Peter Zijlstra wrote:
>>
>>>> - complete(&kthread->parked), which we can do inside schedule(); this
>>>> solves the problem because then kthread_park() will not return early
>>>> and the task really is blocked.
>>>
>>> I think complete will not help, as problem is like below :
>>>
>>> Control Thread CPUHP thread
>>>
>>> cpuhp_thread_fun
>>> Wake control thread
>>> complete(&st->done);
>>>
>>> takedown_cpu
>>> kthread_park
>>> set_bit(KTHREAD_SHOULD_PARK
>>>
>>> Here cpuhp is looping,
>>> //success case
>>> Generally when issue is not
>>> coming
>>> it schedule out by below :
>>>
>>> ht->thread_should_run(td->cpu
>>> scheduler
>>> //failure case
>>> before schedule
>>> loop check
>>> (kthread_should_park()
>>> enter here as PARKED set
>>>
>>> wake_up_process(k)
>>
>> If k has TASK_PARKED, then wake_up_process() which uses TASK_NORMAL will
>> no-op, because:
>>
>> TASK_PARKED & TASK_NORMAL == 0
>>
>>> __kthread_parkme
>>> complete(&self->parked);
>>> SETS RUNNING
>>> schedule
>>
>> But suppose, you do get that store, and we get to schedule with
>> TASK_RUNNING, then schedule will no-op and we'll go around the loop and
>> not complete.
>>
>> See also:
>> lkml.kernel.org/r/20180430111744.GE4082@...ez.programming.kicks-ass.net
>>
>> Either TASK_RUNNING gets set before we do schedule() and we go around
>> again, re-set TASK_PARKED, resched the condition and re-call schedule(),
>> or we schedule() first and ttwu() will not issue the TASK_RUNNING store.
>>
>> In either case, we'll eventually hit schedule() with TASK_PARKED. Then,
>> and only then will the complete() happen.
>>
>>> wait_for_completion(&kthread->parked);
>>
>> The point is, we'll only ever complete ^ that completion when we've
>> scheduled out the task in TASK_PARKED state. If the task didn't get
>> parked, no completion.
>
> Thanks for the detailed explanation, yes in all cases unpark will
> observe parked state only.
>>
>>
>> And that is the reason I like this approach above the others. It
>> guarantees the task really is parked when we ask for it. We don't have
>> to deal with the task still running and getting migrated to another CPU
>> nonsense.
>>
>
Hi Peter,

We have tested with the new patch and are still seeing the same issue. These
dumps do not have debug traces, but from code review it seems a race still
exists. Can you please check it once:
Controller Thread               CPUHP Thread
takedown_cpu
kthread_park
kthread_parkme
Set KTHREAD_SHOULD_PARK
                                smpboot_thread_fn
                                set Task interruptible
wake_up_process
                                kthread_parkme
                                SET TASK_PARKED
                                schedule
                                raw_spin_lock(&rq->lock)
                                context_switch
                                finish_lock_switch
                                Case TASK_PARKED
                                kthread_park_complete
                                SET TASK_INTERRUPTIBLE
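
To make the suspected window easier to discuss, here is a minimal user-space
sketch of the two orderings. It is not kernel code: the bit values, the struct
and every model_* / wakeup_* name are assumptions made up for illustration, and
the wakeup is deliberately split into a "check" and a "store" step to mimic the
state test and the later TASK_RUNNING write. Whether the real code can still hit
the second ordering with the new patch is exactly what needs checking.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values (assumption); the real definitions live in
 * include/linux/sched.h and differ between kernel versions. */
#define TASK_RUNNING         0x0000
#define TASK_INTERRUPTIBLE   0x0001
#define TASK_UNINTERRUPTIBLE 0x0002
#define TASK_PARKED          0x0040
#define TASK_NORMAL          (TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE)

/* Hypothetical stand-in for a task; only the state field is modelled. */
struct model_task {
	long state;
};

/* Step 1 of the modelled wakeup: is the task state inside the wake mask? */
bool wakeup_check(const struct model_task *t, long mask)
{
	return (t->state & mask) != 0;
}

/* Step 2 of the modelled wakeup: the TASK_RUNNING store. */
void wakeup_store(struct model_task *t)
{
	t->state = TASK_RUNNING;
}

/* Ordering 1: the thread sets TASK_PARKED before the wakeup checks.
 * The mask test fails, so the wakeup is a no-op. */
long park_then_wake(void)
{
	struct model_task t = { .state = TASK_INTERRUPTIBLE };

	t.state = TASK_PARKED;
	if (wakeup_check(&t, TASK_NORMAL))
		wakeup_store(&t);
	return t.state;
}

/* Ordering 2: the wakeup checks while the thread is still
 * TASK_INTERRUPTIBLE, the thread then sets TASK_PARKED, and the
 * store lands afterwards and overwrites it. */
long wake_races_with_park(void)
{
	struct model_task t = { .state = TASK_INTERRUPTIBLE };
	bool pending = wakeup_check(&t, TASK_NORMAL);

	t.state = TASK_PARKED;
	if (pending)
		wakeup_store(&t);
	return t.state;
}

/* Small self-check covering both orderings. */
void model_selfcheck(void)
{
	assert(park_then_wake() == TASK_PARKED);
	assert(wake_races_with_park() == TASK_RUNNING);
}
```

Calling model_selfcheck() from a main() is enough to exercise both paths. Only
the second ordering loses the parked state, which is the situation the diagram
above is worried about.
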
We are also seeing the same warning when the controller unparks the cpuhp
thread:
	if (!wait_task_inactive(p, state)) {
		WARN_ON(1);
		return;
	}
[ 325.065893] [<ffffff8920ed0200>] kthread_unpark+0x80/0xd8
[ 325.065902] [<ffffff8920eab754>] bringup_cpu+0xa0/0x12c
[ 325.065910] [<ffffff8920eaae90>] cpuhp_invoke_callback+0xb4/0x5c8
[ 325.065917] [<ffffff8920eabd98>] cpuhp_up_callbacks+0x3c/0x154
[ 325.065924] [<ffffff8920ead220>] _cpu_up+0x134/0x208
[ 325.065931] [<ffffff8920ead45c>] do_cpu_up+0x168/0x1a0
[ 325.065938] [<ffffff8920ead4b8>] cpu_up+0x24/0x30
[ 325.065948] [<ffffff89215b1408>] cpu_subsys_online+0x20/0x2c
[ 325.065956] [<ffffff89215aac64>] device_online+0x70/0xb4
[ 325.065962] [<ffffff89215aad78>] online_store+0xd0/0xdc
[ 325.065971] [<ffffff89215a7424>] dev_attr_store+0x40/0x54
[ 325.065982] [<ffffff89210d8a98>] sysfs_kf_write+0x5c/0x74
[ 325.065988] [<ffffff89210d7b9c>] kernfs_fop_write+0xcc/0x1ec
[ 325.065999] [<ffffff8921049288>] vfs_write+0xb4/0x1d0
[ 325.066006] [<ffffff892104a858>] SyS_write+0x60/0xc0
[ 325.066014] [<ffffff8920e83770>] el0_svc_naked+0x24/0x28
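
For anyone following along, the check behind that warning can be modelled in
user space roughly as below. This is only a sketch under assumptions: the bit
values are illustrative, struct model_kthread and the model_* helpers are
hypothetical names, and the real wait_task_inactive() also waits for the task
to leave the CPU and returns a context-switch count instead of a plain 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values (assumption); see include/linux/sched.h
 * for the real, version-dependent definitions. */
#define TASK_RUNNING       0x0000
#define TASK_INTERRUPTIBLE 0x0001
#define TASK_PARKED        0x0040

/* Hypothetical stand-in for the kthread being unparked. */
struct model_kthread {
	long state;
};

/* Simplified model of wait_task_inactive(): returns 0 when a match
 * state was requested and the task is not in that state. */
unsigned long model_wait_task_inactive(const struct model_kthread *p,
				       long match_state)
{
	if (match_state && p->state != match_state)
		return 0;
	return 1;
}

/* Models the bind step on the unpark path: true means the
 * WARN_ON(1) branch quoted above would be taken. */
bool model_unpark_warns(const struct model_kthread *p)
{
	return !model_wait_task_inactive(p, TASK_PARKED);
}

/* Self-check: a parked thread passes, a thread whose parked state
 * was lost hits the warning branch. */
void model_unpark_selfcheck(void)
{
	struct model_kthread parked = { .state = TASK_PARKED };
	struct model_kthread lost = { .state = TASK_INTERRUPTIBLE };

	assert(!model_unpark_warns(&parked));
	assert(model_unpark_warns(&lost));
}
```

In this model the warning fires only when the thread is not in TASK_PARKED at
unpark time, which would match a thread that was never really parked.
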
After this, the same crash occurred:
[ 325.521307] [<ffffff8920ed4aac>] smpboot_thread_fn+0x26c/0x2c8
[ 325.527295] [<ffffff8920ecfb24>] kthread+0xf4/0x108
I will add more ftrace debug points to check exactly what is going on.
Regards
Gaurav
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center,
Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.