linux-kernel - Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against wakeup

Message-ID: <cdd5d434-9c11-7c19-2895-0f7f3811eb11@codeaurora.org>
Date:   Mon, 7 May 2018 16:39:28 +0530
From:   "Kohli, Gaurav" <gkohli@...eaurora.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     tglx@...utronix.de, mpe@...erman.id.au, mingo@...nel.org,
        bigeasy@...utronix.de, linux-kernel@...r.kernel.org,
        linux-arm-msm@...r.kernel.org,
        Neeraj Upadhyay <neeraju@...eaurora.org>,
        Will Deacon <will.deacon@....com>,
        Oleg Nesterov <oleg@...hat.com>
Subject: Re: [PATCH v1] kthread/smpboot: Serialize kthread parking against
 wakeup



On 5/2/2018 3:43 PM, Kohli, Gaurav wrote:
> 
> 
> On 5/2/2018 1:50 PM, Peter Zijlstra wrote:
>> On Wed, May 02, 2018 at 10:45:52AM +0530, Kohli, Gaurav wrote:
>>> On 5/1/2018 6:49 PM, Peter Zijlstra wrote:
>>
>>>>    - complete(&kthread->parked), which we can do inside schedule(); this
>>>>      solves the problem because then kthread_park() will not return early
>>>>      and the task really is blocked.
>>>
>>> I think complete will not help, as problem is like below :
>>>
>>> Control Thread                                CPUHP thread
>>>
>>>                           cpuhp_thread_fun
>>>                           Wake control thread
>>>                           complete(&st->done);
>>>
>>> takedown_cpu
>>> kthread_park
>>> set_bit(KTHREAD_SHOULD_PARK
>>>
>>>                          Here cpuhp is looping,
>>>                     //success case
>>>                          Generally when issue is not
>>>                          coming
>>>                          it schedule out by below :
>>>                                             
>>> ht->thread_should_run(td->cpu
>>>                           scheduler
>>>                     //failure case
>>>                     before schedule
>>>                     loop check
>>>                     (kthread_should_park()
>>>                          enter here as PARKED set
>>>
>>> wake_up_process(k)
>>
>> If k has TASK_PARKED, then wake_up_process() which uses TASK_NORMAL will
>> no-op, because:
>>
>>     TASK_PARKED & TASK_NORMAL == 0
>>
>>>                     __kthread_parkme
>>>                      complete(&self->parked);
>>> SETS RUNNING
>>>                                  schedule
>>
>> But suppose, you do get that store, and we get to schedule with
>> TASK_RUNNING, then schedule will no-op and we'll go around the loop and
>> not complete.
>>
>> See also: 
>> lkml.kernel.org/r/20180430111744.GE4082@...ez.programming.kicks-ass.net
>>
>> Either TASK_RUNNING gets set before we do schedule() and we go around
>> again, re-set TASK_PARKED, resched the condition and re-call schedule(),
>> or we schedule() first and ttwu() will not issue the TASK_RUNNING store.
>>
>> In either case, we'll eventually hit schedule() with TASK_PARKED. Then,
>> and only then will the complete() happen.
>>
>>> wait_for_completion(&kthread->parked);
>>
>> The point is, we'll only ever complete ^ that completion when we've
>> scheduled out the task in TASK_PARKED state. If the task didn't get
>> parked, no completion.
> 
> Thanks for the detailed explanation, yes in all cases unpark will 
> observe parked state only.
>>
>>
>> And that is the reason I like this approach above the others. It
>> guarantees the task really is parked when we ask for it. We don't have
>> to deal with the task still running and getting migrated to another CPU
>> nonsense.
>>
> 

Hi Peter,

We have tested with the new patch and are still seeing the same issue.
These dumps do not contain debug traces, but from code review it seems
that a race still exists. Could you please check the sequence below?

Controller Thread                       CPUHP Thread
takedown_cpu
kthread_park
kthread_parkme
Set KTHREAD_SHOULD_PARK
                                        smpboot_thread_fn
                                        set Task interruptible

wake_up_process

                                        Kthread_parkme
                                        SET TASK_PARKED
                                        schedule
                                        raw_spin_lock(&rq->lock)

                                        context_switch

                                        finish_lock_switch

                                        Case TASK_PARKED
                                        kthread_park_complete

SET TASK_INTERRUPTIBLE
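
To make the suspected ordering easier to reason about, here is a minimal user-space sketch that replays the steps of the timeline above in order. It is only a model, not kernel code: the `model_*` names, the `replay_timeline()` helper and the numeric state values are assumptions made for illustration, and the final store is replayed simply because it is the last step listed in the timeline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative state bits; the numeric values are assumptions for this model. */
enum {
	M_TASK_RUNNING         = 0x0000,
	M_TASK_INTERRUPTIBLE   = 0x0001,
	M_TASK_UNINTERRUPTIBLE = 0x0002,
	M_TASK_PARKED          = 0x0040,
	M_TASK_NORMAL          = M_TASK_INTERRUPTIBLE | M_TASK_UNINTERRUPTIBLE,
};

/* Hypothetical stand-in for the few fields of the cpuhp thread we care about. */
struct model_task {
	unsigned int state;
	bool should_park;
	bool park_completed;
};

/* A wakeup only takes effect if the current state is in the wake mask. */
static bool model_wake_up(struct model_task *t, unsigned int mask)
{
	assert(t != NULL);
	if (!(t->state & mask))
		return false;
	t->state = M_TASK_RUNNING;
	return true;
}

/* The completion is signalled only when the task switches out while parked. */
static void model_schedule_out(struct model_task *t)
{
	assert(t != NULL);
	if (t->state == M_TASK_PARKED)
		t->park_completed = true;
}

/* Replays the timeline steps strictly in the order they are listed. */
static struct model_task replay_timeline(void)
{
	struct model_task t = { M_TASK_RUNNING, false, false };

	t.should_park = true;                   /* controller: set KTHREAD_SHOULD_PARK */
	t.state = M_TASK_INTERRUPTIBLE;         /* cpuhp: smpboot_thread_fn */
	(void)model_wake_up(&t, M_TASK_NORMAL); /* controller: wake_up_process */
	t.state = M_TASK_PARKED;                /* cpuhp: kthread_parkme */
	model_schedule_out(&t);                 /* cpuhp: schedule, kthread_park_complete */
	t.state = M_TASK_INTERRUPTIBLE;         /* last step listed in the timeline */
	return t;
}
```

In this model, `replay_timeline()` returns a task whose `park_completed` flag is set while its state is no longer `M_TASK_PARKED`, which is the situation the timeline is meant to describe: the completion has fired, but a later state check for the parked state would not match.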

	
We are also seeing the same warning when the controller unparks the
cpuhp thread:

	if (!wait_task_inactive(p, state)) {
		WARN_ON(1);
		return;
	}

[  325.065893] [<ffffff8920ed0200>] kthread_unpark+0x80/0xd8
[  325.065902] [<ffffff8920eab754>] bringup_cpu+0xa0/0x12c
[  325.065910] [<ffffff8920eaae90>] cpuhp_invoke_callback+0xb4/0x5c8
[  325.065917] [<ffffff8920eabd98>] cpuhp_up_callbacks+0x3c/0x154
[  325.065924] [<ffffff8920ead220>] _cpu_up+0x134/0x208
[  325.065931] [<ffffff8920ead45c>] do_cpu_up+0x168/0x1a0
[  325.065938] [<ffffff8920ead4b8>] cpu_up+0x24/0x30
[  325.065948] [<ffffff89215b1408>] cpu_subsys_online+0x20/0x2c
[  325.065956] [<ffffff89215aac64>] device_online+0x70/0xb4
[  325.065962] [<ffffff89215aad78>] online_store+0xd0/0xdc
[  325.065971] [<ffffff89215a7424>] dev_attr_store+0x40/0x54
[  325.065982] [<ffffff89210d8a98>] sysfs_kf_write+0x5c/0x74
[  325.065988] [<ffffff89210d7b9c>] kernfs_fop_write+0xcc/0x1ec
[  325.065999] [<ffffff8921049288>] vfs_write+0xb4/0x1d0
[  325.066006] [<ffffff892104a858>] SyS_write+0x60/0xc0
[  325.066014] [<ffffff8920e83770>] el0_svc_naked+0x24/0x28
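
As a rough illustration of why that check fires, the following user-space sketch models the state comparison only. The `chk_*` names and numeric values are hypothetical, the expected state is assumed to be the parked state, and the real `wait_task_inactive()` does more than this (it also waits for the task to leave the CPU), so this should be read as a simplified model rather than the kernel's logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative state bits; the numeric values are assumptions for this model. */
enum {
	CHK_TASK_INTERRUPTIBLE = 0x0001,
	CHK_TASK_PARKED        = 0x0040,
};

/* Hypothetical stand-in for the task being unparked. */
struct chk_task {
	unsigned int state;
	bool on_cpu;
};

/* Simplified check: the task must be off-CPU and in the expected state. */
static bool chk_task_inactive(const struct chk_task *t, unsigned int match_state)
{
	assert(t != NULL);
	return !t->on_cpu && t->state == match_state;
}

/*
 * Models the guarded step in the unpark path: returns true if it would
 * proceed, false if it would warn and bail out. The counter stands in
 * for WARN_ON(1).
 */
static bool chk_unpark_bind(const struct chk_task *t, unsigned int *warnings)
{
	assert(warnings != NULL);
	if (!chk_task_inactive(t, CHK_TASK_PARKED)) {
		(*warnings)++;
		return false;
	}
	return true;
}
```

With a task whose state is `CHK_TASK_PARKED` the guarded step proceeds; with a task left in `CHK_TASK_INTERRUPTIBLE` the warning counter is incremented and the step is skipped, which matches the shape of the trace above.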


After this, the same crash occurred:
[  325.521307] [<ffffff8920ed4aac>] smpboot_thread_fn+0x26c/0x2c8
[  325.527295] [<ffffff8920ecfb24>] kthread+0xf4/0x108

I will add more ftrace debugging to check exactly what is going on.
			
Regards
Gaurav

							


-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, 
Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.