linux-kernel - Re: [PATCH] workqueue: Handle race between wake up and rebind

Message-ID: <63ee9720-c502-0d20-099a-d1986723594b@codeaurora.org>
Date:   Thu, 18 Jan 2018 15:37:20 +0530
From:   Neeraj Upadhyay <neeraju@...eaurora.org>
To:     Lai Jiangshan <jiangshanlai@...il.com>
Cc:     Tejun Heo <tj@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
        linux-arm-msm@...r.kernel.org, prsood@...eaurora.org,
        sramana@...eaurora.org
Subject: Re: [PATCH] workqueue: Handle race between wake up and rebind



On 01/18/2018 08:32 AM, Lai Jiangshan wrote:
> On Wed, Jan 17, 2018 at 4:08 AM, Neeraj Upadhyay <neeraju@...eaurora.org> wrote:
>>
>> On 01/16/2018 11:05 PM, Tejun Heo wrote:
>>> Hello, Neeraj.
>>>
>>> On Mon, Jan 15, 2018 at 02:08:12PM +0530, Neeraj Upadhyay wrote:
>>>> - kworker/0:0 gets a chance to run on cpu1; while processing
>>>>     a work item, it goes to sleep. However, it does not decrement
>>>>     pool->nr_running. This is because WORKER_REBOUND (NOT_
>>>>     RUNNING) flag was cleared, when worker entered worker_
>>> Do you mean that it's because REBOUND was set?
>>
>> Actually, I meant REBOUND was not set. Below is the sequence:
>>
>> - cpu0's bound pool is unbound.
>>
>> - kworker/0:0 is woken up on cpu1.
>>
>> - cpu0 pool is rebound
>>    REBOUND is set for kworker/0:0
>>
> Thanks for looking into the details of workqueue...
>
> "REBOUND is set for kworker/0:0" means set_cpus_allowed_ptr(kworker/0:0)
> has already returned successfully and kworker/0:0 has already been moved to cpu0.
>
> It will not still be running on cpu1, as in the following steps you described.
>
> If there is something wrong with set_cpus_allowed_ptr()
> in this situation, could you please elaborate on it?

Thanks Lai, I missed that; I will debug from that perspective.
>
>> - kworker/0:0 starts running on cpu1
>>    worker_thread()
>>      // The call below clears REBOUND and sets nr_running = 1
>>      worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
>>
>> - kworker/0:0 goes to sleep
>>    wq_worker_sleeping()
>>      // The condition below is false, as all NOT_RUNNING
>>      // flags were cleared in worker_thread()
>>      if (worker->flags & WORKER_NOT_RUNNING)
>>      // The condition below is true, as the worker is running on cpu1
>>      if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))
>>        return NULL;
>>      // The code below is not reached, so nr_running stays 1
>>      if (atomic_dec_and_test(&pool->nr_running) &&
>>
>> - kworker/0:0 wakes up again, this time on cpu0, as worker->task's
>>    cpus_allowed was set to cpu0 in rebind_workers().
>>    wq_worker_waking_up()
>>      if (!(worker->flags & WORKER_NOT_RUNNING)) {
>>          // Increments pool->nr_running to 2
>>          atomic_inc(&worker->pool->nr_running);
>>
>>>>     thread().
>>>>
>>>>     Worker 0 runs on cpu1
>>>>       worker_thread()
>>>>         process_one_work()
>>>>           wq_worker_sleeping()
>>>>             if (worker->flags & WORKER_NOT_RUNNING)
>>>>               return NULL;
>>>>             if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))
>>>>               <Does not decrement nr_running>
>>>>
>>>> - After this, when kworker/0:0 wakes up, this time on its
>>>>     bound CPU, cpu0, it increments pool->nr_running again.
>>>>     So, pool->nr_running becomes 2.
>>> Why is it suddenly 2?  Who made it one on account of the kworker?
>> As shown in the comment above, it became 1 in
>> worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
>>>
>>> Do you see this happening?  Or better, is there a (semi) reliable
>>> repro for this issue?
>> Yes, this was reported in our long-running tests with random hotplug.
>> Sorry, I don't have a quick reproducer for it; the issue shows up after a
>> few days of testing.
>>>
>>> Thanks.
>>>
>> --
>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
>> member of the Code Aurora Forum, hosted by The Linux Foundation
>>

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation