Message-ID: <e4dca167-ffb1-944f-4dc0-93fa624ebce6@codeaurora.org>
Date:   Wed, 17 Jan 2018 01:38:06 +0530
From:   Neeraj Upadhyay <neeraju@...eaurora.org>
To:     Tejun Heo <tj@...nel.org>
Cc:     jiangshanlai@...il.com, linux-kernel@...r.kernel.org,
        linux-arm-msm@...r.kernel.org, prsood@...eaurora.org,
        sramana@...eaurora.org
Subject: Re: [PATCH] workqueue: Handle race between wake up and rebind



On 01/16/2018 11:05 PM, Tejun Heo wrote:
> Hello, Neeraj.
>
> On Mon, Jan 15, 2018 at 02:08:12PM +0530, Neeraj Upadhyay wrote:
>> - kworker/0:0 gets chance to run on cpu1; while processing
>>    a work, it goes to sleep. However, it does not decrement
>>    pool->nr_running. This is because WORKER_REBOUND (NOT_
>>    RUNNING) flag was cleared, when worker entered worker_
> Do you mean that because REBOUND was set?

Actually, I meant that REBOUND was not set. Below is the sequence:
- The cpu0 bound pool is unbound.

- kworker/0:0 is woken up on cpu1.

- The cpu0 pool is rebound;
   REBOUND is set for kworker/0:0.

- kworker/0:0 starts running on cpu1
   worker_thread()
     // The call below clears REBOUND and, since no NOT_RUNNING
     // flag remains set, sets nr_running to 1
     worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);

- kworker/0:0 goes to sleep
   wq_worker_sleeping()
     // The condition below is false, because all NOT_RUNNING
     // flags were cleared in worker_thread()
     if (worker->flags & WORKER_NOT_RUNNING)
     // The condition below is true, because the worker is running
     // on cpu1 while pool->cpu is cpu0, so it returns early
     if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))
       return NULL;
     // The decrement below is never reached, so nr_running stays 1
     if (atomic_dec_and_test(&pool->nr_running) &&

- kworker/0:0 wakes up again, this time on cpu0, because
   rebind_workers() set worker->task's cpus_allowed to cpu0.
   wq_worker_waking_up()
     if (!(worker->flags & WORKER_NOT_RUNNING)) {
         // Increments pool->nr_running to 2
         atomic_inc(&worker->pool->nr_running);

>
>>    thread().
>>
>>    Worker 0 runs on cpu1
>>      worker_thread()
>>        process_one_work()
>>          wq_worker_sleeping()
>>            if (worker->flags & WORKER_NOT_RUNNING)
>>              return NULL;
>>            if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))
>>              <Does not decrement nr_running>
>>
>> - After this, when kworker/0:0 wakes up, this time on its
>>    bounded cpu cpu0, it increments pool->nr_running again.
>>    So, pool->nr_running becomes 2.
> Why is it suddenly 2?  Who made it one on the account of the kworker?
As shown above, it became 1 in
worker_clr_flags(worker, WORKER_PREP | WORKER_REBOUND);
>
> Do you see this happening?  Or better, is there a (semi) reliable
> repro for this issue?
Yes, this was reported in our long-running tests with random CPU hotplug.
Sorry, I don't have a quick reproducer for it. The issue shows up after a
few days of testing.
>
> Thanks.
>

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
