Message-ID: <32a5ce10-bdbf-57fe-4318-ce53ad47f161@kernel.dk>
Date:   Wed, 28 Oct 2020 07:36:32 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     "Zhang, Qiang" <Qiang.Zhang@...driver.com>
Cc:     "io-uring@...r.kernel.org" <io-uring@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Reply: [PATCH] io-wq: set task TASK_INTERRUPTIBLE state before schedule_timeout

On 10/27/20 8:47 PM, Zhang, Qiang wrote:
> 
> 
> ________________________________________
> From: Jens Axboe <axboe@...nel.dk>
> Sent: 27 October 2020 21:35
> To: Zhang, Qiang
> Cc: io-uring@...r.kernel.org; linux-kernel@...r.kernel.org
> Subject: Re: [PATCH] io-wq: set task TASK_INTERRUPTIBLE state before schedule_timeout
> 
> On 10/26/20 9:09 PM, qiang.zhang@...driver.com wrote:
>> From: Zqiang <qiang.zhang@...driver.com>
>>
>> In the 'io_wqe_worker' thread, once the work in 'wqe->work_list' has
>> been finished, 'wqe->work_list' is empty, and '__io_worker_idle' then
>> returns false, the task state is still TASK_RUNNING. It needs to be set
>> to TASK_INTERRUPTIBLE before calling schedule_timeout().
>
> I don't think that's safe - what if someone added work right before you
> call schedule_timeout_interruptible? Something like:
>
> io_wq_enqueue()
>                        set_current_state(TASK_INTERRUPTIBLE);
>                        schedule_timeout(WORKER_IDLE_TIMEOUT);
>
> then we'll have work added and the task state set to running, but the
> worker itself just sets us to non-running and will hence wait
> WORKER_IDLE_TIMEOUT before the work is processed.
>
> The current code will do one extra loop in this case, as
> schedule_timeout() just ends up being a nop and we go around again.
> 
> Although the worker's task state is TASK_RUNNING, calling
> schedule_timeout() can still switch the current worker out. If the
> current worker's task state is set to non-running, the worker is
> switched out, but schedule() will call io_wq_worker_sleeping() to wake
> up a free worker task if wqe->free_list is not empty.

With TASK_RUNNING, it'll only be switched out if something else should
be running, which would happen on the next need-resched event anyway.
And the miss you're describing is an expensive one, as it entails
creating a new thread and switching to that. That's not a great way to
handle a race.

So I'm a bit puzzled here - yes, we'll do an extra loop and check for the
dropping of mm, but that's really minor. The proposed solution is a _lot_
more expensive when it hits the race: the wakeup is missed because you
unconditionally set the task to non-running, so a new worker is needed.
On top of that, it's also not the idiomatic way to wait for events,
which is typically:

is event true, break if so
set_current_state(TASK_INTERRUPTIBLE);
					event comes in, task set runnable
check again, schedule
doesn't schedule, since we were set runnable

or variants thereof, using waitqueues.
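To make the ordering argument concrete, here is a minimal single-threaded sketch that replays the two interleavings by hand. It is a hypothetical model, not kernel code: the names model_task, model_enqueue(), model_sleeps(), unconditional_sleep_misses() and recheck_sleep_misses() are invented for illustration, and the model ignores the memory-ordering guarantees that set_current_state() and the wake-up path provide in the real kernel. It only captures the rule that a task still in the running state does not actually sleep in schedule_timeout().

```c
#include <stdbool.h>

/* Hypothetical model of the two task states that matter here.
 * This is NOT the kernel API; it only replays interleavings. */
enum model_state { MODEL_RUNNING, MODEL_INTERRUPTIBLE };

struct model_task {
    enum model_state state;
    bool work_pending;
};

/* Waker side, loosely modelling io_wq_enqueue(): queue work and make
 * the worker runnable again. */
static void model_enqueue(struct model_task *t)
{
    t->work_pending = true;
    t->state = MODEL_RUNNING;
}

/* Models schedule_timeout(): a task still in the running state
 * returns immediately instead of sleeping. */
static bool model_sleeps(const struct model_task *t)
{
    return t->state != MODEL_RUNNING;
}

/* Worker already saw an empty list, then unconditionally sets the
 * sleeping state and schedules without re-checking.
 * Returns true if it would sleep with work pending (missed wakeup). */
static bool unconditional_sleep_misses(bool work_before_set_state)
{
    struct model_task t = { MODEL_RUNNING, false };

    if (work_before_set_state)
        model_enqueue(&t);
    t.state = MODEL_INTERRUPTIBLE;      /* set_current_state() */
    if (!work_before_set_state)
        model_enqueue(&t);
    return model_sleeps(&t) && t.work_pending;
}

/* Idiomatic order: set the state, then check the condition again
 * before scheduling. */
static bool recheck_sleep_misses(bool work_before_set_state)
{
    struct model_task t = { MODEL_RUNNING, false };

    if (work_before_set_state)
        model_enqueue(&t);
    t.state = MODEL_INTERRUPTIBLE;      /* set_current_state() */
    if (!work_before_set_state)
        model_enqueue(&t);
    if (t.work_pending) {               /* check again */
        t.state = MODEL_RUNNING;        /* don't schedule, handle the work */
        return false;
    }
    /* Only reached with no work pending, so sleeping here is correct. */
    return model_sleeps(&t) && t.work_pending;
}
```

In this model, unconditional_sleep_misses(true) returns true, which corresponds to the WORKER_IDLE_TIMEOUT wait in the io_wq_enqueue() interleaving quoted earlier, while unconditional_sleep_misses(false) and recheck_sleep_misses() for either ordering return false: a wake-up that lands after the state change simply leaves the task runnable, and the re-check catches work that arrived before it.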

So while I'm of course not opposed to fixing the io-wq loop so that we
don't do that last loop when going idle, a) it basically doesn't matter,
and b) the proposed solution is much worse. If there's a more elegant
solution without worse side effects, then we can discuss that.

-- 
Jens Axboe
