lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <A4DC14BA-74CA-41DB-BE08-D7B693C11AE0@gmail.com>
Date:   Tue, 10 Aug 2021 01:28:01 -0700
From:   Nadav Amit <nadav.amit@...il.com>
To:     Olivier Langlois <olivier@...llion01.com>
Cc:     Jens Axboe <axboe@...nel.dk>, io-uring@...r.kernel.org,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Pavel Begunkov <asml.silence@...il.com>
Subject: Re: [PATCH 1/2] io_uring: clear TIF_NOTIFY_SIGNAL when running task
 work


> On Aug 9, 2021, at 2:48 PM, Olivier Langlois <olivier@...llion01.com> wrote:
> 
> On Sat, 2021-08-07 at 17:13 -0700, Nadav Amit wrote:
>> From: Nadav Amit <namit@...are.com>
>> 
>> When using SQPOLL, the submission queue polling thread calls
>> task_work_run() to run queued work. However, when work is added with
>> TWA_SIGNAL - as done by io_uring itself - the TIF_NOTIFY_SIGNAL remains
>> set afterwards and is never cleared.
>> 
>> Consequently, when the submission queue polling thread checks whether
>> signal_pending(), it may always find a pending signal, if
>> task_work_add() was ever called before.
>> 
>> The impact of this bug might be different on different kernel versions.
>> It appears that on 5.14 it would only cause unnecessary calculation and
>> prevent the polling thread from sleeping. On 5.13, where the bug was
>> found, it stops the polling thread from finding newly submitted work.
>> 
>> Instead of task_work_run(), use tracehook_notify_signal() that clears
>> TIF_NOTIFY_SIGNAL. Test for TIF_NOTIFY_SIGNAL in addition to
>> current->task_works to avoid a race in which task_works is cleared but
>> the TIF_NOTIFY_SIGNAL is set.
> 
> thx a lot for this patch!
> 
> This explains what I am seeing here:
> https://lore.kernel.org/io-uring/4d93d0600e4a9590a48d320c5a7dd4c54d66f095.camel@trillion01.com/
> 
> I was under the impression that task_work_run() was clearing
> TIF_NOTIFY_SIGNAL.
> 
> your patch made me realize that it does not…

Happy it could help.

Unfortunately, there seems to be yet another issue (unless my code
somehow caused it). It seems that when SQPOLL is used, there are cases
in which we get stuck in io_uring_cancel_sqpoll() when tctx_inflight()
never goes down to zero.

Debugging... (while also trying to make some progress with my code)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ