lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Tue, 10 Aug 2021 22:32:59 +0100 From: Pavel Begunkov <asml.silence@...il.com> To: Nadav Amit <nadav.amit@...il.com>, Olivier Langlois <olivier@...llion01.com> Cc: Jens Axboe <axboe@...nel.dk>, io-uring@...r.kernel.org, Linux Kernel Mailing List <linux-kernel@...r.kernel.org> Subject: Re: [PATCH 1/2] io_uring: clear TIF_NOTIFY_SIGNAL when running task work On 8/10/21 9:28 AM, Nadav Amit wrote: >> On Aug 9, 2021, at 2:48 PM, Olivier Langlois <olivier@...llion01.com> wrote: >> On Sat, 2021-08-07 at 17:13 -0700, Nadav Amit wrote: >>> From: Nadav Amit <namit@...are.com> >>> >>> When using SQPOLL, the submission queue polling thread calls >>> task_work_run() to run queued work. However, when work is added with >>> TWA_SIGNAL - as done by io_uring itself - the TIF_NOTIFY_SIGNAL remains >>> set afterwards and is never cleared. >>> >>> Consequently, when the submission queue polling thread checks whether >>> signal_pending(), it may always find a pending signal, if >>> task_work_add() was ever called before. >>> >>> The impact of this bug might be different on different kernel versions. >>> It appears that on 5.14 it would only cause unnecessary calculation and >>> prevent the polling thread from sleeping. On 5.13, where the bug was >>> found, it stops the polling thread from finding newly submitted work. >>> >>> Instead of task_work_run(), use tracehook_notify_signal() that clears >>> TIF_NOTIFY_SIGNAL. Test for TIF_NOTIFY_SIGNAL in addition to >>> current->task_works to avoid a race in which task_works is cleared but >>> the TIF_NOTIFY_SIGNAL is set. >> >> thx a lot for this patch! >> >> This explains what I am seeing here: >> https://lore.kernel.org/io-uring/4d93d0600e4a9590a48d320c5a7dd4c54d66f095.camel@trillion01.com/ >> >> I was under the impression that task_work_run() was clearing >> TIF_NOTIFY_SIGNAL. >> >> your patch made me realize that it does not… > > Happy it could help. > > Unfortunately, there seems to be yet another issue (unless my code > somehow caused it). It seems that when SQPOLL is used, there are cases > in which we get stuck in io_uring_cancel_sqpoll() when tctx_inflight() > never goes down to zero. > > Debugging... (while also trying to make some progress with my code) It's most likely because a request has been lost (mis-refcounted). Let us know if you need any help. Would be great to solve it for 5.14. quick tips: 1) if not already, try out Jens' 5.14 branch git://git.kernel.dk/linux-block io_uring-5.14 2) try to characterise the io_uring use pattern. Poll requests? Read/write requests? Send/recv? Filesystem vs bdev vs sockets? If easily reproducible, you can match io_alloc_req() with it getting into io_dismantle_req(); -- Pavel Begunkov
Powered by blists - more mailing lists