Message-ID: <f76158fc-7dc2-4701-9a61-246656aa4a61@kernel.dk>
Date: Wed, 29 Jan 2025 10:45:43 -0700
From: Jens Axboe <axboe@...nel.dk>
To: Max Kellermann <max.kellermann@...os.com>
Cc: asml.silence@...il.com, io-uring@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/8] Various io_uring micro-optimizations (reducing lock
 contention)

On 1/29/25 10:39 AM, Max Kellermann wrote:
> On Wed, Jan 29, 2025 at 6:19 PM Jens Axboe <axboe@...nel.dk> wrote:
>> The other patches look pretty straight forward to me. Only thing that
>> has me puzzled a bit is why you have so much io-wq activity with your
>> application, in general I'd expect 0 activity there. But then I saw the
>> forced ASYNC flag, and it makes sense. In general, forcing that isn't a
>> great idea, but for a benchmark for io-wq it certainly makes sense.
> 
> I was experimenting with io_uring and wanted to see how much
> performance I can squeeze out of my web server running
> single-threaded. The overhead of io_uring_submit() grew very large,
> because the "send" operation would do a lot of synchronous work in the
> kernel. I tried SQPOLL but it was actually a big performance
> regression; this just shifted my CPU usage to epoll_wait(). Forcing
> ASYNC gave me large throughput improvements (moving the submission
> overhead to iowq), but then the iowq lock contention was the next
> limit, thus this patch series.
> 
> I'm still experimenting, and I will certainly revisit SQPOLL to learn
> more about why it didn't help and how to fix it.
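
For context, forcing a request off the inline submission path is done by
setting IOSQE_ASYNC on the SQE. A minimal sketch with liburing (not the
actual server code from this thread; queue_async_send() is a hypothetical
helper, error handling trimmed) would look roughly like:

    /* Sketch only, assuming liburing: punt a send to io-wq instead of
     * issuing it inline at submit time. */
    #include <liburing.h>

    static int queue_async_send(struct io_uring *ring, int fd,
                                const void *buf, size_t len)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
            return -1;

        io_uring_prep_send(sqe, fd, buf, len, 0);
        /* IOSQE_ASYNC: skip the inline issue attempt and hand the
         * request straight to the io-wq worker pool. */
        io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);
        return io_uring_submit(ring);
    }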

Why are you combining it with epoll in the first place? It's a lot more
efficient to wait on one or multiple events in io_uring_enter() rather than
go back to a serialized one-event-per-notification model by using epoll to
wait for completions on the io_uring side.
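
For illustration, a completion loop that blocks in the ring itself rather
than via epoll might look roughly like this with liburing (a sketch under
the assumption of a standard liburing setup, not code from this thread;
handle_cqe() is hypothetical):

    /* Sketch only, assuming liburing: submit pending SQEs and wait for
     * completions directly (io_uring_enter() under the hood) instead of
     * calling epoll_wait() on the ring fd. */
    #include <liburing.h>

    static void run_loop(struct io_uring *ring)
    {
        struct io_uring_cqe *cqe;
        unsigned head, count;

        for (;;) {
            /* Blocks until at least one CQE is available. */
            io_uring_submit_and_wait(ring, 1);

            /* Drain everything that is ready in one pass. */
            count = 0;
            io_uring_for_each_cqe(ring, head, cqe) {
                /* handle_cqe(cqe); */
                count++;
            }
            io_uring_cq_advance(ring, count);
        }
    }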

-- 
Jens Axboe
