Message-Id: <20241024023805.1082769-1-xue01.he@samsung.com>
Date: Thu, 24 Oct 2024 10:38:05 +0800
From: hexue <xue01.he@...sung.com>
To: asml.silence@...il.com, axboe@...nel.dk
Cc: io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Re: [PATCH v8] io_uring: releasing CPU resources when polling

On 9/25/2024 12:12, Pavel Begunkov wrote:
>I don't have a strong opinion on the feature, but the open question
>we should get some decision on is whether it's really well applicable to
>a good enough set of apps / workloads, if it'll even be useful in the
>future and/or for other vendors, and if the merit outweighs extra
>8 bytes + 1 flag per io_kiocb and the overhead of 1-2 static key'able
>checks in hot paths.

IMHO, releasing some CPU resources during polling may help
workloads whose bottleneck is CPU capacity. Some database
applications are an example: besides completing IO operations,
the CPU also needs to process data, e.g. compression and
decompression. Under high concurrency, polling takes up a lot of
CPU time while calculation and processing compete for that same
CPU time. In this case, the performance of the application may be
difficult to improve.

The MultiRead interface of RocksDB has been adapted to io_uring,
so I used db_bench to construct a situation with high CPU pressure
and compared the performance. The test configuration is as follows:

-------------------------------------------------------------------
CPU Model    Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
CPU Cores    8
Memory       16G
SSD          Samsung PM9A3
-------------------------------------------------------------------

Test case (one run per --threads value: 4, 8, 16):
./db_bench --benchmarks=multireadrandom,stats \
    --duration=60 \
    --threads=4/8/16 \
    --use_direct_reads=true \
    --db=/mnt/rocks/test_db \
    --wal_dir=/mnt/rocks/test_db \
    --key_size=4 \
    --value_size=4096 \
    --cache_size=0 \
    --use_existing_db=1 \
    --batch_size=256 \
    --multiread_batched=true \
    --multiread_stride=0
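
Since --threads=4/8/16 stands for three separate runs, the commands can
be spelled out with a small helper. This is only a sketch: the name
build_db_bench_cmds is made up for illustration, the flags are copied
from the test case above, and the double-dash spelling of every flag is
an assumption (gflags-style parsers usually accept both forms). It only
builds the argument lists and does not run anything.

```python
# Hypothetical helper (not part of RocksDB): builds one db_bench
# argument list per thread count, using only the flags listed above.
COMMON_FLAGS = [
    "--benchmarks=multireadrandom,stats",
    "--duration=60",
    "--use_direct_reads=true",
    "--db=/mnt/rocks/test_db",
    "--wal_dir=/mnt/rocks/test_db",
    "--key_size=4",
    "--value_size=4096",
    "--cache_size=0",        # double-dash spelling is an assumption
    "--use_existing_db=1",
    "--batch_size=256",
    "--multiread_batched=true",
    "--multiread_stride=0",
]


def build_db_bench_cmds(thread_counts=(4, 8, 16), binary="./db_bench"):
    """Return one argv list per thread count; nothing is executed."""
    return [[binary, f"--threads={n}", *COMMON_FLAGS] for n in thread_counts]


if __name__ == "__main__":
    for cmd in build_db_bench_cmds():
        print(" ".join(cmd))
```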
------------------------------------------------------
Test result:
threads    Native     Optimized    CPU Utilization
           ops/sec    ops/sec
16         139300     189075       100%*8
8          138639     133191       90%*8
4          71475      68361        90%*8
------------------------------------------------------
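
To make the comparison easier to read, the relative change between the
two ops/sec columns can be computed per row. A minimal sketch, assuming
the first ops/sec column is the baseline and the second is the run with
the patch applied; the function name is mine and the numbers are copied
from the table.

```python
# (threads, baseline ops/sec, patched ops/sec), copied from the table.
RESULTS = [
    (16, 139300, 189075),
    (8, 138639, 133191),
    (4, 71475, 68361),
]


def relative_change_pct(baseline, optimized):
    """Throughput change of `optimized` relative to `baseline`, in percent."""
    return (optimized - baseline) / baseline * 100.0


if __name__ == "__main__":
    for threads, base, opt in RESULTS:
        print(f"{threads:>2} threads: {relative_change_pct(base, opt):+.1f}%")
```

With the table's numbers this gives about +35.7% at 16 threads, and
about -3.9% and -4.4% at 8 and 4 threads.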

When the number of threads exceeds the number of CPU cores, the
database throughput does not increase significantly. However,
hybrid polling can release some CPU resources during the polling
process, so that part of the CPU time can be used for frequent
data processing and other operations. This speeds up the reading
process, thereby improving throughput and optimizing database
performance. I tried different compression strategies and got
results similar to the above table (~30% throughput improvement).

As more database applications adapt to the io_uring engine, I think
the application of hybrid poll may have potential in some scenarios.
--
Xue
