[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20241024023805.1082769-1-xue01.he@samsung.com>
Date: Thu, 24 Oct 2024 10:38:05 +0800
From: hexue <xue01.he@...sung.com>
To: asml.silence@...il.com, axboe@...nel.dk
Cc: io-uring@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Re: [PATCH v8] io_uring: releasing CPU resources when polling
On 9/25/2024 12:12, Pavel Begunkov wrote:
>I don't have a strong opinion on the feature, but the open question
>we should get some decision on is whether it's really well applicable to
>a good enough set of apps / workloads, if it'll even be useful in the
>future and/or for other vendors, and if the merit outweighs extra
>8 bytes + 1 flag per io_kiocb and the overhead of 1-2 static key'able
>checks in hot paths.
IMHO, releasing some of the CPU resources during the polling
process may be appropriate for some performance bottlenecks
due to CPU resource constraints, such as some database
applications, in addition to completing IO operations, CPU
also needs to peocess data, like compression and decompression.
In a high-concurrency state, not only polling takes up a lot of
CPU time, but also operations like calculation and processing
also need to compete for CPU time. In this case, the performance
of the application may be difficult to improve.
The MultiRead interface of Rocksdb has been adapted to io_uring,
I used db_bench to construct a situation with high CPU pressure
and compared the performance. The test configuration is as follows,
-------------------------------------------------------------------
CPU Model Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
CPU Cores 8
Memory 16G
SSD Samsung PM9A3
-------------------------------------------------------------------
Test case:
./db_bench --benchmarks=multireadrandom,stats
--duration=60
--threads=4/8/16
--use_direct_reads=true
--db=/mnt/rocks/test_db
--wal_dir=/mnt/rocks/test_db
--key_size=4
--value_size=4096
-cache_size=0
-use_existing_db=1
-batch_size=256
-multiread_batched=true
-multiread_stride=0
------------------------------------------------------
Test result:
National Optimization
threads ops/sec ops/sec CPU Utilization
16 139300 189075 100%*8
8 138639 133191 90%*8
4 71475 68361 90%*8
------------------------------------------------------
When the number of threads exceeds the number of CPU cores,the
database throughput does not increase significantly. However,
hybrid polling can releasing some CPU resources during the polling
process, so that part of the CPU time can be used for frequent
data processing and other operations, which speeds up the reading
process, thereby improving throughput and optimizaing database
performance.I tried different compression strategies and got
results similar to the above table.(~30% throughput improvement)
As more database applications adapt to the io_uring engine, I think
the application of hybrid poll may have potential in some scenarios.
--
Xue
Powered by blists - more mailing lists