Message-ID: <20260117024413.484508-2-bojanalahithashri@gmail.com>
Date: Fri, 16 Jan 2026 21:44:07 -0500
From: Hithashree Bojanala <bojanalahithashri@...il.com>
To: linux-block@...r.kernel.org
Cc: bojanala hithashri <bojanalahithashri@...il.com>,
linux-scsi@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: [REGRESSION] fio 4k randread: ~1–3.5% IOPS regression on linux-next (6.19.0-rc1-next-20251219) vs RHEL9 5.14 on PERC H740P
From: bojanala hithashri <bojanalahithashri@...il.com>
Hello,
I am reporting a small but consistent block I/O performance regression
observed when running 4k random reads across queue depths on a hardware
RAID device.
The regression appears when comparing a RHEL9 downstream kernel against
a linux-next snapshot.
System / Hardware
-----------------
CPU:
Model: Intel Xeon Gold 6130 @ 2.10GHz
Architecture: x86_64
Sockets: 2
Cores per socket: 16
Threads per core: 2
NUMA nodes: 2
Memory:
Total: 187 GB
NUMA nodes: 2
Node 0: ~94 GB
Node 1: ~97 GB
Swap: 4 GB (unused during test)
Storage controller:
Dell PERC H740P (hardware RAID)
Block device:
/dev/sdh
lsblk output:
NAME  MODEL            SIZE  ROTA  TRAN  SCHED
sdh   PERC H740P Adp   1.6T     1        mq-deadline
Active scheduler:
/sys/block/sdh/queue/scheduler
none [mq-deadline] kyber bfq
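If it would help, I can also dump the relevant queue tunables on both
kernels for comparison, along these lines (sketch only):

# Print the scheduler and queue settings for the device under test
grep . /sys/block/sdh/queue/scheduler \
       /sys/block/sdh/queue/nr_requests \
       /sys/block/sdh/queue/rq_affinity \
       /sys/block/sdh/queue/nomerges \
       /sys/block/sdh/queue/read_ahead_kb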
Kernels Tested
--------------
Baseline (downstream):
5.14.0-427.13.1.el9_4.x86_64
Test (upstream integration tree):
6.19.0-rc1-next-20251219
Workload / Reproducer
---------------------
fio version: 3.35
Raw block device, direct I/O, libaio, single job, 300 s runtime per queue depth
Command used:
for depth in 1 2 4 8 16 32 64 128 256 512 1024 2048; do
    fio --rw=randread \
        --bs=4096 \
        --name=randread-$depth \
        --filename=/dev/sdh \
        --ioengine=libaio \
        --numjobs=1 --thread \
        --norandommap \
        --runtime=300 \
        --direct=1 \
        --iodepth=$depth \
        --scramble_buffers=1 \
        --offset=0 \
        --size=100g
done
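If it would make comparison easier, I can re-run the same sweep with
per-depth JSON output so the results from both kernels can be diffed
mechanically, e.g. (sketch, same parameters as above):

for depth in 1 2 4 8 16 32 64 128 256 512 1024 2048; do
    fio --rw=randread --bs=4096 --name=randread-$depth \
        --filename=/dev/sdh --ioengine=libaio --numjobs=1 --thread \
        --norandommap --runtime=300 --direct=1 --iodepth=$depth \
        --scramble_buffers=1 --offset=0 --size=100g \
        --output-format=json --output=randread-$depth.json
done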
Observed Behavior
-----------------
Across all queue depths tested, the linux-next kernel shows:
- ~1–3.5% lower IOPS
- Corresponding bandwidth reduction
- ~1–3.6% higher average completion latency
- Slightly worse p99 / p99.9 latency
The throughput saturation point remains unchanged
(around iodepth ≈ 128), suggesting the regression is
related to service/dispatch efficiency rather than a
change in device limits.
Example Data Points
-------------------
- iodepth=32:
old: 554 IOPS → new: 535 IOPS (~-3.4%)
avg clat: 57.7 ms → 59.8 ms
- iodepth=64:
old: 608 IOPS → new: 588 IOPS (~-3.3%)
avg clat: 105 ms → 109 ms
- iodepth=128:
old: 648 IOPS → new: 640 IOPS (~-1.2%)
This behavior is consistent across multiple runs.
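(For reference, the averages above are self-consistent with Little's
law, IOPS ≈ iodepth / avg clat, e.g. 32 / 59.8 ms ≈ 535, so the IOPS
drop and the latency increase describe the same slowdown rather than
two independent effects.)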
Notes
-----
I understand this comparison spans a downstream RHEL kernel
and a linux-next snapshot, so part of the delta could come from
downstream patches rather than an upstream change. I am reporting
it early anyway because the regression is consistent across runs
and may relate to recent blk-mq or mq-deadline changes affecting
rotational / hardware RAID devices.
I am happy to:
- Re-test on a specific mainline release (e.g. v6.18 or v6.19-rc)
- Compare schedulers (mq-deadline vs none / bfq)
- Provide additional instrumentation (iostat, perf, bpf)
  (a rough sketch of both follows this list)
- Assist with bisection if a suspect window is identified
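For the scheduler comparison and instrumentation items, something
like the following is what I have in mind (rough sketch, run as root,
iodepth=32 shown as an example):

echo none > /sys/block/sdh/queue/scheduler
iostat -x 1 sdh > iostat-none.log &
IOSTAT_PID=$!
perf record -a -g -o perf-none.data -- \
    fio --rw=randread --bs=4096 --name=randread-none \
        --filename=/dev/sdh --ioengine=libaio --numjobs=1 --thread \
        --norandommap --runtime=300 --direct=1 --iodepth=32 \
        --scramble_buffers=1 --offset=0 --size=100g
kill "$IOSTAT_PID"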
Please let me know how you would like me to proceed.
Thanks,
Hithashree