Message-ID: <0cfbfcf6-08f5-4d1b-82c4-729db9198896@nvidia.com>
Date: Thu, 21 Nov 2024 00:00:20 +0000
From: Chaitanya Kulkarni <chaitanyak@...dia.com>
To: Saeed Mirzamohammadi <saeed.mirzamohammadi@...cle.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>, Ramanan
Govindarajan <ramanan.govindarajan@...cle.com>, Sagi Grimberg
<sagi@...mberg.me>, Paul Webb <paul.x.webb@...cle.com>, Christoph Hellwig
<hch@....de>, Keith Busch <kbusch@...nel.org>, "axboe@...nel.dk"
<axboe@...nel.dk>
Subject: Re: [bug-report] 5-9% FIO randomwrite ext4 perf regression on 6.12.y
kernel
On 11/20/24 13:35, Saeed Mirzamohammadi wrote:
> Hi,
>
> I’m reporting a performance regression of up to 9-10% with FIO randomwrite benchmark on ext4 comparing 6.12.0-rc2 kernel and v5.15.161. Also, standard deviation after this change grows up to 5-6%.
>
> Bisect root cause commit
> ===================
> - commit 63dfa1004322 ("nvme: move NVME_QUIRK_DEALLOCATE_ZEROES out of nvme_config_discard")
>
>
> Test details
> =========
> - readwrite=randwrite bs=4k size=1G ioengine=libaio iodepth=16 direct=1 time_based=1 ramp_time=180 runtime=1800 randrepeat=1 gtod_reduce=1
> - Test is on ext4 filesystem
> - System has 4 NVMe disks
>
Thanks a lot for the report. To narrow this problem down, can you
please:

1. Run the same test on the raw NVMe device (/dev/nvme0n1) that you
   used for this benchmark?
2. Run the same test on the same NVMe device formatted with XFS instead
   of ext4?

This will tell us whether the issue is specific to ext4, whether other
file systems suffer from it too, or whether it sits below the file
system layer, e.g. in the block layer or the NVMe PCI driver.

It would also help if you could repeat these runs with the io_uring fio
ioengine, so we know whether the issue is ioengine specific.
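
Since the commit in question only touches the write-zeroes limit, it may also be worth comparing what the block layer reports for that limit on both kernels. Below is a minimal sketch that reads the queue's write_zeroes_max_bytes attribute from sysfs. The helper names are made up for illustration, and it assumes sysfs is mounted at /sys and that the disk name (e.g. nvme0n1) is passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build the sysfs path of the write-zeroes limit for a given disk name.
 * Returns 0 on success, -1 if the buffer is too small.
 * (Hypothetical helper, not part of any existing tool.)
 */
static int write_zeroes_attr_path(char *buf, size_t len, const char *disk)
{
	int n;

	assert(buf != NULL && disk != NULL);
	n = snprintf(buf, len, "/sys/block/%s/queue/write_zeroes_max_bytes",
		     disk);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read a single unsigned decimal value from a sysfs-style text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_u64_attr(const char *path, unsigned long long *value)
{
	FILE *f;
	int parsed;

	assert(path != NULL && value != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	parsed = fscanf(f, "%llu", value);
	fclose(f);
	return parsed == 1 ? 0 : -1;
}
```

Calling write_zeroes_attr_path() with "nvme0n1" and passing the result to read_u64_attr() on each kernel gives two numbers to compare. A value of 0 would mean the queue does not advertise write-zeroes offload at all.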

Looking at the commit [1], it only sets the max write zeroes sectors to
UINT_MAX if NVME_QUIRK_DEALLOCATE_ZEROES is set; otherwise it uses the
controller's max write zeroes value.
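
To make the before/after concrete, here is a small userspace model of the limit selection. It is only a sketch based on the hunks in [1]: struct ctrl_model, QUIRK_DEALLOCATE_ZEROES and both function names are made up for illustration, and the "before" case assumes nothing else touches the limit between the two calls visible in the diff.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Placeholder for the quirk bit; the real definition lives in the driver. */
#define QUIRK_DEALLOCATE_ZEROES (1UL << 2)

/* Hypothetical, trimmed-down stand-in for the fields the patch reads. */
struct ctrl_model {
	unsigned long quirks;
	unsigned int max_zeroes_sectors;
};

/*
 * After the patch: the if/else added to nvme_update_disk_info().
 * Quirky controllers get UINT_MAX, everyone else the controller limit.
 */
static unsigned int limit_after_patch(const struct ctrl_model *ctrl)
{
	assert(ctrl != NULL);
	if (ctrl->quirks & QUIRK_DEALLOCATE_ZEROES)
		return UINT_MAX;
	return ctrl->max_zeroes_sectors;
}

/*
 * Before the patch, as far as the quoted hunks show: nvme_config_discard()
 * could set UINT_MAX for the quirk, but the caller then set the limit to
 * ctrl->max_zeroes_sectors unconditionally, so the last write wins.
 */
static unsigned int limit_before_patch(const struct ctrl_model *ctrl)
{
	unsigned int limit = 0;

	assert(ctrl != NULL);
	if (ctrl->quirks & QUIRK_DEALLOCATE_ZEROES)
		limit = UINT_MAX;		/* inside nvme_config_discard() */
	limit = ctrl->max_zeroes_sectors;	/* later, in the caller */
	return limit;
}
```

If this model matches the real code, a controller with the quirk set would now advertise UINT_MAX where it previously ended up with ctrl->max_zeroes_sectors, which might be worth checking on the affected drives.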

So I'm not sure how this commit can slow things down, unless the
write-zeroes behavior has changed: instead of offloading
(REQ_OP_WRITE_ZEROES), it may now be falling back to REQ_OP_WRITE with
the zero page when called from the ext4 sb_issue_zeroout() call sites:
fs/ext4/ialloc.c  ext4_init_inode_table        sb_issue_zeroout()
fs/ext4/inode.c   ext4_issue_zeroout           sb_issue_zeroout()
fs/ext4/resize.c  setup_new_flex_group_blocks  sb_issue_zeroout()
fs/ext4/resize.c  setup_new_flex_group_blocks  sb_issue_zeroout()
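
The offload-versus-fallback decision described above can be modeled roughly as follows. This is a simplified userspace sketch, not the block layer code: the enum, struct, function name and the per-request cap for plain writes (MODEL_MAX_WRITE_SECTORS) are all assumptions made for illustration, and it ignores the case where the device rejects an offloaded command at runtime.

```c
#include <assert.h>
#include <stdbool.h>

/* How a zeroout request ends up being serviced in this model. */
enum zero_method {
	ZERO_OFFLOAD,		/* REQ_OP_WRITE_ZEROES sent to the device */
	ZERO_PAGE_WRITES,	/* REQ_OP_WRITE with zero-filled pages */
	ZERO_UNSUPPORTED,	/* no offload and fallback not allowed */
};

struct zero_plan {
	enum zero_method method;
	unsigned long long requests;	/* number of requests issued */
};

/* Hypothetical cap on sectors per plain write request. */
#define MODEL_MAX_WRITE_SECTORS 2048ULL

/* Ceiling division; the divisor must be non-zero. */
static unsigned long long div_round_up(unsigned long long n,
				       unsigned long long d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}

/*
 * Pick a method for zeroing nr_sects sectors. A write-zeroes limit of 0
 * means the queue does not advertise offload, so the only option left is
 * writing zero pages, if the caller allows it.
 */
static struct zero_plan plan_zeroout(unsigned long long nr_sects,
				     unsigned int max_write_zeroes_sectors,
				     bool allow_fallback)
{
	struct zero_plan plan = { ZERO_UNSUPPORTED, 0 };

	if (max_write_zeroes_sectors > 0) {
		plan.method = ZERO_OFFLOAD;
		plan.requests = div_round_up(nr_sects,
					     max_write_zeroes_sectors);
	} else if (allow_fallback) {
		plan.method = ZERO_PAGE_WRITES;
		plan.requests = div_round_up(nr_sects,
					     MODEL_MAX_WRITE_SECTORS);
	}
	return plan;
}
```

The point of the model is that the two paths move very different amounts of data: the offload path sends a command per chunk, while the fallback has to transfer every zeroed sector, which is the kind of difference that could show up in a write-heavy benchmark.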
-ck
From 63dfa1004322d596417f23da43cdc43cf6298c71 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@....de>
Date: Mon, 4 Mar 2024 07:04:46 -0700
Subject: [PATCH] nvme: move NVME_QUIRK_DEALLOCATE_ZEROES out of
nvme_config_discard
Move the handling of the NVME_QUIRK_DEALLOCATE_ZEROES quirk out of
nvme_config_discard so that it is combined with the normal write_zeroes
limit handling.
Signed-off-by: Christoph Hellwig <hch@....de>
Reviewed-by: Max Gurtovoy <mgurtovoy@...dia.com>
Signed-off-by: Keith Busch <kbusch@...nel.org>
---
drivers/nvme/host/core.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6ae9aedf7bc2..a6c0b2f4cf79 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1816,9 +1816,6 @@ static void nvme_config_discard(struct nvme_ctrl *ctrl, struct gendisk *disk,
 	else
 		blk_queue_max_discard_segments(queue, NVME_DSM_MAX_RANGES);
 	queue->limits.discard_granularity = queue_logical_block_size(queue);
-
-	if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
-		blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
 }
 
 static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
@@ -2029,8 +2026,12 @@ static void nvme_update_disk_info(struct nvme_ctrl *ctrl, struct gendisk *disk,
 	set_capacity_and_notify(disk, capacity);
 
 	nvme_config_discard(ctrl, disk, head);
-	blk_queue_max_write_zeroes_sectors(disk->queue,
-					   ctrl->max_zeroes_sectors);
+
+	if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
+		blk_queue_max_write_zeroes_sectors(disk->queue, UINT_MAX);
+	else
+		blk_queue_max_write_zeroes_sectors(disk->queue,
+				ctrl->max_zeroes_sectors);
 }
 
 static bool nvme_ns_is_readonly(struct nvme_ns *ns, struct nvme_ns_info *info)