Message-ID: <0cfbfcf6-08f5-4d1b-82c4-729db9198896@nvidia.com>
Date: Thu, 21 Nov 2024 00:00:20 +0000
From: Chaitanya Kulkarni <chaitanyak@...dia.com>
To: Saeed Mirzamohammadi <saeed.mirzamohammadi@...cle.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>, Ramanan
 Govindarajan <ramanan.govindarajan@...cle.com>, Sagi Grimberg
	<sagi@...mberg.me>, Paul Webb <paul.x.webb@...cle.com>, Christoph Hellwig
	<hch@....de>, Keith Busch <kbusch@...nel.org>, "axboe@...nel.dk"
	<axboe@...nel.dk>
Subject: Re: [bug-report] 5-9% FIO randomwrite ext4 perf regression on 6.12.y
 kernel

On 11/20/24 13:35, Saeed Mirzamohammadi wrote:
> Hi,
>
> I’m reporting a performance regression of up to 9-10% with FIO randomwrite benchmark on ext4 comparing 6.12.0-rc2 kernel and v5.15.161. Also, standard deviation after this change grows up to 5-6%.
>
> Bisect root cause commit
> ===================
> - commit 63dfa1004322 ("nvme: move NVME_QUIRK_DEALLOCATE_ZEROES out of nvme_config_discard")
>
>
> Test details
> =========
> - readwrite=randwrite bs=4k size=1G ioengine=libaio iodepth=16 direct=1 time_based=1 ramp_time=180 runtime=1800 randrepeat=1 gtod_reduce=1
> - Test is on ext4 filesystem
> - System has 4 NVMe disks
>

Thanks a lot for the report. To narrow down this problem, can you
please:

1. Run the same test on the raw NVMe device (/dev/nvme0n1) that you
    used for this benchmark?
2. Run the same test on an XFS-formatted NVMe device instead of ext4?

This way we will know whether the issue is specific to ext4, whether
other file systems suffer from it too, or whether it sits below the
file system layer, e.g. in the block layer or the nvme pci driver.

It will also help if you can repeat these numbers with the io_uring fio
ioengine, so we know whether the issue is ioengine-specific.

Looking at the commit [1], it only sets the max write zeroes sectors to
UINT_MAX if NVME_QUIRK_DEALLOCATE_ZEROES is set, else it uses the
controller max write zeroes value.

So I'm not sure how this commit can slow things down, unless there is a
change in write-zeroes behavior: instead of offloading
(REQ_OP_WRITE_ZEROES), it's now falling back to REQ_OP_WRITE with the
ZERO PAGE when called from ext4 via sb_issue_zeroout():

fs/ext4/ialloc.c ext4_init_inode_table        sb_issue_zeroout()
fs/ext4/inode.c  ext4_issue_zeroout           sb_issue_zeroout()
fs/ext4/resize.c setup_new_flex_group_blocks  sb_issue_zeroout()
fs/ext4/resize.c setup_new_flex_group_blocks  sb_issue_zeroout()

-ck

 From 63dfa1004322d596417f23da43cdc43cf6298c71 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@....de>
Date: Mon, 4 Mar 2024 07:04:46 -0700
Subject: [PATCH] nvme: move NVME_QUIRK_DEALLOCATE_ZEROES out of
  nvme_config_discard

Move the handling of the NVME_QUIRK_DEALLOCATE_ZEROES quirk out of
nvme_config_discard so that it is combined with the normal write_zeroes
limit handling.

Signed-off-by: Christoph Hellwig <hch@....de>
Reviewed-by: Max Gurtovoy <mgurtovoy@...dia.com>
Signed-off-by: Keith Busch <kbusch@...nel.org>
---
  drivers/nvme/host/core.c | 11 ++++++-----
  1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6ae9aedf7bc2..a6c0b2f4cf79 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1816,9 +1816,6 @@ static void nvme_config_discard(struct nvme_ctrl *ctrl, struct gendisk *disk,
         else
                 blk_queue_max_discard_segments(queue, NVME_DSM_MAX_RANGES);
         queue->limits.discard_granularity = queue_logical_block_size(queue);
-
-       if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
-               blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
  }

  static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
@@ -2029,8 +2026,12 @@ static void nvme_update_disk_info(struct nvme_ctrl *ctrl, struct gendisk *disk,
         set_capacity_and_notify(disk, capacity);

         nvme_config_discard(ctrl, disk, head);
-       blk_queue_max_write_zeroes_sectors(disk->queue,
-                               ctrl->max_zeroes_sectors);
+
+       if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
+               blk_queue_max_write_zeroes_sectors(disk->queue, UINT_MAX);
+       else
+               blk_queue_max_write_zeroes_sectors(disk->queue,
+                               ctrl->max_zeroes_sectors);
  }

  static bool nvme_ns_is_readonly(struct nvme_ns *ns, struct nvme_ns_info *info)

