Message-ID: <f2d6dfd6-1234-2545-7955-07db078faa54@nvidia.com>
Date: Tue, 8 Nov 2022 23:28:11 +0000
From: Chaitanya Kulkarni <chaitanyak@...dia.com>
To: Gabriel Krisman Bertazi <krisman@...e.de>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
Hugh Dickins <hughd@...gle.com>,
Keith Busch <kbusch@...nel.org>,
"axboe@...nel.dk" <axboe@...nel.dk>,
Liu Song <liusong@...ux.alibaba.com>, Jan Kara <jack@...e.cz>
Subject: Re: [PATCH] sbitmap: Use single per-bitmap counting to wake up queued
tags
> For more interesting cases, where there is queueing, we need to take
> into account the cross-communication of the atomic operations. I've
> been benchmarking by running parallel fio jobs against a single hctx
> nullb in different hardware queue depth scenarios, and verifying both
> IOPS and queueing.
>
> Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel
> jobs. fio was issuing fixed-size randwrites with qd=64 against nullb,
> varying only the hardware queue length per test.
>
> queue size  2                 4                 8                 16                32                64
> 6.1-rc2     1681.1K (1.6K)    2633.0K (12.7K)   6940.8K (16.3K)   8172.3K (617.5K)  8391.7K (367.1K)  8606.1K (351.2K)
> patched     1721.8K (15.1K)   3016.7K (3.8K)    7543.0K (89.4K)   8132.5K (303.4K)  8324.2K (230.6K)  8401.8K (284.7K)
>
So if I understand correctly:
QD 2, 4, 8 shows a clear performance benefit from this patch, whereas
QD 16, 32, 64 shows a drop in performance. Is that correct?
If my observation is correct, then applications with a high QD will
observe a drop in performance?
Also, please share a table with block size/IOPS/BW/CPU (system/user)
/LAT/SLAT with % increase/decrease, and document the raw numbers at
the end of the cover letter for completeness, along with the fio job,
so that others can repeat the experiment...
> The following is a similar experiment, run against a nullb with a single
> bitmap shared by 20 hctx spread across 2 NUMA nodes. This has 40
> parallel fio jobs operating on the same device
>
> queue size  2                 4                 8                 16                32                 64
> 6.1-rc2     1081.0K (2.3K)    957.2K (1.5K)     1699.1K (5.7K)    6178.2K (124.6K)  12227.9K (37.7K)   13286.6K (92.9K)
> patched     1081.8K (2.8K)    1316.5K (5.4K)    2364.4K (1.8K)    6151.4K (20.0K)   11893.6K (17.5K)   12385.6K (18.4K)
>
Same comments as above apply here ...
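I assume the shared-bitmap case was set up with something like the
following null_blk configuration; the module parameters here are my
guess, not taken from the original post, with hw_queue_depth swept
over the values in the table's columns:

  # 20 submit queues sharing a single tag set (and hence one sbitmap)
  modprobe null_blk queue_mode=2 submit_queues=20 \
          shared_tags=1 hw_queue_depth=2

and the same fio job as above with numjobs=40.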
> It has also survived blktests and a 12h-stress run against nullb. I also
> ran the code against nvme and a scsi SSD, and I didn't observe
> performance regression in those. If there are other tests you think I
> should run, please let me know and I will follow up with results.
>
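In case it helps others repeat the stability run, the standard
blktests invocation for the block group would be along these lines
(device name assumed; TEST_DEVS only matters for the device-specific
tests):

  cd blktests
  echo 'TEST_DEVS=(/dev/nullb0)' > config
  ./check block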
-ck