Message-ID: <871qqcg77l.fsf@suse.de>
Date: Tue, 08 Nov 2022 22:03:26 -0500
From: Gabriel Krisman Bertazi <krisman@...e.de>
To: Chaitanya Kulkarni <chaitanyak@...dia.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
Hugh Dickins <hughd@...gle.com>,
Keith Busch <kbusch@...nel.org>,
"axboe@...nel.dk" <axboe@...nel.dk>,
Liu Song <liusong@...ux.alibaba.com>, Jan Kara <jack@...e.cz>
Subject: Re: [PATCH] sbitmap: Use single per-bitmap counting to wake up
	queued tags

Chaitanya Kulkarni <chaitanyak@...dia.com> writes:

>> For more interesting cases, where there is queueing, we need to take
>> into account the cross-communication of the atomic operations. I've
>> been benchmarking by running parallel fio jobs against a single hctx
>> nullb in different hardware queue depth scenarios, and verifying both
>> IOPS and queueing.
>>
>> Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel
>> jobs. fio was issuing fixed-size randwrites with qd=64 against nullb,
>> varying only the hardware queue length per test.
>>
>> queue size  2                 4                 8                 16                32                64
>> 6.1-rc2     1681.1K (1.6K)    2633.0K (12.7K)   6940.8K (16.3K)   8172.3K (617.5K)  8391.7K (367.1K)  8606.1K (351.2K)
>> patched     1721.8K (15.1K)   3016.7K (3.8K)    7543.0K (89.4K)   8132.5K (303.4K)  8324.2K (230.6K)  8401.8K (284.7K)

Hi Chaitanya,

Thanks for the feedback.

> So if I understand correctly,
> QD 2, 4, 8 shows a clear performance benefit from this patch, whereas
> QD 16, 32, 64 shows a drop in performance, is that correct?
>
> If my observation is correct, then applications with a high QD will
> observe a drop in performance?

To be honest, I'm not sure. Given the overlap of the standard deviation
(in parentheses) with the difference between the means, I'm not sure the
observed drop is statistically significant. In my prior analysis, I
concluded it wasn't; a quick sanity check on the QD=64 numbers is
sketched below.
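
As a back-of-the-envelope check, here is a Welch's t-test over the
QD=64 column of the table above (mean and stddev of the 5 runs, in K
IOPS). This is a standalone userspace sketch, not part of any kernel
tooling:

#include <math.h>
#include <stdio.h>

int main(void)
{
	/* QD=64 column from the table: mean and stddev over n=5 runs */
	double m1 = 8606.1, s1 = 351.2;	/* 6.1-rc2 */
	double m2 = 8401.8, s2 = 284.7;	/* patched */
	double n = 5.0;

	/* Welch's t statistic: difference of means over the pooled
	 * standard error of the two samples */
	double v1 = s1 * s1 / n, v2 = s2 * s2 / n;
	double t = (m1 - m2) / sqrt(v1 + v2);

	/* Welch-Satterthwaite approximation of the degrees of freedom */
	double df = (v1 + v2) * (v1 + v2) /
		    (v1 * v1 / (n - 1) + v2 * v2 / (n - 1));

	/* Prints t = 1.01, df = 7.7; the two-tailed critical value at
	 * p=0.05 is ~2.3, so the QD=64 drop is within run-to-run noise. */
	printf("t = %.2f, df = %.1f\n", t, df);
	return 0;
}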

I also don't see where a significant difference would come from, because
the higher the QD, the more likely it is that completions go through the
uncontended path, where sbq->ws_active == 0. That hot path is identical
to the existing implementation; a simplified sketch of its shape follows.
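
For illustration, a minimal userspace sketch of that fast path (an
illustration only, not the kernel's sbitmap.c; the field names loosely
mirror struct sbitmap_queue):

#include <stdatomic.h>
#include <stdio.h>

struct sbq_sketch {
	/* Number of wait queues with sleepers: bumped when a submitter
	 * has to wait for a free tag, dropped when it stops waiting. */
	atomic_int ws_active;
	/* Completions seen since the last wake-up batch: roughly the
	 * single per-bitmap counter this patch introduces. */
	atomic_int completion_cnt;
};

/* Called on every request completion. */
static void sketch_wake_up(struct sbq_sketch *sbq)
{
	/*
	 * Fast path: nobody is queued waiting for a tag. At high
	 * hardware QD the device rarely runs out of tags, so this
	 * branch is taken almost always, and it is the same single
	 * atomic read the existing code does.
	 */
	if (!atomic_load(&sbq->ws_active))
		return;

	/* Contended path: count the completion; the batched wake-up
	 * logic that consumes completion_cnt is elided here. */
	atomic_fetch_add(&sbq->completion_cnt, 1);
}

int main(void)
{
	struct sbq_sketch sbq = { 0 };

	sketch_wake_up(&sbq);	/* no waiters: takes the fast path */
	printf("ws_active == 0: wake-up returned immediately\n");
	return 0;
}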

> Also, please share a table with block size/IOPS/BW/CPU (system/user)
> /LAT/SLAT with % increase/decrease, and document the raw numbers at
> the end of the cover letter for completeness, along with the fio job,
> so others can repeat the experiment...

This was issued against nullb with a fixed IO size matching the
device's block size (512b), which is why I am tracking only IOPS, not
BW. With a fixed 512b IO size, BW is just IOPS scaled by the block
size (e.g., 8.4M IOPS is about 4.3 GB/s), so I'm not sure BW adds any
information in this scenario.

I'll definitely follow up with CPU time and latencies, and share the
fio job; a sketch of its shape is below. I'll also take another look
at the significance of the measured values for high QD.
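
For reference, a job file along these lines matches the setup described
above (20 jobs, qd=64, 512b randwrites against a single-hctx nullb).
The ioengine, direct, and runtime values here are placeholders, not
necessarily the exact options used:

# nullb is set up beforehand with a single hctx and the hardware
# queue depth under test, e.g.:
#   modprobe null_blk queue_mode=2 submit_queues=1 hw_queue_depth=2
[global]
filename=/dev/nullb0
rw=randwrite
bs=512
iodepth=64
numjobs=20
ioengine=io_uring
direct=1
time_based=1
runtime=30
group_reporting=1

[sbitmap-bench]
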
Thank you,
--
Gabriel Krisman Bertazi