Message-ID: <871qqcg77l.fsf@suse.de>
Date:   Tue, 08 Nov 2022 22:03:26 -0500
From:   Gabriel Krisman Bertazi <krisman@...e.de>
To:     Chaitanya Kulkarni <chaitanyak@...dia.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        Hugh Dickins <hughd@...gle.com>,
        Keith Busch <kbusch@...nel.org>,
        "axboe@...nel.dk" <axboe@...nel.dk>,
        Liu Song <liusong@...ux.alibaba.com>, Jan Kara <jack@...e.cz>
Subject: Re: [PATCH] sbitmap: Use single per-bitmap counting to wake up
 queued tags

Chaitanya Kulkarni <chaitanyak@...dia.com> writes:

>> For more interesting cases, where there is queueing, we need to take
>> into account the cross-communication of the atomic operations.  I've
>> been benchmarking by running parallel fio jobs against a single hctx
>> nullb in different hardware queue depth scenarios, and verifying both
>> IOPS and queueing.
>> 
>> Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel
>> jobs. fio was issuing fixed-size randwrites with qd=64 against nullb,
>> varying only the hardware queue length per test.
>> 
>> queue size 2                 4                 8                 16                 32                 64
>> 6.1-rc2    1681.1K (1.6K)    2633.0K (12.7K)   6940.8K (16.3K)   8172.3K (617.5K)   8391.7K (367.1K)   8606.1K (351.2K)
>> patched    1721.8K (15.1K)   3016.7K (3.8K)    7543.0K (89.4K)   8132.5K (303.4K)   8324.2K (230.6K)   8401.8K (284.7K)
>
>> 

Hi Chaitanya,

Thanks for the feedback.

> So if I understand correctly,
> QD 2, 4, 8 show a clear performance benefit from this patch, whereas
> QD 16, 32, 64 show a drop in performance. Is that correct?
>
> If my observation is correct, will applications with a high QD then
> observe a drop in performance?

To be honest, I'm not sure.  Given how much the standard deviations
(in parentheses) overlap relative to the means, the observed drop may
not be statistically significant; in my prior analysis, I concluded it
wasn't.
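
For instance, taking the queue size 64 column from the table above:
6.1-rc2 measured 8606.1K (351.2K) IOPS against 8401.8K (284.7K) for the
patched kernel, so the one-sigma ranges are roughly 8254.9K-8957.3K and
8117.1K-8686.5K.  With only 5 runs per data point and that much
overlap, the ~2.4% difference could easily be noise.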

I also don't see where a significant difference would come from: the
higher the QD, the more likely a completion is to go through the
uncontended path, where sbq->ws_active == 0, and that hot path is
identical to the existing implementation.
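
If it helps, below is a tiny standalone sketch (plain userspace C, not
the kernel source; the struct and function names are made up for
illustration) of why that hot path should not change cost: when
ws_active is zero, the completion side does a single atomic read and
returns, both before and after the patch.

/*
 * Illustrative userspace model of the uncontended wake-up path.  Only
 * the ws_active == 0 early exit mirrors the real sbitmap code; the
 * rest is a placeholder.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct sbq_sketch {
        atomic_int ws_active;   /* waiters currently queued */
};

/* Called on tag completion; returns true if waiters had to be woken. */
static bool sbq_wake_sketch(struct sbq_sketch *sbq)
{
        /* Hot path at high QD: nobody is waiting, nothing to do. */
        if (atomic_load(&sbq->ws_active) == 0)
                return false;

        /* Contended path: the part the patch restructures. */
        printf("would wake a batch of queued waiters here\n");
        return true;
}

int main(void)
{
        struct sbq_sketch sbq;

        atomic_init(&sbq.ws_active, 0);

        /* High hardware queue depth: completions find no waiters. */
        printf("woke waiters: %d\n", sbq_wake_sketch(&sbq));

        /* Queueing case: a waiter registered itself first. */
        atomic_store(&sbq.ws_active, 1);
        printf("woke waiters: %d\n", sbq_wake_sketch(&sbq));
        return 0;
}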

> Also, please share a table with block size/IOPS/BW/CPU (system/user)
> /LAT/SLAT with % increase/decrease, and document the raw numbers at
> the end of the cover letter for completeness, along with the fio job,
> so that others can repeat the experiment...

This was issued against the nullb with a fixed IO size matching the
device's block size (512b), which is why I am tracking only IOPS, not
BW.  I'm not sure BW is still relevant in this scenario.
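
(To make that concrete: with a fixed 512b IO size, BW is just
IOPS * 512 B, so e.g. ~8.6M IOPS corresponds to roughly 4.4 GB/s; a BW
column would carry no information beyond the IOPS column.)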

I'll definitely follow up with CPU time and latencies, and share the
fio job.  I'll also take another look at the significance of the
measured values at high QD.

Thank you,

-- 
Gabriel Krisman Bertazi
