Message-ID: <5fb892c2-a532-84bf-fbe2-148b32079fa4@huawei.com>
Date: Wed, 3 Jan 2024 21:28:35 +0800
From: Zhang Yi <yi.zhang@...wei.com>
To: Jan Kara <jack@...e.cz>
CC: kernel test robot <oliver.sang@...el.com>, <oe-lkp@...ts.linux.dev>,
	<lkp@...el.com>, <linux-kernel@...r.kernel.org>,
	Theodore Ts'o <tytso@....edu>, <linux-ext4@...r.kernel.org>,
	<ying.huang@...el.com>, <feng.tang@...el.com>,
	<fengwei.yin@...el.com>, <yukuai3@...wei.com>
Subject: Re: [linus:master] [jbd2] 6a3afb6ac6: fileio.latency_95th_ms 92.5% regression

On 2024/1/3 17:49, Jan Kara wrote:
> Hello!
>
> On Wed 03-01-24 11:31:39, Zhang Yi wrote:
>> On 2024/1/2 15:31, kernel test robot wrote:
>>>
>>> Hello,
>>>
>>> kernel test robot noticed a 92.5% regression of fileio.latency_95th_ms on:
>>
>> This seems a little weird: the tests don't use blk-cgroup, and the patch
>> increases IO priority in WBT, so there shouldn't be any negative influence
>> in theory.
>
> I don't have a great explanation either but there could be some impact e.g.
> due to a different request merging of IO done by JBD2 vs the flush worker or
> something like that. Note that the throughput reduction is only 5.7% so it
> is not huge.

Yeah, that makes sense; it is one plausible explanation at the moment.

>
>> I've tested sysbench on my machine with an Intel Xeon Gold 6240 CPU,
>> 400GB of memory and an HDD, and couldn't reproduce this regression.
>>
>> ==
>> Without 6a3afb6ac6 ("jbd2: increase the journal IO's priority")
>> ==
>>
>> $ sysbench fileio --events=0 --threads=128 --time=600 --file-test-mode=seqwr --file-total-size=68719476736 --file-io-mode=sync --file-num=1024 run
>>
>> sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
>>
>> Running the test with following options:
>> Number of threads: 128
>> Initializing random number generator from current time
>>
>>
>> Extra file open flags: (none)
>> 1024 files, 64MiB each
>> 64GiB total file size
>> Block size 16KiB
>> Periodic FSYNC enabled, calling fsync() each 100 requests.
>> Calling fsync() at the end of test, Enabled.
>> Using synchronous I/O mode
>> Doing sequential write (creation) test
>> Initializing worker threads...
>>
>> Threads started!
>>
>>
>> Throughput:
>> read:  IOPS=0.00 0.00 MiB/s (0.00 MB/s)
>> write: IOPS=31961.19 499.39 MiB/s (523.65 MB/s)
>> fsync: IOPS=327500.24
>
> Well, your setup seems to be very different from what LKP was using. You
> are achieving ~500 MB/s (likely because all the files fit into the cache
> and more or less even within the dirty limit of the page cache) while the
> LKP run achieves only ~54 MB/s (i.e., we are pretty much bound by the
> rather slow disk). I'd try running with something like 32GB of RAM to
> really see the disk speed impact...
>

I'm afraid I missed the vmstat.io.bo changes. I will limit the dirty ratio
and test it again tomorrow.

Thanks,
Yi.
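A minimal sketch of what limiting the dirty ratio for the retest might look like, assuming the standard vm.dirty_* sysctls and Jan's suggestion of roughly 32GB of memory; the specific values below are illustrative assumptions, not settings taken from this thread:

  # Illustrative values only (assumption, not from the thread): lower the
  # dirty thresholds so writeback kicks in early and the workload becomes
  # disk-bound rather than page-cache-bound.
  $ sysctl -w vm.dirty_background_ratio=1
  $ sysctl -w vm.dirty_ratio=5

  # Optionally boot with mem=32G on the kernel command line to mimic the
  # smaller-memory LKP setup, then rerun the same sysbench workload quoted
  # in the message above:
  $ sysbench fileio --events=0 --threads=128 --time=600 \
      --file-test-mode=seqwr --file-total-size=68719476736 \
      --file-io-mode=sync --file-num=1024 run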