linux-kernel - Re: [BUG?] bcachefs performance: read is way too slow when a file has no overwrite.


Message-Id: <20240907103437.71139-1-00107082@163.com>
Date: Sat,  7 Sep 2024 18:34:37 +0800
From: David Wang <00107082@....com>
To: kent.overstreet@...ux.dev
Cc: 00107082@....com,
	linux-bcachefs@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [BUG?] bcachefs performance: read is way too slow when a file has no overwrite.

At 2024-09-07 01:38:11, "Kent Overstreet" <kent.overstreet@...ux.dev> wrote:
>On Fri, Sep 06, 2024 at 11:43:54PM GMT, David Wang wrote:
>> 
>> Hi,
>> 
>> I notice a very strange performance issue:
>> When run `fio direct randread` test on a fresh new bcachefs, the performance is very bad:
>> 	fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test  --bs=4k --iodepth=64 --size=1G --readwrite=randread  --runtime=600 --numjobs=8 --time_based=1
>> 	...
>> 	Run status group 0 (all jobs):
>> 	   READ: bw=87.0MiB/s (91.2MB/s), 239B/s-14.2MiB/s (239B/s-14.9MB/s), io=1485MiB (1557MB), run=15593-17073msec
>> 
>> But if the files already exist and have already been thoroughly overwritten, the read performance is about 850MB+/s,
>> almost 10-times better!
>> 
>> This means, if I copy some file from somewhere else, and make read access only afterwards, I would get really bad performance.
>> (I copy files from other filesystem, and run fio read test on those files, the performance is indeed bad.)
>> Copy some prepared files, and make readonly usage afterwards, this usage scenario is quite normal for lots of apps, I think.
>
>That's because checksums are at extent granularity, not block: if you're
>doing O_DIRECT reads that are smaller than the writes the data was
>written with, performance will be bad because we have to read the entire
>extent to verify the checksum.


>
>block granular checksums will come at some point, as an optional feature
>(most of the time you don't want them, and you'd prefer more compact
>metadata)
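
To make the prediction of that explanation concrete, here is a minimal sketch of the cost model it implies. This is a simplified assumption for illustration, not bcachefs code: the helper name `read_amplification` is hypothetical, and it assumes the read falls inside a single checksummed extent.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model (an assumption, not bcachefs code): a read that falls
 * inside one checksummed extent has to fetch the whole extent so the
 * checksum can be verified.
 */
static double read_amplification(size_t extent_bytes, size_t read_bytes)
{
	assert(extent_bytes > 0 && read_bytes > 0);
	if (read_bytes >= extent_bytes)
		return 1.0;	/* the read already covers the extent */
	return (double)extent_bytes / (double)read_bytes;
}
```

Under this model, a 4K read from data written as one 64K extent would fetch 16 times the requested bytes, while a read at least as large as the extent would see no amplification.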

Hi, I ran further tests combining different write and read sizes, and the results
do not confirm that explanation for O_DIRECT.

Without O_DIRECT (fio --direct=0 ...), the average read bandwidth
improves, but with a very large standard deviation:
+--------------------+----------+----------+----------+----------+
| prepare-write\read |    1K    |    4K    |    8K    |   16K    |
+--------------------+----------+----------+----------+----------+
|         1K         | 328MiB/s | 395MiB/s | 465MiB/s |          |
|         4K         | 193MiB/s | 219MiB/s | 274MiB/s | 392MiB/s |
|         8K         | 251MiB/s | 280MiB/s | 368MiB/s | 435MiB/s |
|        16K         | 302MiB/s | 380MiB/s | 464MiB/s | 577MiB/s |
+--------------------+----------+----------+----------+----------+
(Rows are write size when preparing the test files, and columns are read size for fio test.)

And with O_DIRECT, the result is:
+--------------------+-----------+-----------+----------+----------+
| prepare-write\read |     1K    |     4K    |    8K    |   16K    |
+--------------------+-----------+-----------+----------+----------+
|         1K         | 24.1MiB/s | 96.5MiB/s | 193MiB/s |          |
|         4K         | 14.4MiB/s | 57.6MiB/s | 116MiB/s | 230MiB/s |
|         8K         | 24.6MiB/s | 97.6MiB/s | 192MiB/s | 309MiB/s |
|        16K         | 26.4MiB/s |  104MiB/s | 206MiB/s | 402MiB/s |
+--------------------+-----------+-----------+----------+----------+
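
One way to read the O_DIRECT table is to convert each bandwidth figure into requests per second. The helper below is only a sketch for that arithmetic; the name `iops_from_bandwidth` is hypothetical and is not part of fio.

```c
#include <assert.h>

/* Requests per second implied by a bandwidth in MiB/s and a request size in KiB. */
static double iops_from_bandwidth(double bw_mib_s, double req_kib)
{
	assert(req_kib > 0.0);
	return bw_mib_s * 1024.0 / req_kib;
}
```

For the 1K-prepared row, 24.1MiB/s at 1K, 96.5MiB/s at 4K and 193MiB/s at 8K all come out at roughly 24.7k requests per second; the 4K-prepared row stays near 14.7k across all four read sizes. That might suggest the O_DIRECT runs are limited by a per-request cost rather than by the number of bytes moved, though this is only an interpretation of the numbers above.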

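Code to prepare the test files:
	#define _GNU_SOURCE	/* needed for O_DIRECT */
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	#define KN 8 //<- adjust this for each row (write size in KiB)
	char name[32];
	/* O_DIRECT generally needs an aligned buffer */
	char buf[1024*KN] __attribute__((aligned(4096)));
	int main(void) {
		int i, m = 1024*1024/KN, k, fd;	/* m writes of KN KiB = 1 GiB per file */
		for (i=0; i<8; i++) {
			sprintf(name, "test.%d.0", i);
			/* O_CREAT requires a mode argument */
			fd = open(name, O_CREAT|O_DIRECT|O_SYNC|O_TRUNC|O_WRONLY, 0644);
			if (fd < 0) { perror(name); return 1; }
			for (k=0; k<m; k++)
				if (write(fd, buf, sizeof(buf)) < 0) { perror("write"); return 1; }
			close(fd);
		}
		return 0;
	}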

Based on the results:
1. The row with prepare-write size 4K stands out.
When the files were prepared with a 4K write size, the subsequent
read performance is worse. (I double-checked the result,
but it is possible that I missed some other factor.)
2. Without O_DIRECT, read performance seems correlated with the difference
between read size and prepare-write size, but with O_DIRECT the correlation is not obvious.

And, to mention it again, if I overwrite the files **thoroughly** with a fio write test
(using the same block size), the read performance afterwards is very good:

	# overwrite the files with randwrite, block size 8k
	$ fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test  --bs=8k --iodepth=64 --size=1G --readwrite=randwrite  --runtime=300 --numjobs=8 --time_based=1
	# test the read performance with randread, block size 8k
	$ fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test  --bs=8k --iodepth=64 --size=1G --readwrite=randread  --runtime=300 --numjobs=8 --time_based=1
	...
	Run status group 0 (all jobs):
	   READ: bw=964MiB/s (1011MB/s), 116MiB/s-123MiB/s (121MB/s-129MB/s), io=283GiB (303GB), run=300004-300005msec



FYI
David