Message-Id: <20240924110807.28788-1-00107082@163.com>
Date: Tue, 24 Sep 2024 19:08:07 +0800
From: David Wang <00107082@....com>
To: kent.overstreet@...ux.dev
Cc: 00107082@....com,
linux-bcachefs@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [BUG?] bcachefs performance: read is way too slow when a file has no overwrite.
Hi,
At 2024-09-07 18:34:37, "David Wang" <00107082@....com> wrote:
>At 2024-09-07 01:38:11, "Kent Overstreet" <kent.overstreet@...ux.dev> wrote:
>>That's because checksums are at extent granularity, not block: if you're
>>doing O_DIRECT reads that are smaller than the writes the data was
>>written with, performance will be bad because we have to read the entire
>>extent to verify the checksum.
>
>
>Based on the result:
>1. The row with prepare-write size 4K stands out here.
>When files were prepared with a 4K write size, the subsequent
> read performance is worse. (I did double-check the result,
>but it is possible that I missed some affecting factors.);
>2. Without O_DIRECT, read performance seems correlated with the difference
> between read size and prepare-write size, but with O_DIRECT, the correlation is not obvious.
>
>And, to mention it again, if I overwrite the files **thoroughly** with a fio write test
>(using the same size), the read performance afterwards would be very good:
>
An update on the IO pattern observed between bcachefs and the block layer: below are bio start address and size, both in sectors, with the start address reduced to its lowest set bit via address &= -address.
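Judging by the values, the offset column is that lowest set bit printed in binary, i.e. the alignment of the bio start sector. Here is a minimal C sketch of the bucketing, with a made-up example sector, just to illustrate how an offset value maps back to an alignment (this is not the actual collection code):

#include <stdio.h>
/* print v (a power of two) in binary, matching the offset column format */
static void print_binary(unsigned long long v) {
	char bits[72];
	int n = 0;
	do {
		bits[n++] = '0' + (v & 1);
		v >>= 1;
	} while (v);
	while (n--)
		putchar(bits[n]);
	putchar('\n');
}
int main(void) {
	unsigned long long sector = 123456;          /* example bio start sector */
	unsigned long long align = sector & -sector; /* keep only the lowest set bit */
	print_binary(align);                         /* prints 1000000, i.e. 64-sector alignment */
	return 0;
}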
4K-Direct-Read of a file created by a loop of `write(fd, buf, 1024*4)`:
+--------------------------+--------+--------+--------+--------+---------+
| offset\size | 1 | 6 | 7 | 8 | 128 |
+--------------------------+--------+--------+--------+--------+---------+
| 1 | 0.015% | 0.003% | - | - | - |
| 10 | 0.008% | 0.001% | - | 0.000% | - |
| 100 | 0.003% | 0.001% | 0.000% | - | - |
| 1000 | 0.002% | 0.000% | - | - | - |
| 10000 | 0.001% | 0.000% | - | - | - |
| 100000 | 0.000% | - | - | - | - |
| 1000000 | 0.000% | - | - | - | - |
| 10000000 | 0.000% | - | - | - | 49.989% |
| 100000000 | 0.001% | - | - | - | 24.994% |
| 1000000000 | - | - | - | - | 12.486% |
| 10000000000 | - | - | - | - | 6.253% |
| 100000000000 | - | - | - | - | 3.120% |
| 1000000000000 | - | 0.000% | - | - | 1.561% |
| 10000000000000 | - | - | - | - | 0.781% |
| 100000000000000 | - | - | - | - | 0.391% |
| 1000000000000000 | - | - | - | - | 0.195% |
| 10000000000000000 | - | - | - | - | 0.098% |
| 100000000000000000 | - | - | - | - | 0.049% |
| 1000000000000000000 | - | - | - | - | 0.024% |
| 10000000000000000000 | - | - | - | - | 0.013% |
| 100000000000000000000 | - | - | - | - | 0.006% |
| 10000000000000000000000 | - | - | - | - | 0.006% |
+--------------------------+--------+--------+--------+--------+---------+
4K-Direct-Read of a file created by `dd if=/dev/urandom ...`:
+--------------------------+---------+
| offset\size | 128 |
+--------------------------+---------+
| 10000000 | 50.003% |
| 100000000 | 24.993% |
| 1000000000 | 12.508% |
| 10000000000 | 6.252% |
| 100000000000 | 3.118% |
| 1000000000000 | 1.561% |
| 10000000000000 | 0.782% |
| 100000000000000 | 0.391% |
| 1000000000000000 | 0.196% |
| 10000000000000000 | 0.098% |
| 100000000000000000 | 0.049% |
| 1000000000000000000 | 0.025% |
| 10000000000000000000 | 0.012% |
| 100000000000000000000 | 0.006% |
| 1000000000000000000000 | 0.006% |
+--------------------------+---------+
4K-Direct-Read of a file which was *overwritten* by random fio 4K direct writes for 10 minutes:
+--------------------------+---------+--------+--------+
| offset\size | 8 | 16 | 24 |
+--------------------------+---------+--------+--------+
| 1000 | 49.912% | 0.028% | 0.004% |
| 10000 | 25.024% | 0.018% | 0.001% |
| 100000 | 12.507% | 0.012% | 0.001% |
| 1000000 | 6.273% | 0.002% | 0.001% |
| 10000000 | 3.121% | 0.002% | - |
| 100000000 | 1.548% | - | - |
| 1000000000 | 0.778% | 0.001% | - |
| 10000000000 | 0.386% | - | - |
| 100000000000 | 0.194% | - | - |
| 1000000000000 | 0.098% | - | - |
| 10000000000000 | 0.046% | - | - |
| 100000000000000 | 0.023% | - | - |
| 1000000000000000 | 0.011% | - | - |
| 10000000000000000 | 0.006% | - | - |
| 100000000000000000 | 0.003% | - | - |
| 1000000000000000000 | 0.002% | - | - |
| 10000000000000000000 | 0.001% | - | - |
| 10000000000000000000000 | 0.000% | - | - |
+--------------------------+---------+--------+--------+
Those 1-sector-sized reads in the first IO pattern may need attention? (@Kent)
The file was created via the following code:
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#define KN 4
char name[32];
/* O_DIRECT writes generally need a buffer aligned to the logical block size */
char buf[1024*KN] __attribute__((aligned(4096)));
int main() {
	int i, m = 1024*1024/KN, k, fd;
	for (i=0; i<1; i++) {
		sprintf(name, "test.%d.0", i);
		/* O_CREAT requires a mode argument */
		fd = open(name, O_CREAT|O_DIRECT|O_SYNC|O_TRUNC|O_WRONLY, 0644);
		/* m writes of KN KiB each, 1 GiB per file in total */
		for (k=0; k<m; k++)
			write(fd, buf, sizeof(buf));
		close(fd);
	}
	return 0;
}
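For completeness, here is a minimal sketch of a 4K O_DIRECT read loop over such a file. The actual read tests were run with fio, so this is only an illustration; the buffer alignment and loop count are my assumptions, and the file name is taken from the creation code above:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#define BS (4*1024)
int main(void) {
	void *buf;
	int k, fd;
	/* O_DIRECT reads need a buffer aligned to the logical block size */
	if (posix_memalign(&buf, 4096, BS))
		return 1;
	fd = open("test.0.0", O_DIRECT|O_RDONLY);
	if (fd < 0)
		return 1;
	/* sequential 4K direct reads over the whole 1 GiB file */
	for (k = 0; k < 1024*1024/4; k++)
		if (read(fd, buf, BS) != BS)
			break;
	close(fd);
	free(buf);
	return 0;
}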
I also collected latency between FS and BIO (submit_bio --> bio_endio),
and did not observe a difference between bcachefs and ext4 when the extent size is mostly 4K.
On my SSD, one 4K-direct-read test even shows bcachefs doing better:
171086ns on average for ext4 vs 133304ns for bcachefs.
But in overall performance, from fio's point of view,
bcachefs reaches only half of ext4's throughput, and its CPU usage is much lower
than ext4's: below 60% vs above 90%.
(The bottleneck should be within bcachefs itself, I guess? But I don't have
any idea of how to measure it.)
Glad to hear about those new patches for 6.12,
https://lore.kernel.org/lkml/CAHk-=wh+atcBWa34mDdG1bFGRc28eJas3tP+9QrYXX6C7BX0JQ@mail.gmail.com/T/#m27c78e1f04c556ab064bec06520b8d7fcf4518c5
they really look promising; looking forward to testing them next week~!!
Thanks
David