Message-Id: <20240924110807.28788-1-00107082@163.com>
Date: Tue, 24 Sep 2024 19:08:07 +0800
From: David Wang <00107082@....com>
To: kent.overstreet@...ux.dev
Cc: 00107082@....com,
linux-bcachefs@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [BUG?] bcachefs performance: read is way too slow when a file has no overwrite.
Hi,
At 2024-09-07 18:34:37, "David Wang" <00107082@....com> wrote:
>At 2024-09-07 01:38:11, "Kent Overstreet" <kent.overstreet@...ux.dev> wrote:
>>That's because checksums are at extent granularity, not block: if you're
>>doing O_DIRECT reads that are smaller than the writes the data was
>>written with, performance will be bad because we have to read the entire
>>extent to verify the checksum.
>
>
>Based on the result:
>1. The row with prepare-write size 4K stands out here.
>When files were prepared with a 4K write size, the subsequent
> read performance is worse. (I did double-check the result,
>but it is possible that I missed some affecting factors.);
>2. Without O_DIRECT, read performance seems correlated with the difference
> between read size and prepare-write size, but with O_DIRECT, the correlation is not obvious.
>
>And, to mention it again, if I overwrite the files **thoroughly** with a fio write test
>(using the same size), the read performance afterwards would be very good:
>
An update on the IO pattern observed between bcachefs and the block layer: below are bio start address and size, both in sectors, with the start address reduced to its lowest set bit via address &= -address.
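Judging by the values, the offset column is that lowest set bit printed in binary, i.e. the alignment of the bio start sector. Here is a minimal C sketch of the bucketing, with a made-up example sector, just to illustrate how an offset value maps back to an alignment (this is not the actual collection code):

#include <stdio.h>
/* print v (a power of two) in binary, matching the offset column format */
static void print_binary(unsigned long long v) {
	char bits[72];
	int n = 0;
	do {
		bits[n++] = '0' + (v & 1);
		v >>= 1;
	} while (v);
	while (n--)
		putchar(bits[n]);
	putchar('\n');
}
int main(void) {
	unsigned long long sector = 123456;          /* example bio start sector */
	unsigned long long align = sector & -sector; /* keep only the lowest set bit */
	print_binary(align);                         /* prints 1000000, i.e. 64-sector alignment */
	return 0;
}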
4K-Direct-Read of a file created by a loop of `write(fd, buf, 1024*4)`:
+--------------------------+--------+--------+--------+--------+---------+
| offset\size | 1 | 6 | 7 | 8 | 128 |
+--------------------------+--------+--------+--------+--------+---------+
| 1 | 0.015% | 0.003% | - | - | - |
| 10 | 0.008% | 0.001% | - | 0.000% | - |
| 100 | 0.003% | 0.001% | 0.000% | - | - |
| 1000 | 0.002% | 0.000% | - | - | - |
| 10000 | 0.001% | 0.000% | - | - | - |
| 100000 | 0.000% | - | - | - | - |
| 1000000 | 0.000% | - | - | - | - |
| 10000000 | 0.000% | - | - | - | 49.989% |
| 100000000 | 0.001% | - | - | - | 24.994% |
| 1000000000 | - | - | - | - | 12.486% |
| 10000000000 | - | - | - | - | 6.253% |
| 100000000000 | - | - | - | - | 3.120% |
| 1000000000000 | - | 0.000% | - | - | 1.561% |
| 10000000000000 | - | - | - | - | 0.781% |
| 100000000000000 | - | - | - | - | 0.391% |
| 1000000000000000 | - | - | - | - | 0.195% |
| 10000000000000000 | - | - | - | - | 0.098% |
| 100000000000000000 | - | - | - | - | 0.049% |
| 1000000000000000000 | - | - | - | - | 0.024% |
| 10000000000000000000 | - | - | - | - | 0.013% |
| 100000000000000000000 | - | - | - | - | 0.006% |
| 10000000000000000000000 | - | - | - | - | 0.006% |
+--------------------------+--------+--------+--------+--------+---------+
4K-Direct-Read of a file created by `dd if=/dev/urandom ...`:
+--------------------------+---------+
| offset\size | 128 |
+--------------------------+---------+
| 10000000 | 50.003% |
| 100000000 | 24.993% |
| 1000000000 | 12.508% |
| 10000000000 | 6.252% |
| 100000000000 | 3.118% |
| 1000000000000 | 1.561% |
| 10000000000000 | 0.782% |
| 100000000000000 | 0.391% |
| 1000000000000000 | 0.196% |
| 10000000000000000 | 0.098% |
| 100000000000000000 | 0.049% |
| 1000000000000000000 | 0.025% |
| 10000000000000000000 | 0.012% |
| 100000000000000000000 | 0.006% |
| 1000000000000000000000 | 0.006% |
+--------------------------+---------+
4K-Direct-Read of a file which was *overwritten* by random fio 4K direct writes for 10 minutes:
+--------------------------+---------+--------+--------+
| offset\size | 8 | 16 | 24 |
+--------------------------+---------+--------+--------+
| 1000 | 49.912% | 0.028% | 0.004% |
| 10000 | 25.024% | 0.018% | 0.001% |
| 100000 | 12.507% | 0.012% | 0.001% |
| 1000000 | 6.273% | 0.002% | 0.001% |
| 10000000 | 3.121% | 0.002% | - |
| 100000000 | 1.548% | - | - |
| 1000000000 | 0.778% | 0.001% | - |
| 10000000000 | 0.386% | - | - |
| 100000000000 | 0.194% | - | - |
| 1000000000000 | 0.098% | - | - |
| 10000000000000 | 0.046% | - | - |
| 100000000000000 | 0.023% | - | - |
| 1000000000000000 | 0.011% | - | - |
| 10000000000000000 | 0.006% | - | - |
| 100000000000000000 | 0.003% | - | - |
| 1000000000000000000 | 0.002% | - | - |
| 10000000000000000000 | 0.001% | - | - |
| 10000000000000000000000 | 0.000% | - | - |
+--------------------------+---------+--------+--------+
Those 1-sector-sized reads in the first IO pattern may need attention? (@Kent)
The file was created via the following code:
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#define KN 4
char name[32];
/* O_DIRECT writes generally need a buffer aligned to the logical block size */
char buf[1024*KN] __attribute__((aligned(4096)));
int main() {
	int i, m = 1024*1024/KN, k, fd;
	for (i=0; i<1; i++) {
		sprintf(name, "test.%d.0", i);
		/* O_CREAT requires a mode argument */
		fd = open(name, O_CREAT|O_DIRECT|O_SYNC|O_TRUNC|O_WRONLY, 0644);
		/* m writes of KN KiB each, 1 GiB per file in total */
		for (k=0; k<m; k++)
			write(fd, buf, sizeof(buf));
		close(fd);
	}
	return 0;
}
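For completeness, here is a minimal sketch of a 4K O_DIRECT read loop over such a file. The actual read tests were run with fio, so this is only an illustration; the buffer alignment and loop count are my assumptions, and the file name is taken from the creation code above:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#define BS (4*1024)
int main(void) {
	void *buf;
	int k, fd;
	/* O_DIRECT reads need a buffer aligned to the logical block size */
	if (posix_memalign(&buf, 4096, BS))
		return 1;
	fd = open("test.0.0", O_DIRECT|O_RDONLY);
	if (fd < 0)
		return 1;
	/* sequential 4K direct reads over the whole 1 GiB file */
	for (k = 0; k < 1024*1024/4; k++)
		if (read(fd, buf, BS) != BS)
			break;
	close(fd);
	free(buf);
	return 0;
}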
I also collected latency between FS and BIO (submit_bio --> bio_endio),
and did not observe a difference between bcachefs and ext4 when the extent size is mostly 4K.
On my SSD, one 4K-direct-read test even shows bcachefs doing better:
171086ns on average for ext4 vs 133304ns for bcachefs.
But in overall performance, from fio's point of view,
bcachefs reaches only half of ext4's throughput, and its CPU usage is much lower
than ext4's: below 60% vs above 90%.
(The bottleneck should be within bcachefs itself, I guess? But I don't have
any idea of how to measure it.)
Glad to hear about those new patches for 6.12,
https://lore.kernel.org/lkml/CAHk-=wh+atcBWa34mDdG1bFGRc28eJas3tP+9QrYXX6C7BX0JQ@mail.gmail.com/T/#m27c78e1f04c556ab064bec06520b8d7fcf4518c5
they really look promising; looking forward to testing them next week~!!
Thanks
David