Message-ID: <79f17c7a.65f.19217621c47.Coremail.00107082@163.com>
Date: Sun, 22 Sep 2024 09:39:18 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "Kent Overstreet" <kent.overstreet@...ux.dev>
Cc: linux-bcachefs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [BUG?] bcachefs performance: read is way too slow when a file
 has no overwrite.

Hi, 

At 2024-09-22 00:12:01, "Kent Overstreet" <kent.overstreet@...ux.dev> wrote:
>On Sun, Sep 22, 2024 at 12:02:07AM GMT, David Wang wrote:
>> Hi, 
>> 
>> At 2024-09-09 21:37:35, "Kent Overstreet" <kent.overstreet@...ux.dev> wrote:
>> >On Sat, Sep 07, 2024 at 06:34:37PM GMT, David Wang wrote:
>> 
>> >
>> >Big standard deviation (high tail latency?) is something we'd want to
>> >track down. There's a bunch of time_stats in sysfs, but they're mostly
>> >for the write paths. If you're trying to identify where the latencies
>> >are coming from, we can look at adding some new time stats to isolate.
>> 
>> About performance, I have a theory based on some observation I made recently:
>> When user space app make a 4k(8 sectors) direct write, 
>> bcachefs would initiate a write request of ~11 sectors, including the checksum data, right?
>> This may not be a good offset+size pattern of block layer for performance.  
>> (I did get a very-very bad performance on ext4 if write with 5K size.)
>
>The checksum isn't inline with the data, it's stored with the pointer -
>so if you're seeing 11 sector writes, something really odd is going
>on...
>

This really contradicts my observations:
1. fio reports an average of ~50K IOPS for a 400-second random direct-write test.
2. From /proc/diskstats, the average "Field 5 -- # of writes completed" per second is also ~50K.
(From this I conclude the performance issue is not caused by extra IOPS for checksums.)
3. From "Field 10 -- # of milliseconds spent doing I/Os", the disk is "busy" for about 0.9 s out of every second, similar to the ext4 result.
(From this I conclude the issue is not caused by failing to push the disk hard enough.)
4. delta(Field 7 -- # of sectors written) / delta(Field 5 -- # of writes completed) over a 5-minute interval is ~11 sectors/write.
(This is why I formed the theory that the checksum is stored with the raw data... it seemed reasonable to me. The calculation is sketched below.)
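A rough sketch of that /proc/diskstats delta calculation (the device name and the interval are placeholders, not my exact setup):

import time

DEV = "nvme0n1"          # placeholder device name
INTERVAL = 300           # 5-minute interval, in seconds

def read_stats(dev):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == dev:
                # Field numbers as in Documentation/admin-guide/iostats.rst:
                # Field 5 = writes completed, Field 7 = sectors written,
                # Field 10 = ms spent doing I/Os.  Field N is fields[N + 2]
                # because of the major/minor/name columns.
                return int(fields[7]), int(fields[9]), int(fields[12])
    raise RuntimeError(f"{dev} not found in /proc/diskstats")

w0, s0, ms0 = read_stats(DEV)
time.sleep(INTERVAL)
w1, s1, ms1 = read_stats(DEV)

dw, ds, dms = w1 - w0, s1 - s0, ms1 - ms0
print("writes/s        :", dw / INTERVAL)
print("sectors/write   :", ds / dw if dw else 0)
print("disk busy (s/s) :", dms / 1000 / INTERVAL)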

I will write some debug code to collect the sector-count patterns; a rough sketch of what I have in mind follows below.
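It would histogram write sizes (in sectors) as the block layer sees them, via the block:block_rq_issue tracepoint. The tracefs path and the trace line format here are assumptions on my side and may need adjusting:

import collections
import re

TRACEFS = "/sys/kernel/debug/tracing"   # may be /sys/kernel/tracing instead
EVENT = f"{TRACEFS}/events/block/block_rq_issue/enable"
# block_rq_issue prints: dev rwbs bytes (cmd) sector + nr_sectors [comm]
PAT = re.compile(r"block_rq_issue: \S+ (\S+) \d+ \(.*?\) \d+ \+ (\d+) \[")

def set_event(on):
    with open(EVENT, "w") as f:          # needs root
        f.write("1" if on else "0")

sizes = collections.Counter()
set_event(True)
try:
    with open(f"{TRACEFS}/trace_pipe") as pipe:
        for _ in range(100000):          # sample 100k events, then stop
            m = PAT.search(pipe.readline())
            if m and "W" in m.group(1):  # rwbs contains 'W' => a write
                sizes[int(m.group(2))] += 1
finally:
    set_event(False)

for nsec, count in sizes.most_common(10):
    print(f"{nsec:5d} sectors: {count}")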

 


>I would suggest doing some testing with data checksums off first, to
>isolate the issue; then it sounds like that IO pattern needs to be
>looked at.

I will try it. 
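Roughly how I plan to turn data checksums off for the test. The sysfs path and option name here are my assumption and may differ between versions; re-creating the filesystem with data checksums disabled would be the fallback:

import glob

# Assumption: bcachefs exposes runtime options under
# /sys/fs/bcachefs/<uuid>/options/, including data_checksum.
opts = glob.glob("/sys/fs/bcachefs/*/options/data_checksum")
if not opts:
    raise SystemExit("no mounted bcachefs filesystem found (or path differs)")

with open(opts[0]) as f:
    print("current data_checksum:", f.read().strip())

with open(opts[0], "w") as f:            # needs root
    f.write("none")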
 
>
>Check the extents btree in debugfs as well, to make sure the extents are
>getting written out as you think they are.



Thanks
David
