Message-ID: <ZTWn3QtTggmMHWxS@dread.disaster.area>
Date:   Mon, 23 Oct 2023 09:53:17 +1100
From:   Dave Chinner <david@...morbit.com>
To:     David Wang <00107082@....com>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PERFORMANCE] fs: sendfile suffers performance degradation when
 buffer size has a performance impact on underlying IO

On Sat, Oct 21, 2023 at 08:19:34AM +0800, David Wang wrote:
> Hi, 
> 
> I was trying to confirm the performance improvement from replacing read/write sequences with sendfile,
> but I got quite a surprising result:
> 
> $ gcc -DUSE_SENDFILE cp.cpp
> $ time ./a.out 
> 
> real	0m56.121s
> user	0m0.000s
> sys	0m4.844s
> 
> $ gcc  cp.cpp
> $ time ./a.out 
> 
> real	0m27.363s
> user	0m0.014s
> sys	0m4.443s
> 
> The result shows that, in my test scenario, the read/write sequence takes only half the time that sendfile does.
> My guess is that sendfile uses a default pipe with buffer size 1<<16 (16 pages), which is not tuned for the underlying IO,
> hence a read/write sequence with buffer size 1<<17 is much faster than sendfile.

Nope, it's just that you are forcing sendfile to do synchronous IO
on each internal loop, i.e.:

> But the problem with sendfile is that there is no parameter to tune the buffer size from userspace... Any chance to fix this?
> 
> The test code is as follows:
> 
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <sys/sendfile.h>
> #include <fcntl.h>
> 
> char buf[1<<17];   // much better than 1<<16
> int main() {
> 	int i, fin, fout, n, m;
> 	for (i=0; i<128; i++) {
> 		// dd if=/dev/urandom of=./bigfile bs=131072 count=256
> 		fin  = open("./bigfile", O_RDONLY);
> 		fout = open("./target", O_WRONLY | O_CREAT | O_DSYNC, S_IWUSR);

O_DSYNC is the problem here.

This forces an IO to disk for every write IO submission from
sendfile to the filesystem. For synchronous IO (as in "waiting for
completion before sending the next IO"), a larger IO size will
*always* move data faster to storage.
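
To illustrate (a rough, untested sketch, reusing the ./bigfile and
./target names from the test program): a plain read()/write() copy
with a 64kB buffer hits the same wall once the destination is opened
with O_DSYNC, because every write has to reach stable storage before
the next one can be issued. It's the IO size under synchronous IO
that hurts, not sendfile itself:

#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

/* 64kB buffer, the same size as sendfile's internal pipe */
static char buf[1 << 16];

int main(void)
{
	int fin = open("./bigfile", O_RDONLY);
	int fout = open("./target", O_WRONLY | O_CREAT | O_DSYNC, S_IWUSR);
	ssize_t n;

	if (fin < 0 || fout < 0)
		return 1;
	while ((n = read(fin, buf, sizeof(buf))) > 0) {
		/* O_DSYNC: this write waits for the disk every time */
		if (write(fout, buf, n) != n)
			return 1;
	}
	close(fout);
	close(fin);
	return 0;
}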

FWIW, you'll get the same behaviour if you use O_DIRECT for either
source or destination file with sendfile - synchronous 64kB IOs are
a massive performance limitation even without O_DSYNC.

IOWs, don't use sendfile like this. Use buffered IO and
sendfile(fd); fdatasync(fd); if you need data integrity guarantees,
and you won't see any perf problems resulting from the size of the
internal sendfile buffer....
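
Something like this (a rough, untested sketch, reusing the same
./bigfile and ./target paths from the test program) is what I mean:
no O_DSYNC on the destination, let sendfile run through the page
cache, and pay for one fdatasync() at the end instead of one flush
per 64kB chunk:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/sendfile.h>

int main(void)
{
	int fin = open("./bigfile", O_RDONLY);
	/* same flags as the test program, minus O_DSYNC */
	int fout = open("./target", O_WRONLY | O_CREAT, S_IWUSR);
	struct stat st;
	off_t off = 0;

	if (fin < 0 || fout < 0 || fstat(fin, &st) < 0) {
		perror("open/fstat");
		return 1;
	}
	while (off < st.st_size) {
		ssize_t n = sendfile(fout, fin, &off, st.st_size - off);
		if (n <= 0) {
			perror("sendfile");
			return 1;
		}
	}
	/* one synchronous flush for data integrity, not one per chunk */
	if (fdatasync(fout) < 0) {
		perror("fdatasync");
		return 1;
	}
	close(fout);
	close(fin);
	return 0;
}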

-Dave.
-- 
Dave Chinner
david@...morbit.com
