Date:	Wed, 21 Apr 2010 13:17:36 -0500
From:	"Steven J. Magnani" <steve@...idescorp.com>
To:	Rick Sherm <rick.sherm@...oo.com>
Cc:	linux-kernel@...r.kernel.org, axboe@...nel.dk
Subject: Re: Trying to measure performance with splice/vmsplice ....

Hi Rick,

On Fri, 2010-04-16 at 10:02 -0700, Rick Sherm wrote:
> Q3) When using splice, even though the destination file is opened in O_DIRECT mode, the data gets cached. I verified it using vmstat.
> 
> r  b   swpd   free   buff    cache   
> 1  0      0 9358820 116576 2100904
> 
> ./splice_to_splice
> 
> r  b   swpd   free   buff  cache
> 2  0      0 7228908 116576 4198164
> 
> I see the same caching issue even if I vmsplice buffers (a simple malloc'd iov) to a pipe and then splice the pipe to a file. The speed is still an issue with vmsplice too.
> 

One thing to note is that O_DIRECT is a hint; not all filesystems bypass
the cache. I'm pretty sure ext2 does, and I know FAT doesn't.

Another variable is whether (and how) your filesystem implements the
splice_write file operation. The generic one (pipe_to_file) in
fs/splice.c copies data into the pagecache. The default one falls back
to vfs_write() and so may stand a better chance of honoring O_DIRECT.
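
Something like the following is the userland view, for what it's worth
(an untested sketch, file name made up; run it with stdin redirected
from a regular file):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int pfd[2], fd;
	ssize_t n;

	if (pipe(pfd))
		return 1;

	/* O_DIRECT is requested here, but as noted above, the filesystem's
	 * splice_write decides whether the pagecache is really bypassed. */
	fd = open("outfile", O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Pull up to 64K from stdin into the pipe, then push it on to the
	 * file.  A real program would loop until EOF and handle short
	 * transfers. */
	n = splice(STDIN_FILENO, NULL, pfd[1], NULL, 65536, SPLICE_F_MOVE);
	if (n > 0)
		n = splice(pfd[0], NULL, fd, NULL, n, SPLICE_F_MOVE);
	if (n < 0)
		perror("splice");

	close(fd);
	return 0;
}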

> Q4) Also, using splice, you can only transfer 64K worth of data (PIPE_BUFFERS*PAGE_SIZE) at a time, correct? But using stock read/write, I can go up to a 1MB buffer. After that I don't see any gain. But still, the reduction in system/CPU time is significant.

I'm not a splicing expert but I did spend some time recently trying to
improve FTP reception by splicing from a TCP socket to a file. I found
that while splicing avoids copying packets to userland, that gain is
more than offset by a large increase in calls into the storage stack.
It's especially bad with TCP sockets because a typical packet has, say,
1460 bytes of data. Since splicing works on PIPE_BUFFERS pages at a
time, and packet pages are only about 35% utilized, each cycle to
userland could move at most about 23 KiB of data. A similar effect
may be in play in your case.
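
Roughly the shape of that loop, for reference (a sketch, not my actual
code; fd names are made up, and the caller is assumed to have set up
the socket, the destination file, and the pipe):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

#define CHUNK	(16 * 4096)	/* PIPE_BUFFERS * PAGE_SIZE = 64 KiB */

/*
 * Move data from a connected TCP socket to a file via a pipe.  A full
 * pipe can describe 16 * 4096 = 64 KiB, but if each page carries one
 * ~1460-byte segment, one pass moves only about 16 * 1460 = 23360
 * bytes (~23 KiB), so far more passes -- and far more calls into the
 * storage stack -- are needed per megabyte received.
 */
static int splice_sock_to_file(int sock_fd, int file_fd, int pfd[2])
{
	for (;;) {
		ssize_t in = splice(sock_fd, NULL, pfd[1], NULL, CHUNK,
				    SPLICE_F_MOVE | SPLICE_F_MORE);
		if (in <= 0)
			return (int)in;		/* 0 = EOF, <0 = error */

		while (in > 0) {
			ssize_t out = splice(pfd[0], NULL, file_fd, NULL,
					     in,
					     SPLICE_F_MOVE | SPLICE_F_MORE);
			if (out <= 0)
				return -1;
			in -= out;
		}
	}
}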

ftrace may be of some help in finding the bottleneck...

Regards,
------------------------------------------------------------------------
 Steven J. Magnani               "I claim this network for MARS!
 www.digidescorp.com              Earthling, return my space modulator!"

 #include <standard.disclaimer>


