Date:	Fri, 15 Jan 2016 22:56:11 +0100
From:	Hans Beckerus <hans.beckerus@...il.com>
To:	Nikhilesh Reddy <reddyn@...eaurora.org>,
	Nikolaus Rath <nikolaus@...h.org>
Cc:	Jan Kara <jack@...e.cz>, Richard Weinberger <richard@....at>,
	Miklos Szeredi <miklos@...redi.hu>,
	fuse-devel <fuse-devel@...ts.sourceforge.net>,
	Greg KH <gregkh@...uxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andy Lutomirski <luto@...capital.net>, sven.utcke@....de,
	Al Viro <viro@...iv.linux.org.uk>,
	Linux API <linux-api@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Theodore Ts'o <tytso@....edu>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [fuse-devel] [PATCH] fuse: Add support for fuse stacked I/O

On 2016-01-15 8:29, Nikhilesh Reddy wrote:
> On Fri 15 Jan 2016 09:51:50 AM PST, Nikolaus Rath wrote:
>> On Jan 15 2016, Antonio SJ Musumeci <trapexit@...wn.link> wrote:
>>> The idea is that you want to be able to reason about open, create, etc. but
>>> don't care about the data transfer.
>>>
>>> I have N filesystems I wish to unionize. When I create a new file I want to
>>> pick the drive with the most free space (or some other algo). creat is
>>> called, succeeds, and now the application issuing this starts writing. The
>>> FUSE fs doesn't care about the writes. It just wanted to pick the drive
>>> this file should have been created on. Anything I'd do with the FD after
>>> that I'm happy to short circuit. I don't need to be asked what to do when
>>> fstat'ing this FD or anything which in FUSE hands over the 'fh'. It's just
>>> a file descriptor for me and I'd simply be calling the same function.
>>>
>>> Ideally I think one would want to be able to select which functions to
>>> short circuit and maybe even have it so that a short circuited function
>>> could propagate back through FUSE on error. But the read and write short
>>> circuiting is probably the biggest win given the overhead.
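
The placement step described above (choose the branch with the most free
space at create time, then stay out of the data path) can be sketched in a
few lines. This is only an illustration under stated assumptions: the names
`free_bytes`, `pick_branch` and `create_on_best_branch` are hypothetical,
and the code is plain Python calling `os.statvfs`, not anything taken from
the patch or from libfuse.

```python
import os


def free_bytes(path):
    """Bytes available to unprivileged users on the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def pick_branch(branches, free=free_bytes):
    """Return the branch directory that currently reports the most free space."""
    if not branches:
        raise ValueError("no branches configured")
    return max(branches, key=free)


def create_on_best_branch(branches, relpath, mode=0o644, free=free_bytes):
    """Create relpath on the chosen branch and return a plain file descriptor.

    The flags match creat(2): O_CREAT | O_WRONLY | O_TRUNC.
    """
    target = os.path.join(pick_branch(branches, free), relpath)
    return os.open(target, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, mode)
```

Once the descriptor exists, every later write goes to an ordinary file on
the chosen branch, which is the part a union filesystem would be happy to
have short circuited.
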
>>
>> I think you should avoid using the term "stacked" completely (which
>> would also make Christoph happy). There have been several discussions in
>> the past about adding a "fd delegation" function to FUSE. Generally, the
>> idea is that the FUSE userspace code tells the FUSE kernel module to
>> internally "delegate" writes and reads for a given file (or even a
>> range in that file) to a different file descriptor provided by
>> userspace.
>>
>> I think that function would be useful, and not just for union file
>> systems. There are many FUSE file systems that end up writing the data
>> into some other file on the disk without doing any transformations on
>> the data itself. Especially with the range feature, they would all
>> benefit from the ability to delegate reads and writes.
I agree with Nikolaus here. I do believe there are use cases that
could benefit from this.
I have a typical example where a FUSE fs wishes to handle reads but really
does not care about the writes, other than that they should be written
transparently to the underlying fs. Simply moving a file from the
underlying fs to the FUSE mount point, if both are located on e.g. the
same physical partition, would then be a more or less instant operation,
right?
But this also requires that the operations are selectable. A user should
be able to choose which operations to bypass.
I understand, though, that this will need adaptations to libfuse as well.
Another question is whether an inotify write-type watch on the FUSE
mount point would be affected by this or not.
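
To make the "selectable operations" idea concrete, here is a purely
illustrative userspace model. It assumes a hypothetical `DelegatingHandle`
class with a `bypass` set; neither exists in libfuse or in the proposed
patch. Bypassed operations go straight to the lower file descriptor, while
the others are first recorded as having reached the filesystem's own logic.

```python
import os


class DelegatingHandle:
    """Hypothetical per-file handle; not a libfuse or kernel API."""

    def __init__(self, lower_fd, bypass=("read", "write")):
        self.lower_fd = lower_fd
        self.bypass = frozenset(bypass)
        # Operations that were NOT bypassed and so reached the daemon's logic.
        self.seen_by_daemon = []

    def read(self, size, offset):
        if "read" not in self.bypass:
            self.seen_by_daemon.append("read")
        return os.pread(self.lower_fd, size, offset)

    def write(self, data, offset):
        if "write" not in self.bypass:
            self.seen_by_daemon.append("write")
        return os.pwrite(self.lower_fd, data, offset)
```

With `bypass=("write",)` this models the case above: reads are still
handled by the filesystem, writes land directly in the underlying file.
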

>> However, Miklos has said in the past that the performance gain from this
>> is very small. You can get almost as good a result by splicing from one
>> fd to the other in userspace. In that case this function could actually
>> be implemented completely in libfuse.
>>
>>
>> Do you have any benchmark results that compare a splice-in-userspace
>> approach with your patch?
>>
>>
>> Best,
>> -Nikolaus
>>
>
> Hi
>
> @Linus
> Thanks for taking the time to reply to my email. It means a lot.
>
> FUSE allows users to implement extensions to filesystems, such as enforcing policy or permissions, without having to modify the kernel or maintain the policy in the kernel.
>
> One such example is what was quoted by Antonio above.
> Another example is a FUSE-based filesystem that tries to enforce additional permissions on a FAT-based mount point.
>
> From what I could find, there are many FUSE-based filesystems that do things during the open call but simply pass the read and write I/O calls through to the local "lower" filesystem where they actually store the data.
>
> From what I understand, unionfs, overlayfs, and similar filesystems are primarily used to provide a merged or unified view of directories and do not offer mechanisms to add policy or other checks/extensions to the I/O operations without modifying the kernel.
>
> The main motivation is to improve FUSE performance in such use cases without losing the ease of implementing and extending in userspace.
>
>
>
> @Nikolaus
> Our local benchmarks on embedded devices (where power and CPU usage are critical) show that splice doesn't help as much, and keeping multiple CPUs running results in increased power usage.
>
> The below results are on a specific device model.
>
> IOPS is the number of 4K reads or writes that could be performed each second.
>
>                             regular        spliced      stacked I/O
> sequential write (MiB/s)    56.55633333    100.34445    141.7096667
> sequential read (MiB/s)     49.644         60.43434     122.367
>
> random write (IOPS)         2554.333333    4053.4545    8572
> random read (IOPS)          977.3333333    1223.34      1432.666667
>
> The above tests were performed using a file size of 1GB
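
For readers who want the relative gains rather than the raw figures, the
small sketch below recomputes them from the table above. The numbers are
copied verbatim from the table; the helper names `speedups` and
`iops_to_mibps` are made up for this example, and "4K" is assumed to mean
4096 bytes.

```python
# Figures copied from the table above (one specific device, 1 GB file).
# Each row holds (regular, spliced, stacked I/O).
results = {
    "sequential write (MiB/s)": (56.55633333, 100.34445, 141.7096667),
    "sequential read (MiB/s)": (49.644, 60.43434, 122.367),
    "random write (IOPS)": (2554.333333, 4053.4545, 8572),
    "random read (IOPS)": (977.3333333, 1223.34, 1432.666667),
}


def speedups(regular, spliced, stacked):
    """Return stacked I/O throughput relative to regular and spliced FUSE."""
    return stacked / regular, stacked / spliced


def iops_to_mibps(iops, block_bytes=4096):
    """Convert IOPS to MiB/s, assuming each operation moves block_bytes."""
    return iops * block_bytes / (1024 * 1024)


for name, row in results.items():
    vs_regular, vs_spliced = speedups(*row)
    print(f"{name:26s} {vs_regular:5.2f}x vs regular, {vs_spliced:5.2f}x vs spliced")
```

On these figures stacked I/O comes out at roughly 2.5x regular FUSE for
both sequential cases, about 3.4x for random writes, and about 1.5x for
random reads. The gain over the spliced variant ranges from about 1.2x
(random read) to about 2.1x (random write).
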
>
> Using stacked I/O showed the best performance (almost the same as the native EXT4 filesystem that is storing the real file)
>
> We also measured a 5% saving in power and in the CPU timeslices used. (Splice did not improve this at all compared to default FUSE.)
>
> Random I/O, i.e. seeking to random parts of a file and reading (use cases such as ELF and *.so loading from FUSE-based filesystems), also improved.
>
>
> Similarly, memory-mapped (mmap) I/O (in an extended patch to this one, still in progress) showed a significant improvement, about 400% over default FUSE.
>
> We can also call it FUSE_DELEGATED_IO if that helps :).
> I chose to call it stacked I/O since we are technically stacking the FUSE reads/writes on the ext4/FAT or other filesystems.
>
> Please let me know if you have any questions.
>
> @everyone
> Thanks so much for your comments and the interest.
> Also many of you have shown support for the patch in private emails.
> I would be grateful if you could voice the same support on the public thread so that everyone knows that there is interest in this patch.
>
>
