Message-ID: <56044C66.1090207@oracle.com>
Date: Thu, 24 Sep 2015 12:17:58 -0700
From: Ashish Samant <ashish.samant@...cle.com>
To: Miklos Szeredi <miklos@...redi.hu>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, fuse-devel@...ts.sourceforge.net,
Srinivas Eeda <srinivas.eeda@...cle.com>
Subject: Re: fuse scalability part 1
On 05/18/2015 08:13 AM, Miklos Szeredi wrote:
> This part splits out an "input queue" and a "processing queue" from the
> monolithic "fuse connection", each of those having their own spinlock.
>
> The end of the patchset adds the ability to "clone" a fuse connection. This
> means, that instead of having to read/write requests/answers on a single fuse
> device fd, the fuse daemon can have multiple distinct file descriptors open.
> Each of those can be used to receive requests and send answers, currently the
> only constraint is that a request must be answered on the same fd as it was read
> from.
>
> This can be extended further to allow binding a device clone to a specific CPU
> or NUMA node.
>
> Patchset is available here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git for-next
>
> Libfuse patches adding support for "clone_fd" option:
>
> git://git.code.sf.net/p/fuse/fuse clone_fd
>
> Thanks,
> Miklos
>
>
Resending the numbers as attachments because my email client messes up
the formatting of the message. Sorry for the noise.
We did some performance testing without these patches and with these
patches (with -o clone_fd option specified). We did 2 types of tests:
1. Throughput test: We ran parallel dd tests reading from and writing to
a FUSE-based database fs on a system with 8 NUMA nodes and 288 CPUs. The
performance here is almost equal to that of the per-NUMA patches we
submitted a while back. Please find results attached.
2. Spinlock access time test: We also ran some tests within the kernel
to measure the time spent acquiring the spinlocks per request in both
cases. As can be seen, the time taken per request to acquire the
spinlocks in the kernel code, summed over the lifetime of the request,
is 30X to 100X lower with the patchset. Please find results attached.
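For context on what -o clone_fd exercises: the quoted message describes
cloning the fuse connection so that the daemon holds several device fds.
Below is a minimal sketch of that step, not the libfuse implementation.
It assumes the FUSE_DEV_IOC_CLONE ioctl as defined in the kernel's
<linux/fuse.h> (redefined here only if the header is absent) and the
usual /dev/fuse device path; clone_fuse_fd is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumption: mirrors the definition in the kernel's <linux/fuse.h>. */
#ifndef FUSE_DEV_IOC_CLONE
#define FUSE_DEV_IOC_CLONE _IOR(229, 0, uint32_t)
#endif

/*
 * Hypothetical helper: open a fresh /dev/fuse descriptor and attach it
 * to the connection that session_fd already belongs to.
 * Returns the new fd, or -1 on any failure.
 */
int clone_fuse_fd(int session_fd)
{
    uint32_t master = (uint32_t)session_fd;
    int fd = open("/dev/fuse", O_RDWR | O_CLOEXEC);

    if (fd < 0)
        return -1;

    /* Ask the kernel to bind the new fd to the existing connection. */
    if (ioctl(fd, FUSE_DEV_IOC_CLONE, &master) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A daemon would typically call this once per worker thread, and each
worker would then read requests from and write replies to its own fd,
respecting the constraint that a request is answered on the fd it was
read from.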
Thanks,
Ashish
View attachment "dd_test_results.txt" of type "text/plain" (1274 bytes)
View attachment "spinlock_access_time_test.txt" of type "text/plain" (581 bytes)