Message-ID: <CALmpHyb5wpn-dABuEwpypUggRY=1aULvJW0z17HpH50pr1=HKg@mail.gmail.com>
Date:	Tue, 27 Nov 2012 09:04:18 +0800
From:	Feng Shuo <steve.shuo.feng@...il.com>
To:	Maxim Patlasov <mpatlasov@...allels.com>
Cc:	miklos@...redi.hu, dev@...allels.com,
	"fuse-devel@...ts.sourceforge.net" <fuse-devel@...ts.sourceforge.net>,
	linux-kernel@...r.kernel.org, jbottomley@...allels.com,
	viro@...iv.linux.org.uk, linux-fsdevel@...r.kernel.org,
	xemul@...nvz.org
Subject: Re: [PATCH v2 00/14] fuse: An attempt to implement a write-back cache policy

Hi Maxim,

I'm new to fuse but have some experience with NFS. From my
understanding after reviewing your patchset, it seems to work only with
a local file system or a distributed file system whose files are never
modified (they could grow, but are rarely or never modified in place), because it
doesn't examine the pre/post status of the object being written (e.g. a file).
So if a file is modified from outside, fuse might not get any chance to
handle it. Correct me if I got this wrong, since I'm really new to
fuse. :-)
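
The kind of pre/post check meant here can be sketched roughly as follows,
modeled loosely on how an NFS client compares the attributes a server reports
from just before and just after a write against what it has cached. This is
only an illustration under stated assumptions: 'struct attr_snapshot',
'attrs_equal' and 'apply_write_reply' are hypothetical names, the choice of
fields is a guess, and none of it comes from FUSE or from the patchset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute snapshot; the field choice is an assumption. */
struct attr_snapshot {
    uint64_t size;      /* file size in bytes */
    int64_t  mtime_ns;  /* last data modification time */
    int64_t  ctime_ns;  /* last status change time */
};

static bool attrs_equal(const struct attr_snapshot *a,
                        const struct attr_snapshot *b)
{
    return a->size == b->size &&
           a->mtime_ns == b->mtime_ns &&
           a->ctime_ns == b->ctime_ns;
}

/*
 * Process the attributes returned with a write reply.
 *
 * 'pre' is the file state the server saw just before applying our write,
 * 'post' is the state just after it. If 'pre' matches what we had cached,
 * nobody else touched the file in between and the cached data can be kept.
 * Otherwise the caller should drop its cached pages.
 *
 * In both cases the cached attributes are refreshed from 'post'.
 * Returns true if the cached data is still trustworthy.
 */
static bool apply_write_reply(struct attr_snapshot *cached,
                              const struct attr_snapshot *pre,
                              const struct attr_snapshot *post)
{
    assert(cached && pre && post);

    bool unchanged_by_others = attrs_equal(cached, pre);

    *cached = *post;
    return unchanged_by_others;
}
```

With a check like this, a write whose 'pre' attributes differ from the cached
ones signals an outside modification, which is the case that seems to go
unnoticed when the kernel trusts only its own i_size and cached pages.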

On Sat, Nov 17, 2012 at 1:04 AM, Maxim Patlasov <mpatlasov@...allels.com> wrote:
> Hi,
>
> This is the second iteration of Pavel Emelyanov's patch-set implementing
> write-back policy for FUSE page cache. Initial patch-set description was
> the following:
>
> One of the problems with the existing FUSE implementation is that it uses the
> write-through cache policy which results in performance problems on certain
> workloads. E.g. when copying a big file into a FUSE file the cp pushes every
> 128k to the userspace synchronously. This becomes a problem when the userspace
> back-end uses networking for storing the data.
>
> A good solution to this is switching the FUSE page cache to a write-back policy.
> With this, file data is pushed to userspace in big chunks (depending on the
> dirty memory limits, but much more than 128k), which lets the FUSE daemons
> handle the size updates in a more efficient manner.
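
To put rough numbers on the paragraph above, here is a back-of-the-envelope
sketch that only counts requests. The helper names 'requests_needed' and
'sync_wait_us', the 4 MiB write-back chunk and the 500 us round trip are all
assumptions made for illustration; the patchset only says the chunks are
"much more than 128k".

```c
#include <assert.h>
#include <stdint.h>

/* Number of requests needed to push file_bytes in chunks of chunk_bytes. */
static uint64_t requests_needed(uint64_t file_bytes, uint64_t chunk_bytes)
{
    assert(chunk_bytes > 0);
    /* Ceiling division, written so that it cannot overflow. */
    return file_bytes / chunk_bytes + (file_bytes % chunk_bytes != 0);
}

/*
 * Time spent waiting on round trips if every request is sent
 * synchronously and costs rtt_us microseconds of latency.
 */
static uint64_t sync_wait_us(uint64_t file_bytes, uint64_t chunk_bytes,
                             uint64_t rtt_us)
{
    return requests_needed(file_bytes, chunk_bytes) * rtt_us;
}
```

For a 1 GiB file, requests_needed(1ULL << 30, 128ULL << 10) is 8192, while
the assumed 4 MiB chunk gives requests_needed(1ULL << 30, 4ULL << 20) == 256.
At an assumed 500 us per round trip, the 8192 synchronous requests alone cost
about 4.1 seconds of waiting, before any data transfer time is counted.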
>
> The writeback feature is per-connection and is explicitly configurable at the
> init stage (is it worth making it CAP_SOMETHING protected?) When the writeback is
> turned ON:
>
> * still copy writeback pages to temporary buffer when sending a writeback request
>   and finish the page writeback immediately
>
> * make kernel maintain the inode's i_size to avoid frequent i_size synchronization
>   with the user space
>
> * take NR_WRITEBACK_TEMP into account when making the balance_dirty_pages decision.
>   This protects us from having too many dirty pages on FUSE
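
The first and third points interact: since page writeback is finished as soon
as the data has been copied to a temporary buffer, those pages no longer show
up as dirty or under writeback, so a throttle that ignores the temporary
copies would never slow the writer down. The toy userspace model below
illustrates this. It is not the kernel's balance_dirty_pages logic;
'struct page_counters', 'over_dirty_limit' and the single fixed limit are
simplifying assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page counters, loosely named after the kernel's. */
struct page_counters {
    unsigned long nr_dirty;          /* dirty, writeback not started yet */
    unsigned long nr_writeback;      /* under ordinary writeback */
    unsigned long nr_writeback_temp; /* temporary copies still in flight */
};

/*
 * Decide whether a writer should be throttled.
 * With count_temp == false the in-flight temporary copies are invisible
 * to the check; with count_temp == true they are charged like any other
 * page waiting to reach storage.
 */
static bool over_dirty_limit(const struct page_counters *c,
                             unsigned long limit, bool count_temp)
{
    assert(c);

    unsigned long total = c->nr_dirty + c->nr_writeback;

    if (count_temp)
        total += c->nr_writeback_temp;

    return total > limit;
}
```

With 100 dirty pages, 50 under writeback, 1000 temporary copies in flight and
a limit of 500, the check passes when the temporary copies are ignored and
throttles the writer once they are counted.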
>
> The provided patchset survives the fsx test. Performance measurements are not yet
> all finished, but the mentioned copying of a huge file becomes noticeably faster
> even on machines with little RAM and doesn't make the system hang (the dirty pages
> balancer does its work OK). Applies on top of v3.5-rc4.
>
> We are currently exploring this with our own distributed storage implementation,
> which is heavily oriented towards storing big blobs of data with extremely rare metadata
> updates (virtual machines' and containers' disk images). With the existing cache
> policy a typical usage scenario -- copying a big VM disk into a cloud -- takes way
> too much time, much longer than if it were simply scp-ed over the same
> network. The write-back policy (as I mentioned) noticeably improves this scenario.
> Kirill (in Cc) can share more details about the performance and the storage
> concepts if required.
>
> Changed in v2:
>  - numerous bugfixes:
>    - fuse_write_begin and fuse_writepages_fill and fuse_writepage_locked must wait
>      on page writeback because page writeback can extend beyond the lifetime of
>      the page-cache page
>    - fuse_send_writepages can end_page_writeback on original page only after adding
>      request to fi->writepages list; otherwise another writeback may happen inside
>      the gap between end_page_writeback and adding to the list
>    - fuse_direct_io must wait on page writeback; otherwise data corruption is possible
>      due to reordering requests
>    - fuse_flush must flush dirty memory and wait for all writeback on given inode
>      before sending FUSE_FLUSH to userspace; otherwise FUSE_FLUSH is not reliable
>    - fuse_file_fallocate must hold i_mutex around FUSE_FALLOCATE and i_size update;
>      otherwise a race with a writer extending i_size is possible
>    - fix handling errors in fuse_writepages and fuse_send_writepages
>  - handle i_mtime intelligently if writeback cache is on (see patch #7 (update i_mtime
>    on buffered writes) for details)
>  - put enabling writeback cache under fusermount control (see mount option
>    'allow_wbcache' introduced by patch #13 (turn writeback cache on))
>  - rebased on v3.7-rc5
>
> Thanks,
> Maxim
>
> ---
>
> Maxim Patlasov (14):
>       fuse: Linking file to inode helper
>       fuse: Getting file for writeback helper
>       fuse: Prepare to handle short reads
>       fuse: Prepare to handle multiple pages in writeback
>       fuse: Connection bit for enabling writeback
>       fuse: Trust kernel i_size only
>       fuse: Update i_mtime on buffered writes
>       fuse: Flush files on wb close
>       fuse: Implement writepages and write_begin/write_end callbacks
>       fuse: fuse_writepage_locked() should wait on writeback
>       fuse: fuse_flush() should wait on writeback
>       fuse: Fix O_DIRECT operations vs cached writeback misorder
>       fuse: Turn writeback cache on
>       mm: Account for WRITEBACK_TEMP in balance_dirty_pages
>
>
>  fs/fuse/dir.c             |   51 ++++
>  fs/fuse/file.c            |  523 +++++++++++++++++++++++++++++++++++++++++----
>  fs/fuse/fuse_i.h          |   20 ++
>  fs/fuse/inode.c           |   98 ++++++++
>  include/uapi/linux/fuse.h |    1
>  mm/page-writeback.c       |    3
>  6 files changed, 638 insertions(+), 58 deletions(-)
>



-- 
Feng Shuo
Tel: (86)10-59851155-2116
Fax: (86)10-59851155-2008
Tianjin Zhongke Blue Whale Information Technologies Co., Ltd
10th Floor, Tower A, The GATE building, No. 19 Zhong-guan-cun Avenue
Haidian District, Beijing, China
Postcode 100080
