lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1380571802.6501.71.camel@leira.trondhjem.org>
Date:	Mon, 30 Sep 2013 20:10:09 +0000
From:	"Myklebust, Trond" <Trond.Myklebust@...app.com>
To:	Bernd Schubert <bernd.schubert@...m.fraunhofer.de>
CC:	Miklos Szeredi <miklos@...redi.hu>,
	Ric Wheeler <rwheeler@...hat.com>,
	"J. Bruce Fields" <bfields@...ldses.org>,
	Zach Brown <zab@...hat.com>,
	"Anna Schumaker" <schumaker.anna@...il.com>,
	Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linux-Fsdevel <linux-fsdevel@...r.kernel.org>,
	"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
	"Schumaker, Bryan" <Bryan.Schumaker@...app.com>,
	"Martin K. Petersen" <mkp@....net>, Jens Axboe <axboe@...nel.dk>,
	Mark Fasheh <mfasheh@...e.com>,
	Joel Becker <jlbec@...lplan.org>,
	Eric Wong <normalperson@...t.net>
Subject: Re: [RFC] extending splice for copy offloading

On Mon, 2013-09-30 at 22:00 +0200, Bernd Schubert wrote:
> On 09/30/2013 09:34 PM, Myklebust, Trond wrote:
> > On Mon, 2013-09-30 at 20:49 +0200, Bernd Schubert wrote:
> >> On 09/30/2013 08:02 PM, Myklebust, Trond wrote:
> >>> On Mon, 2013-09-30 at 19:48 +0200, Bernd Schubert wrote:
> >>>> On 09/30/2013 07:44 PM, Myklebust, Trond wrote:
> >>>>> On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> >>>>>> It would be nice if there would be way if the file system would get a
> >>>>>> hint that the target file is supposed to be copy of another file. That
> >>>>>> way distributed file systems could also create the target-file with the
> >>>>>> correct meta-information (same storage targets as in-file has).
> >>>>>> Well, if we cannot agree on that, file system with a custom protocol at
> >>>>>> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> >>>>>> sure if this would work for pNFS, though.
> >>>>>
> >>>>> splice() does not create new files. What you appear to be asking for
> >>>>> lies way outside the scope of that system call interface.
> >>>>>
> >>>>
> >>>> Sorry I know, definitely outside the scope of splice, but in the context
> >>>> of offloaded file copies. So the question is, what is the best way to
> >>>> address/discuss that?
> >>>
> >>> Why does it need to be addressed in the first place?
> >>
> >> An offloaded copy is still not efficient if different storage
> >> servers/targets used by from-file and to-file.
> >
> > So?
> 
> mds1: orig-file
> oss1/target1: orig-chunk1
> 
> mds1: target-file
> ossN/targetN: target-chunk1
> 
> clientN: Performs the copy
> 
> Ideally, orig-chunk1 and target-chunk1 are on the same server and same 
> target. Copy offload then even could done from the underlying fs, 
> similiar as local splice.
> If different ossN servers are used copies still have to be done over 
> network by these storage servers, although the client only would need to 
> initiate the copy. Still faster, but also not ideal.
> 
> >
> >>>
> >>> What is preventing an application from retrieving and setting this
> >>> information using standard libc functions such as fstat()+open(), and
> >>> supplemented with libattr attr_setf/getf(), and libacl acl_get_fd/set_fd
> >>> where appropriate?
> >>>
> >>
> >> At a minimum this requires network and metadata overhead. And while I'm
> >> working on FhGFS now, I still wonder what other file system need to do -
> >> for example Lustre pre-allocates storage-target files on creating a
> >> file, so file layout changes mean even more overhead there.
> >
> > The problem you are describing is limited to a narrow set of storage
> > architectures. If copy offload using splice() doesn't make sense for
> > those architectures, then don't implement it for them.
> 
> But it _does_ make sense. The file system just needs a hint that a 
> splice copy is going to come up.

Just wait for the splice() system call. How is this any different from
write()?

> > You might be able to provide ioctls() to do these special hinted file
> > creations for those filesystems that need it, but the vast majority
> > don't, and you shouldn't enforce it on them.
> 
> And exactly for that we need a standard - it does not make sense if each 
> and every distributed file system implements its own 
> ioctl/libattr/libacl interface for that.
> 
> >
> >> Anyway, if we could agree on to use libattr or libacl to teach the file
> >> system about the upcoming splice call I would be fine.
> >
> > libattr and libacl are generic libraries that exist to manipulate xattrs
> > and acls. They do not need to contain Lustre-specific code.
> >
> 
> pNFS, FhGFS, Lustre, Ceph, etc., all of them shall implement their own 
> interface? And userspace needs to address all of them differently?
>
> I'm just asking for something like a vfs ioctl SPLICE_META_COPY (sorry, 
> didn't find a better name yet), which would take in-file-path and 
> out-file-path and allow the file system to create out-file-path with the 
> same meta-layout as in-file-path. And it would need some flags, such as 
> AUTO (file system decides if it makes sense to do a local copy) and 
> FORCE (always try a local copy).

splice() is not a whole-file copy operation; it's a byte range copy. How
does the above help other than in the whole-file case?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@...app.com
www.netapp.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ