Message-ID: <1380571802.6501.71.camel@leira.trondhjem.org>
Date: Mon, 30 Sep 2013 20:10:09 +0000
From: "Myklebust, Trond" <Trond.Myklebust@...app.com>
To: Bernd Schubert <bernd.schubert@...m.fraunhofer.de>
CC: Miklos Szeredi <miklos@...redi.hu>,
Ric Wheeler <rwheeler@...hat.com>,
"J. Bruce Fields" <bfields@...ldses.org>,
Zach Brown <zab@...hat.com>,
"Anna Schumaker" <schumaker.anna@...il.com>,
Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux-Fsdevel <linux-fsdevel@...r.kernel.org>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
"Schumaker, Bryan" <Bryan.Schumaker@...app.com>,
"Martin K. Petersen" <mkp@....net>, Jens Axboe <axboe@...nel.dk>,
Mark Fasheh <mfasheh@...e.com>,
Joel Becker <jlbec@...lplan.org>,
Eric Wong <normalperson@...t.net>
Subject: Re: [RFC] extending splice for copy offloading

On Mon, 2013-09-30 at 22:00 +0200, Bernd Schubert wrote:
> On 09/30/2013 09:34 PM, Myklebust, Trond wrote:
> > On Mon, 2013-09-30 at 20:49 +0200, Bernd Schubert wrote:
> >> On 09/30/2013 08:02 PM, Myklebust, Trond wrote:
> >>> On Mon, 2013-09-30 at 19:48 +0200, Bernd Schubert wrote:
> >>>> On 09/30/2013 07:44 PM, Myklebust, Trond wrote:
> >>>>> On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> >>>>>> It would be nice if there were a way for the file system to get a
> >>>>>> hint that the target file is supposed to be a copy of another file.
> >>>>>> That way distributed file systems could also create the target file
> >>>>>> with the correct meta-information (same storage targets as the
> >>>>>> in-file has).
> >>>>>> Well, if we cannot agree on that, file system with a custom protocol at
> >>>>>> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> >>>>>> sure if this would work for pNFS, though.
> >>>>>
> >>>>> splice() does not create new files. What you appear to be asking for
> >>>>> lies way outside the scope of that system call interface.
> >>>>>
> >>>>
> >>>> Sorry I know, definitely outside the scope of splice, but in the context
> >>>> of offloaded file copies. So the question is, what is the best way to
> >>>> address/discuss that?
> >>>
> >>> Why does it need to be addressed in the first place?
> >>
> >> An offloaded copy is still not efficient if different storage
> >> servers/targets are used by from-file and to-file.
> >
> > So?
>
> mds1: orig-file
> oss1/target1: orig-chunk1
>
> mds1: target-file
> ossN/targetN: target-chunk1
>
> clientN: Performs the copy
>
> Ideally, orig-chunk1 and target-chunk1 are on the same server and same
> target. Copy offload could then even be done from the underlying fs,
> similar to local splice.
> If different ossN servers are used copies still have to be done over
> network by these storage servers, although the client only would need to
> initiate the copy. Still faster, but also not ideal.
>
> >
> >>>
> >>> What is preventing an application from retrieving and setting this
> >>> information using standard libc functions such as fstat()+open(), and
> >>> supplemented with libattr attr_setf/getf(), and libacl acl_get_fd/set_fd
> >>> where appropriate?
> >>>
> >>
> >> At a minimum this requires network and metadata overhead. And while I'm
> >> working on FhGFS now, I still wonder what other file systems need to do -
> >> for example Lustre pre-allocates storage-target files on creating a
> >> file, so file layout changes mean even more overhead there.
> >
> > The problem you are describing is limited to a narrow set of storage
> > architectures. If copy offload using splice() doesn't make sense for
> > those architectures, then don't implement it for them.
>
> But it _does_ make sense. The file system just needs a hint that a
> splice copy is going to come up.

Just wait for the splice() system call. How is this any different from
write()?

> > You might be able to provide ioctls() to do these special hinted file
> > creations for those filesystems that need it, but the vast majority
> > don't, and you shouldn't enforce it on them.
>
> And exactly for that we need a standard - it does not make sense if each
> and every distributed file system implements its own
> ioctl/libattr/libacl interface for that.
>
> >
> >> Anyway, if we could agree to use libattr or libacl to teach the file
> >> system about the upcoming splice call I would be fine.
> >
> > libattr and libacl are generic libraries that exist to manipulate xattrs
> > and acls. They do not need to contain Lustre-specific code.
> >
>
> pNFS, FhGFS, Lustre, Ceph, etc., all of them shall implement their own
> interface? And userspace needs to address all of them differently?
>
> I'm just asking for something like a vfs ioctl SPLICE_META_COPY (sorry,
> didn't find a better name yet), which would take in-file-path and
> out-file-path and allow the file system to create out-file-path with the
> same meta-layout as in-file-path. And it would need some flags, such as
> AUTO (file system decides if it makes sense to do a local copy) and
> FORCE (always try a local copy).

splice() is not a whole-file copy operation; it's a byte range copy. How
does the above help other than in the whole-file case?
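
To make the byte-range point concrete, here is a minimal sketch of how a
range copy looks with today's splice(), which needs a pipe on one side of
each call. The helper name copy_range_splice() is made up for illustration
and is not a proposed interface; error handling is kept deliberately simple.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from this thread): copy up to `len` bytes from
 * in_fd starting at in_off to out_fd starting at out_off, bouncing the data
 * through a pipe. Returns the number of bytes copied (short if the source
 * hits EOF), or -1 with errno set.
 */
ssize_t copy_range_splice(int in_fd, off_t in_off,
                          int out_fd, off_t out_off, size_t len)
{
    int p[2];
    size_t done = 0;
    int saved;

    if (pipe(p) < 0)
        return -1;

    while (done < len) {
        loff_t src = in_off + (off_t)done;
        loff_t dst = out_off + (off_t)done;
        ssize_t n, left;

        /* File -> pipe: moves at most one pipe buffer's worth per call. */
        n = splice(in_fd, &src, p[1], NULL, len - done, SPLICE_F_MOVE);
        if (n < 0)
            goto fail;
        if (n == 0)
            break;              /* EOF on the source range */

        /* Pipe -> file: drain everything we just queued. */
        for (left = n; left > 0; ) {
            ssize_t m = splice(p[0], NULL, out_fd, &dst, (size_t)left,
                               SPLICE_F_MOVE);
            if (m < 0)
                goto fail;
            if (m == 0) {       /* should not happen with data queued */
                errno = EIO;
                goto fail;
            }
            left -= m;          /* splice() already advanced dst */
        }
        done += (size_t)n;
    }

    close(p[0]);
    close(p[1]);
    return (ssize_t)done;

fail:
    saved = errno;
    close(p[0]);
    close(p[1]);
    errno = saved;
    return -1;
}
```

Nothing in that call sequence tells the filesystem whether the range is the
whole file or 4 bytes in the middle of it, and the destination file already
exists by the time the first splice() is issued.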
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@...app.com
www.netapp.com