Message-ID: <alpine.LFD.2.01.0909150923180.4950@localhost.localdomain>
Date: Tue, 15 Sep 2009 09:30:54 -0700 (PDT)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Joel Becker <Joel.Becker@...cle.com>
cc: Mark Fasheh <mfasheh@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
ocfs2-devel@....oracle.com
Subject: Re: [GIT PULL] ocfs2 changes for 2.6.32
On Mon, 14 Sep 2009, Joel Becker wrote:
> >
> > If you're talking about falling back to manually just copying the data,
> > then nobody is interested in that. User space can do that better with a
> > simple read-write loop or with splice, or whatever. There's no reason
> > whatsoever to do that.
>
> I'm talking about any facility for copying that isn't just a
> userspace loop. Like your discussion of network filesystems.
HOW?
We need to have a per-filesystem interface to that.
Having a '->copyfile()' function would be great.
But don't you see how _idiotic_ it is to then also have a '->reflink()'
function that does _conceptually_ the exact same thing, except it does it
by incrementing a usage count instead?
Do you see why I'm so unhappy to add a ->reflink() function?
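To make the point concrete, here is a minimal user-space sketch of one semantic '->copyfile()' hook with two different implementations behind it. Everything in it is hypothetical: the toy_* types and functions are illustration-only names, not the kernel's real VFS structures or any existing filesystem's code. One implementation bumps a usage count, the other physically copies, and the caller cannot tell which one it got.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy types, NOT the kernel's real VFS structures. */
struct toy_extent {
    char  *data;
    size_t len;
    int    refs;   /* number of files sharing this extent */
};

struct toy_file;

struct toy_fs_ops {
    /* One semantic operation: make dst hold the same contents as src. */
    int (*copyfile)(const struct toy_file *src, struct toy_file *dst);
};

struct toy_file {
    const struct toy_fs_ops *ops;
    struct toy_extent       *ext;
};

/* Implementation A: share the extent and increment a usage count. */
int toy_cow_copyfile(const struct toy_file *src, struct toy_file *dst)
{
    src->ext->refs++;
    dst->ext = src->ext;
    return 0;
}

/* Implementation B: allocate new space and physically copy the data. */
int toy_plain_copyfile(const struct toy_file *src, struct toy_file *dst)
{
    struct toy_extent *e = malloc(sizeof(*e));

    if (!e)
        return -1;
    e->data = malloc(src->ext->len);
    if (!e->data) {
        free(e);
        return -1;
    }
    memcpy(e->data, src->ext->data, src->ext->len);
    e->len = src->ext->len;
    e->refs = 1;
    dst->ext = e;
    return 0;
}

const struct toy_fs_ops toy_cow_ops   = { .copyfile = toy_cow_copyfile };
const struct toy_fs_ops toy_plain_ops = { .copyfile = toy_plain_copyfile };

/* Generic entry point: the caller asks for a copy, not for a mechanism. */
int toy_vfs_copyfile(const struct toy_file *src, struct toy_file *dst)
{
    assert(src && dst);
    if (!src->ops || !src->ops->copyfile)
        return -1;   /* no per-filesystem support */
    dst->ops = src->ops;
    return src->ops->copyfile(src, dst);
}
```

In this sketch a second '->reflink()' pointer would add nothing: toy_cow_copyfile() already is the reference-counting case, reached through the same call.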
> Hence I brought this to the filesystem summit and then fsdevel
> rather than just implementing it in ocfs2. I know NFS folks were in the
> room in April, and they said the call definition was workable. Can't
> remember if CIFS folks were there, but I think so.
It's not workable if you define the 'reflink()' function to not use any
disk space on the filesystem. Because SMB _will_ do a copy (and I presume
the NFS thing will too). So it would not in general be what you call
reflink, it will not be a "snapshot".
So if you _define_ the semantics of "reflink" to be that it's atomic and
doesn't use any new disk space (apart from the new inode/directory entry,
of course), then it will be almost totally useless to other filesystems.
In fact, it's entirely possible to have filesystems that can avoid copying
the _data_ blocks, but would need to copy the indirect blocks - maybe the
data blocks are ref-counted, but the metadata needs to be per-file (I can
see many reasons to do it that way, even if it's organized as a tree -
it's how we do page table COW, for example, and it makes some things much
simpler).
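A rough user-space sketch of that kind of layout, under loudly stated assumptions: the toy_* names, the four-pointer indirect block, and the 16-byte payload are all made up for illustration and do not describe any real filesystem. Cloning copies the per-file indirect block but only takes references on the data blocks; a later write to a shared block gets a private copy first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NPTRS 4   /* hypothetical: pointers per indirect block */

struct toy_block {
    int  refs;          /* how many indirect blocks point here */
    char payload[16];
};

/* Per-file metadata: never shared between files in this sketch. */
struct toy_indirect {
    struct toy_block *ptr[TOY_NPTRS];
};

/* Clone a file: copy the metadata, share the data by refcount. */
struct toy_indirect *toy_clone_indirect(const struct toy_indirect *src)
{
    struct toy_indirect *dst;

    assert(src != NULL);
    dst = malloc(sizeof(*dst));
    if (!dst)
        return NULL;
    for (int i = 0; i < TOY_NPTRS; i++) {
        dst->ptr[i] = src->ptr[i];
        if (dst->ptr[i])
            dst->ptr[i]->refs++;
    }
    return dst;
}

/* Write to one block, breaking the sharing first if it is shared. */
int toy_cow_write(struct toy_indirect *ind, int idx, const char *text)
{
    struct toy_block *b;

    if (idx < 0 || idx >= TOY_NPTRS || !ind->ptr[idx])
        return -1;
    b = ind->ptr[idx];
    if (b->refs > 1) {
        struct toy_block *priv = malloc(sizeof(*priv));

        if (!priv)
            return -1;
        *priv = *b;         /* copy the data only when it is written */
        priv->refs = 1;
        b->refs--;
        ind->ptr[idx] = priv;
        b = priv;
    }
    strncpy(b->payload, text, sizeof(b->payload) - 1);
    b->payload[sizeof(b->payload) - 1] = '\0';
    return 0;
}
```

The clone here does allocate new space (the copied indirect block), so it would fail a strict "uses no new disk space" definition while still avoiding any copy of the data itself.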
Would that be a 'reflink()' or not? I have no way of knowing, because you
have defined reflink on a purely ocfs2-specific implementation basis.
But I do know that such a filesystem would be perfectly happy to have a
'copyfile' function.
This is why I want the VFS pointers to be about _semantics_, not about
some random implementation detail.
Linus