[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOQ4uxhqaTUwD8Lw+9HWWj61EXRv4X-eE3u4DFeWnv_VOUZF5A@mail.gmail.com>
Date: Tue, 16 Feb 2021 21:40:43 +0200
From: Amir Goldstein <amir73il@...il.com>
To: Steve French <smfrench@...il.com>
Cc: Anna Schumaker <anna.schumaker@...app.com>,
Luis Henriques <lhenriques@...e.de>,
Trond Myklebust <trondmy@...merspace.com>,
"samba-technical@...ts.samba.org" <samba-technical@...ts.samba.org>,
"drinkcat@...omium.org" <drinkcat@...omium.org>,
"iant@...gle.com" <iant@...gle.com>,
"linux-cifs@...r.kernel.org" <linux-cifs@...r.kernel.org>,
"darrick.wong@...cle.com" <darrick.wong@...cle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"jlayton@...nel.org" <jlayton@...nel.org>,
"llozano@...omium.org" <llozano@...omium.org>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
"miklos@...redi.hu" <miklos@...redi.hu>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>,
"dchinner@...hat.com" <dchinner@...hat.com>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"sfrench@...ba.org" <sfrench@...ba.org>,
"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>
Subject: Re: [PATCH v2] vfs: prevent copy_file_range to copy across devices
On Tue, Feb 16, 2021 at 9:31 PM Steve French <smfrench@...il.com> wrote:
>
> On Tue, Feb 16, 2021 at 1:29 PM Anna Schumaker
> <anna.schumaker@...app.com> wrote:
> >
> > On Tue, Feb 16, 2021 at 2:22 PM Amir Goldstein <amir73il@...il.com> wrote:
> > >
> > > On Tue, Feb 16, 2021 at 8:54 PM Luis Henriques <lhenriques@...e.de> wrote:
> > > >
> > > > Amir Goldstein <amir73il@...il.com> writes:
> > > >
> > > > > On Tue, Feb 16, 2021 at 6:41 PM Luis Henriques <lhenriques@...e.de> wrote:
> > > > >>
> > > > >> Amir Goldstein <amir73il@...il.com> writes:
> > > > >>
> > > > >> >> Ugh. And I guess overlayfs may have a similar problem.
> > > > >> >
> > > > >> > Not exactly.
> > > > >> > Generally speaking, overlayfs should call vfs_copy_file_range()
> > > > >> > with the flags it got from layer above, so if called from nfsd it
> > > > >> > will allow cross fs copy and when called from syscall it won't.
> > > > >> >
> > > > >> > There are some corner cases where overlayfs could benefit from
> > > > >> > COPY_FILE_SPLICE (e.g. copy from lower file to upper file), but
> > > > >> > let's leave those for now. Just leave overlayfs code as is.
> > > > >>
> > > > >> Got it, thanks for clarifying.
> > > > >>
> > > > >> >> > This is easy to solve with a flag COPY_FILE_SPLICE (or something) that
> > > > >> >> > is internal to kernel users.
> > > > >> >> >
> > > > >> >> > FWIW, you may want to look at the loop in ovl_copy_up_data()
> > > > >> >> > for improvements to nfsd_copy_file_range().
> > > > >> >> >
> > > > >> >> > We can move the check out to copy_file_range syscall:
> > > > >> >> >
> > > > >> >> > if (flags != 0)
> > > > >> >> > return -EINVAL;
> > > > >> >> >
> > > > >> >> > Leave the fallback from all filesystems and check for the
> > > > >> >> > COPY_FILE_SPLICE flag inside generic_copy_file_range().
> > > > >> >>
> > > > >> >> Ok, the diff bellow is just to make sure I understood your suggestion.
> > > > >> >>
> > > > >> >> The patch will also need to:
> > > > >> >>
> > > > >> >> - change nfs and overlayfs calls to vfs_copy_file_range() so that they
> > > > >> >> use the new flag.
> > > > >> >>
> > > > >> >> - check flags in generic_copy_file_checks() to make sure only valid flags
> > > > >> >> are used (COPY_FILE_SPLICE at the moment).
> > > > >> >>
> > > > >> >> Also, where should this flag be defined? include/uapi/linux/fs.h?
> > > > >> >
> > > > >> > Grep for REMAP_FILE_
> > > > >> > Same header file, same Documentation rst file.
> > > > >> >
> > > > >> >>
> > > > >> >> Cheers,
> > > > >> >> --
> > > > >> >> Luis
> > > > >> >>
> > > > >> >> diff --git a/fs/read_write.c b/fs/read_write.c
> > > > >> >> index 75f764b43418..341d315d2a96 100644
> > > > >> >> --- a/fs/read_write.c
> > > > >> >> +++ b/fs/read_write.c
> > > > >> >> @@ -1383,6 +1383,13 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
> > > > >> >> struct file *file_out, loff_t pos_out,
> > > > >> >> size_t len, unsigned int flags)
> > > > >> >> {
> > > > >> >> + if (!(flags & COPY_FILE_SPLICE)) {
> > > > >> >> + if (!file_out->f_op->copy_file_range)
> > > > >> >> + return -EOPNOTSUPP;
> > > > >> >> + else if (file_out->f_op->copy_file_range !=
> > > > >> >> + file_in->f_op->copy_file_range)
> > > > >> >> + return -EXDEV;
> > > > >> >> + }
> > > > >> >
> > > > >> > That looks strange, because you are duplicating the logic in
> > > > >> > do_copy_file_range(). Maybe better:
> > > > >> >
> > > > >> > if (WARN_ON_ONCE(flags & ~COPY_FILE_SPLICE))
> > > > >> > return -EINVAL;
> > > > >> > if (flags & COPY_FILE_SPLICE)
> > > > >> > return do_splice_direct(file_in, &pos_in, file_out, &pos_out,
> > > > >> > len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
> > > > >>
> > > > >> My initial reasoning for duplicating the logic in do_copy_file_range() was
> > > > >> to allow the generic_copy_file_range() callers to be left unmodified and
> > > > >> allow the filesystems to default to this implementation.
> > > > >>
> > > > >> With this change, I guess that the calls to generic_copy_file_range() from
> > > > >> the different filesystems can be dropped, as in my initial patch, as they
> > > > >> will always get -EINVAL. The other option would be to set the
> > > > >> COPY_FILE_SPLICE flag in those calls, but that would get us back to the
> > > > >> problem we're trying to solve.
> > > > >
> > > > > I don't understand the problem.
> > > > >
> > > > > What exactly is wrong with the code I suggested?
> > > > > Why should any filesystem be changed?
> > > > >
> > > > > Maybe I am missing something.
> > > >
> > > > Ok, I have to do a full brain reboot and start all over.
> > > >
> > > > Before that, I picked the code you suggested and tested it. I've mounted
> > > > a cephfs filesystem and used xfs_io to execute a 'copy_range' command
> > > > using /sys/kernel/debug/sched_features as source. The result was a
> > > > 0-sized file in cephfs. And the reason is thevfs_copy_file_range()
> > > > early exit in:
> > > >
> > > > if (len == 0)
> > > > return 0;
> > > >
> > > > 'len' is set in generic_copy_file_checks().
> > >
> > > Good point.. I guess we will need to do all the checks earlier in
> > > generic_copy_file_checks() including the logic of:
> > >
> > > if (file_in->f_op->remap_file_range &&
> > > file_inode(file_in)->i_sb == file_inode(file_out)->i_sb)
> > >
> > >
> > > >
> > > > This means that we're not solving the original problem anymore (probably
> > > > since v1 of this patch, haven't checked).
> > > >
> > > > Also, re-reading Trond's emails, I read: "... also disallowing the copy
> > > > from, say, an XFS formatted partition to an ext4 partition". Isn't that
> > > > *exactly* what we're trying to do here? I.e. _prevent_ these copies from
> > > > happening so that tracefs files can't be CFR'ed?
> > > >
> > >
> > > We want to address the report which means calls coming from
> > > copy_file_range() syscall.
> > >
> > > Trond's use case is vfs_copy_file_range() coming from nfsd.
> > > When he writes about copy from XFS to ext4, he means an
> > > NFS client is issuing server side copy (on same or different NFS mounts)
> > > and the NFS server is executing nfsd_copy_file_range() on a source
> > > file that happens to be on XFS and destination happens to be on ext4.
> >
> > NFS also supports a server-to-server copy where the destination server
> > mounts the source server and reads the data to be copied. Please don't
> > break that either :)
>
As long as the copy is via nfsd_copy_file_range() and not from the syscall
it should not regress.
> This is a case we will eventually need to support for cifs (SMB3) as well.
>
samba already does server side copy very well without needing any support
from the kernel.
nfsd also doesn't *need* to use vfs_copy_file_range() it can use kernel APIs
like the loop in ovl_copy_up_data(). But it does, so we should not regress it.
samba/nfsd can try to use copy_file_range() and it will work if the
source/target
fs support it. Otherwise, the server can perfectly well do the copy via other
available interfaces, just like userspace copy tools.
Thanks,
Amir.
Powered by blists - more mailing lists