linux-kernel - Re: [RFC] extending splice for copy offloading


Message-ID: <52498F4A.2040809@redhat.com>
Date:	Mon, 30 Sep 2013 10:48:42 -0400
From:	Ric Wheeler <rwheeler@...hat.com>
To:	"J. Bruce Fields" <bfields@...ldses.org>
CC:	Miklos Szeredi <miklos@...redi.hu>,
	"Myklebust, Trond" <Trond.Myklebust@...app.com>,
	Zach Brown <zab@...hat.com>,
	Anna Schumaker <schumaker.anna@...il.com>,
	Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linux-Fsdevel <linux-fsdevel@...r.kernel.org>,
	"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
	"Schumaker, Bryan" <Bryan.Schumaker@...app.com>,
	"Martin K. Petersen" <mkp@....net>, Jens Axboe <axboe@...nel.dk>,
	Mark Fasheh <mfasheh@...e.com>,
	Joel Becker <jlbec@...lplan.org>,
	Eric Wong <normalperson@...t.net>
Subject: Re: [RFC] extending splice for copy offloading

On 09/30/2013 10:34 AM, J. Bruce Fields wrote:
> On Mon, Sep 30, 2013 at 02:20:30PM +0200, Miklos Szeredi wrote:
>> On Sat, Sep 28, 2013 at 11:20 PM, Ric Wheeler <rwheeler@...hat.com> wrote:
>>
>>>>> I don't see the safety argument very compelling either.  There are real
>>>>> semantic differences, however: ENOSPC on a write to a
>>>>> (apparently) already allocated block.  That could be a bit unexpected.
>>>>> Do we
>>>>> need a fallocate extension to deal with shared blocks?
>>>> The above has been the case for all enterprise storage arrays ever since
>>>> the invention of snapshots. The NFSv4.2 spec does allow you to set a
>>>> per-file attribute that causes the storage server to always preallocate
>>>> enough buffers to guarantee that you can rewrite the entire file, however
>>>> the fact that we've lived without it for said 20 years leads me to believe
>>>> that demand for it is going to be limited. I haven't put it top of the list
>>>> of features we care to implement...
>>>>
>>>> Cheers,
>>>>      Trond
>>>
>>> I agree - this has been common behaviour for a very long time in the array
>>> space. Even without an array,  this is the same as overwriting a block in
>>> btrfs or any file system with a read-write LVM snapshot.
>> Okay, I'm convinced.
>>
>> So I suggest
>>
>>   - mount(..., MNT_REFLINK): *allow* splice to reflink.  If this is not
>> set, fall back to page cache copy.
>>   - splice(... SPLICE_REFLINK):  fail non-reflink copy.  With this app
>> can force reflink.
>>
>> Both are trivial to implement and make sure that no backward
>> incompatibility surprises happen.
>>
>> My other worry is about interruptibility/restartability.  Ideas?
>>
>> What happens on splice(from, to, 4G) and it's a non-reflink copy?
>> Can the page cache copy be made restartable?   Or should splice() be
>> allowed to return a short count?  What happens on (non-reflink) remote
>> copies and huge request sizes?
> If I were writing an application that required copies to be restartable,
> I'd probably use the largest possible range in the reflink case but
> break the copy into smaller chunks in the splice case.
>
> For that reason I don't like the idea of a mount option--the choice is
> something that the application probably wants to make (or at least to
> know about).
>
> The NFS COPY operation, as specified in current drafts, allows for
> asynchronous copies but leaves the state of the file undefined in the
> case of an aborted COPY.  I worry that agreeing on standard behavior in
> the case of an abort might be difficult.
>
> --b.

I think that this is still confusing - reflink and array copy offload should not 
be differentiated.  In effect, they should often be the same order of magnitude 
in performance and possibly even use the same or very similar techniques (just 
on different sides of the initiator/target transaction!).

It is much simpler to let the request fail if the offload (or reflink) is
not supported and let the application do the traditional copy.  Then you always
send the largest possible offload operation and do whatever you do now if that
fails.
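
A rough sketch of that flow, in Python purely for illustration.  The
try_offload callable is hypothetical and stands in for whatever offload or
reflink interface ends up being exposed; the set of errno values treated as
"not supported" and the 1 MiB fallback chunk size are likewise assumptions,
not anything agreed in this thread.

```python
import errno
import os

CHUNK = 1 << 20  # fallback chunk size; an arbitrary choice for this sketch

# errno values taken to mean "offload not available here" (an assumption)
UNSUPPORTED = {errno.ENOSYS, errno.EOPNOTSUPP, errno.EXDEV, errno.EINVAL}


def chunked_copy(src_fd, dst_fd, length):
    """Traditional copy: read/write in small chunks; returns bytes copied."""
    done = 0
    while done < length:
        buf = os.read(src_fd, min(CHUNK, length - done))
        if not buf:
            break  # source ended before `length` bytes were copied
        view = memoryview(buf)
        while view:
            # os.write may write fewer bytes than requested; loop until done
            written = os.write(dst_fd, view)
            view = view[written:]
        done += len(buf)
    return done


def copy_with_offload(src_fd, dst_fd, length, try_offload):
    """Send one request for the whole range; fall back if it is unsupported.

    try_offload(src_fd, dst_fd, length) is a hypothetical hook that returns
    the number of bytes copied or raises OSError.
    """
    try:
        return try_offload(src_fd, dst_fd, length), "offload"
    except OSError as exc:
        if exc.errno not in UNSUPPORTED:
            raise  # a real I/O error, not "unsupported"
    return chunked_copy(src_fd, dst_fd, length), "fallback"
```

The application only ever sees two cases: the single large request worked, or
it did not and the existing copy loop runs unchanged.  A restartable copy
would only need to track progress in the fallback path.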

thanks!

Ric
