[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <de54c406-9270-4145-ab96-5fc3dd51765e@kernel.org>
Date: Tue, 25 Jun 2024 06:55:29 +0900
From: Damien Le Moal <dlemoal@...nel.org>
To: Bart Van Assche <bvanassche@....org>,
Nitesh Shetty <nj.shetty@...sung.com>, Christoph Hellwig <hch@....de>
Cc: Jens Axboe <axboe@...nel.dk>, Jonathan Corbet <corbet@....net>,
Alasdair Kergon <agk@...hat.com>, Mike Snitzer <snitzer@...nel.org>,
Mikulas Patocka <mpatocka@...hat.com>, Keith Busch <kbusch@...nel.org>,
Sagi Grimberg <sagi@...mberg.me>, Chaitanya Kulkarni <kch@...dia.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
martin.petersen@...cle.com, david@...morbit.com, hare@...e.de,
damien.lemoal@...nsource.wdc.com, anuj20.g@...sung.com, joshi.k@...sung.com,
nitheshshetty@...il.com, gost.dev@...sung.com, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
dm-devel@...ts.linux.dev, linux-nvme@...ts.infradead.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v20 02/12] Add infrastructure for copy offload in block
and request layer.
On 2024/06/25 1:25, Bart Van Assche wrote:
> On 6/24/24 3:44 AM, Nitesh Shetty wrote:
>> For reference, I have listed the approaches we have taken in the past.
>>
>> a. Token/payload based approach:
>> 1. Here we allocate a buffer/payload.
>> 2. First source BIO is sent along with the buffer.
>> 3. Once the buffer reaches driver, it is filled with the source LBA
>> and length and namespace info. And the request is completed.
>> 4. Then destination BIO is sent with same buffer.
>> 5. Once the buffer reaches driver, it retrieves the source information from
>> the BIO and forms a copy command and sends it down to device.
>>
>> We received feedback that putting anything inside payload which is not
>> data, is not a good idea[1].
>
> A token-based approach (pairing copy_src and copy_dst based on a token)
> is completely different from a payload-based approach (copy offload
> parameters stored in the bio payload). From [1] (I agree with what has
> been quoted): "In general every time we tried to come up with a request
> payload that is not just data passed to the device it has been a
> nightmare." [ ... ] "The only thing we'd need is a sequence number / idr
> / etc to find an input and output side match up, as long as we
> stick to the proper namespace scope."
>
>> c. List/ctx based approach:
>> A new member is added to bio, bio_copy_ctx, which will a union with
>> bi_integrity. Idea is once a copy bio reaches blk_mq_submit_bio, it will
>> add the bio to this list.
>> 1. Send the destination BIO, once this reaches blk_mq_submit_bio, this
>> will add the destination BIO to the list inside bi_copy_ctx and return
>> without forming any request.
>> 2. Send source BIO, once this reaches blk_mq_submit_bio, this will
>> retrieve the destination BIO from bi_copy_ctx and form a request with
>> destination BIO and source BIO. After this request will be sent to
>> driver.
>>
>> This work is still in POC phase[2]. But this approach makes lifetime
>> management of BIO complicated, especially during failure cases.
>
> Associating src and dst operations by embedding a pointer to a third
> data structure in struct bio is an implementation choice and is not the
> only possibility for assocating src and dst operations. Hence, the
> bio lifetime complexity mentioned above is not inherent to the list
> based approach but is a result of the implementation choice made for
> associating src and dst operations.
>
> Has it been considered to combine the list-based approach for managing
> unpaired copy operations with the token based approach for pairing copy
> src and copy dst operations?
I am still a little confused as to why we need 2 BIOs, one for src and one for
dst... Is it because of the overly complex scsi extended copy support ?
Given that the main use case is copy offload for data within the same device,
using a single BIO which somehow can carry a list of LBA sources and a single
destination LBA would be far simpler and perfectly matching nvme simple copy and
ATA write gathered. And I think that this would also match the simplest case for
scsi extended copy as well.
--
Damien Le Moal
Western Digital Research
Powered by blists - more mailing lists