[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9c833e12-fd09-fe7d-d4f2-e916c6ce4524@kernel.dk>
Date: Mon, 9 May 2022 10:09:10 -0600
From: Jens Axboe <axboe@...nel.dk>
To: Ming Lei <ming.lei@...hat.com>
Cc: linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
io-uring@...r.kernel.org,
Gabriel Krisman Bertazi <krisman@...labora.com>,
ZiyangZhang <ZiyangZhang@...ux.alibaba.com>,
Xiaoguang Wang <xiaoguang.wang@...ux.alibaba.com>
Subject: Re: [RFC PATCH] ubd: add io_uring based userspace block driver
On 5/9/22 3:23 AM, Ming Lei wrote:
> This is the driver part of userspace block driver(ubd driver), the other
> part is userspace daemon part(ubdsrv)[1].
>
> The two parts communicate by io_uring's IORING_OP_URING_CMD with one
> shared cmd buffer for storing io command, and the buffer is read only for
> ubdsrv, each io command is indexed by io request tag directly, and
> is written by ubd driver.
>
> For example, when one READ io request is submitted to ubd block driver, ubd
> driver stores the io command into cmd buffer first, then completes one
> IORING_OP_URING_CMD for notifying ubdsrv, and the URING_CMD is issued to
> ubd driver beforehand by ubdsrv for getting notification of any new io request,
> and each URING_CMD is associated with one io request by tag.
>
> After ubdsrv gets the io command, it translates and handles the ubd io
> request, such as, for the ubd-loop target, ubdsrv translates the request
> into same request on another file or disk, like the kernel loop block
> driver. In ubdsrv's implementation, the io is still handled by io_uring,
> and share same ring with IORING_OP_URING_CMD command. When the target io
> request is done, the same IORING_OP_URING_CMD is issued to ubd driver for
> both committing io request result and getting future notification of new
> io request.
>
> Another thing done by ubd driver is to copy data between kernel io
> request and ubdsrv's io buffer:
>
> 1) before ubsrv handles WRITE request, copy the request's data into
> ubdsrv's userspace io buffer, so that ubdsrv can handle the write
> request
>
> 2) after ubsrv handles READ request, copy ubdsrv's userspace io buffer
> into this READ request, then ubd driver can complete the READ request
>
> Zero copy may be switched if mm is ready to support it.
>
> ubd driver doesn't handle any logic of the specific user space driver,
> so it should be small/simple enough.
This is pretty interesting! Just one small thing I noticed, since you
want to make sure batching is Good Enough:
> +static blk_status_t ubd_queue_rq(struct blk_mq_hw_ctx *hctx,
> + const struct blk_mq_queue_data *bd)
> +{
> + struct ubd_queue *ubq = hctx->driver_data;
> + struct request *rq = bd->rq;
> + struct ubd_io *io = &ubq->ios[rq->tag];
> + struct ubd_rq_data *data = blk_mq_rq_to_pdu(rq);
> + blk_status_t res;
> +
> + if (ubq->aborted)
> + return BLK_STS_IOERR;
> +
> + /* this io cmd slot isn't active, so have to fail this io */
> + if (WARN_ON_ONCE(!(io->flags & UBD_IO_FLAG_ACTIVE)))
> + return BLK_STS_IOERR;
> +
> + /* fill iod to slot in io cmd buffer */
> + res = ubd_setup_iod(ubq, rq);
> + if (res != BLK_STS_OK)
> + return BLK_STS_IOERR;
> +
> + blk_mq_start_request(bd->rq);
> +
> + /* mark this cmd owned by ubdsrv */
> + io->flags |= UBD_IO_FLAG_OWNED_BY_SRV;
> +
> + /*
> + * clear ACTIVE since we are done with this sqe/cmd slot
> + *
> + * We can only accept io cmd in case of being not active.
> + */
> + io->flags &= ~UBD_IO_FLAG_ACTIVE;
> +
> + /*
> + * run data copy in task work context for WRITE, and complete io_uring
> + * cmd there too.
> + *
> + * This way should improve batching, meantime pinning pages in current
> + * context is pretty fast.
> + */
> + task_work_add(ubq->ubq_daemon, &data->work, TWA_SIGNAL);
> +
> + return BLK_STS_OK;
> +}
It'd be better to use bd->last to indicate what kind of signaling you
need here. TWA_SIGNAL will force an immediate transition if the app is
running in userspace, which may not be what you want. Also see:
https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.19/io_uring&id=e788be95a57a9bebe446878ce9bf2750f6fe4974
But regardless of signaling needed, you don't need it except if bd->last
is true. Would need a commit_rqs() as well, but that's trivial.
More importantly, what prevents ubq->ubq_daemon from going away after
it's been assigned? I didn't look at the details, but is this relying on
io_uring being closed to cancel pending requests? That should work, but
we need some way to ensure that ->ubq_daemon is always valid here.
--
Jens Axboe
Powered by blists - more mailing lists