Message-ID: <YoKmFYjIe1AWk/P8@stefanha-x1.localdomain>
Date: Mon, 16 May 2022 20:29:25 +0100
From: Stefan Hajnoczi <stefanha@...hat.com>
To: ming.lei@...hat.com
Cc: Jens Axboe <axboe@...nel.dk>, linux-block@...r.kernel.org,
linux-kernel@...r.kernel.org, io-uring@...r.kernel.org,
Gabriel Krisman Bertazi <krisman@...labora.com>,
ZiyangZhang <ZiyangZhang@...ux.alibaba.com>,
Xiaoguang Wang <xiaoguang.wang@...ux.alibaba.com>,
kwolf@...hat.com, sgarzare@...hat.com
Subject: Re: [RFC PATCH] ubd: add io_uring based userspace block driver
Hi,
This looks interesting! I have some questions:
1. What is the ubdsrv permission model?
A big usability challenge for *-in-userspace interfaces is the balance
between security and allowing unprivileged processes to use these
features.
- Does /dev/ubd-control need to be privileged? I guess the answer is
yes since an evil ubdsrv can hang I/O and corrupt data in hopes of
triggering file system bugs.
- Can multiple processes that don't trust each other use UBD at the same
time? I guess not since ubd_index_idr is global.
- What about containers and namespaces? They currently have (write)
access to the same global ubd_index_idr.
- Maybe there should be a struct ubd_device "owner" (struct
task_struct *) so only devices created by the current process can be
modified?
2. io_uring_cmd design
The rationale for the io_uring_cmd design is not explained in the cover
letter. I think it's worth explaining the design. Here are my guesses:
The same thing can be achieved with just file_operations and io_uring.
ubdsrv could read I/O submissions with IORING_OP_READ and write I/O
completions with IORING_OP_WRITE. That would require 2 sqes per
roundtrip instead of 1, but the same number of io_uring_enter(2) calls
since multiple sqes/cqes can be batched per syscall:
- IORING_OP_READ, addr=(struct ubdsrv_io_desc*) (for submission)
- IORING_OP_WRITE, addr=(struct ubdsrv_io_cmd*) (for completion)
Both operations require a copy_to/from_user() to access the command
metadata.
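To make the two-sqe alternative concrete, here is a minimal sketch of how ubdsrv might fill in the sqes. The struct layouts, the helper names prep_fetch_req()/prep_commit_req(), and the idea that the char device implements ->read_iter()/->write_iter() this way are all assumptions for illustration; none of this is in the patch.

```c
#include <assert.h>
#include <linux/io_uring.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts: the real structs are defined by the patch. */
struct ubdsrv_io_desc {
    uint32_t op_flags;
    uint32_t nr_sectors;
    uint64_t start_sector;
    uint64_t addr;
};

struct ubdsrv_io_cmd {
    uint16_t q_id;
    uint16_t tag;
    int32_t  result;
    uint64_t addr;
};

/* sqe 1: read the next I/O submission from the driver into *desc. */
static void prep_fetch_req(struct io_uring_sqe *sqe, int fd,
                           struct ubdsrv_io_desc *desc, uint64_t user_data)
{
    assert(sqe && desc);
    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = IORING_OP_READ;
    sqe->fd = fd;
    sqe->addr = (uint64_t)(uintptr_t)desc;  /* kernel copies to user here */
    sqe->len = sizeof(*desc);
    sqe->user_data = user_data;
}

/* sqe 2: write the I/O completion described by *cmd back to the driver. */
static void prep_commit_req(struct io_uring_sqe *sqe, int fd,
                            const struct ubdsrv_io_cmd *cmd, uint64_t user_data)
{
    assert(sqe && cmd);
    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = IORING_OP_WRITE;
    sqe->fd = fd;
    sqe->addr = (uint64_t)(uintptr_t)cmd;   /* kernel copies from user here */
    sqe->len = sizeof(*cmd);
    sqe->user_data = user_data;
}
```

Both sqes can be queued before a single io_uring_enter(2), which is why the syscall count stays the same even though the sqe count doubles.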
The io_uring_cmd approach works differently. The IORING_OP_URING_CMD sqe
carries a 40-byte payload so it's possible to embed struct ubdsrv_io_cmd
inside it. The struct ubdsrv_io_desc mmap gets around the fact that
io_uring cqes contain no payload. The driver therefore needs a
side-channel to transfer the request submission details to ubdsrv. I
don't see much of a difference between IORING_OP_READ and the mmap
approach though.
It's not obvious to me how much more efficient the io_uring_cmd approach
is, but taking fewer trips around the io_uring submission/completion
code path is likely to be faster. Something similar can be done with
file_operations ->ioctl(), but I guess the point of using io_uring is
that it composes. If ubdsrv itself wants to use io_uring for other I/O
activity (e.g. networking, disk I/O, etc.) then it can do so and won't be
stuck in a blocking ioctl() syscall.
It would be nice if you could write 2 or 3 paragraphs explaining why the
io_uring_cmd design and the struct ubdsrv_io_desc mmap were chosen.
3. Miscellaneous stuff
- There isn't much in the way of memory ordering in the code. I worry a
little that changes to the struct ubdsrv_io_desc mmap may not be
visible at the expected time with respect to the io_uring cq ring.
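The ordering I have in mind is the usual release/acquire publication pattern. Below is a userspace model in C11 atomics, not kernel code: struct io_desc, struct shared_area, QUEUE_DEPTH and the tail counter are stand-ins I made up for the mmap'ed descriptor array and the cq ring tail. If the driver always writes the descriptor before the cqe is posted, the release/acquire pair io_uring already uses on the cq tail may be sufficient, but that ordering needs to be guaranteed somewhere.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_DEPTH 4  /* hypothetical queue depth */

/* Stand-in for struct ubdsrv_io_desc. */
struct io_desc {
    uint64_t start_sector;
    uint32_t nr_sectors;
};

/* Stand-in for the shared mmap area plus the cq ring tail. */
struct shared_area {
    struct io_desc descs[QUEUE_DEPTH];
    _Atomic uint32_t tail;
};

/*
 * Producer (driver) side: fill in the descriptor first, then publish it
 * with a release store so the descriptor write cannot be reordered after
 * the tail update. Assumes a single producer and a queue that is not full.
 */
static void publish_desc(struct shared_area *s, const struct io_desc *d)
{
    assert(s && d);
    uint32_t tail = atomic_load_explicit(&s->tail, memory_order_relaxed);

    s->descs[tail % QUEUE_DEPTH] = *d;
    atomic_store_explicit(&s->tail, tail + 1, memory_order_release);
}

/*
 * Consumer (ubdsrv) side: the acquire load pairs with the release store
 * above, so a consumer that observes the new tail also observes the
 * descriptor contents written before it.
 */
static bool consume_desc(struct shared_area *s, uint32_t *head,
                         struct io_desc *out)
{
    assert(s && head && out);
    uint32_t tail = atomic_load_explicit(&s->tail, memory_order_acquire);

    if (*head == tail)
        return false;  /* nothing published yet */

    *out = s->descs[*head % QUEUE_DEPTH];
    (*head)++;
    return true;
}
```

With relaxed ordering on both sides, the consumer could see the new tail and still read a stale descriptor, which is the failure mode I am worried about.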
Thanks,
Stefan