linux-kernel - Re: [RFC PATCH] io_uring: add support for IORING_OP

Message-ID: <3757f227-6ed9-b140-e367-69387f966874@gmail.com>
Date:   Sun, 15 Dec 2019 18:40:02 +0300
From:   Pavel Begunkov <asml.silence@...il.com>
To:     Jens Axboe <axboe@...nel.dk>, Jann Horn <jannh@...gle.com>
Cc:     io-uring <io-uring@...r.kernel.org>,
        kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH] io_uring: add support for IORING_OP_IOCTL

On 14/12/2019 21:52, Jens Axboe wrote:
> On 12/14/19 10:56 AM, Pavel Begunkov wrote:
>>
>> On 14/12/2019 20:12, Jann Horn wrote:
>>> On Sat, Dec 14, 2019 at 4:30 PM Pavel Begunkov <asml.silence@...il.com> wrote:
>>>> This works almost like ioctl(2), except it doesn't support a bunch of
>>>> common opcodes (e.g. FIOCLEX and FIBMAP, see ioctl.c) and goes
>>>> straight to a device-specific implementation.
>>>>
>>>> The case in mind is dma-buf, drm and other ioctl-centric interfaces.
>>>>
>>>> Not-yet Signed-off-by: Pavel Begunkov <asml.silence@...il.com>
>>>> ---
>>>>
>>>> It clearly needs some testing first, though it works fine with dma-buf,
>>>> but I'd like to discuss whether the use cases are convincing enough,
>>>> and whether it is OK to drop some ioctl opcodes. As for the last point,
>>>> they are fairly easy to add, except maybe the three requiring an fd
>>>> (e.g. FIOCLEX).
>>>>
>>>> P.S. Probably, it won't benefit enough to consider using io_uring
>>>> in drm/mesa, but anyway.
>>> [...]
>>>> +static int io_ioctl(struct io_kiocb *req,
>>>> +                   struct io_kiocb **nxt, bool force_nonblock)
>>>> +{
>>>> +       const struct io_uring_sqe *sqe = req->sqe;
>>>> +       unsigned int cmd = READ_ONCE(sqe->ioctl_cmd);
>>>> +       unsigned long arg = READ_ONCE(sqe->ioctl_arg);
>>>> +       int ret;
>>>> +
>>>> +       if (!req->file)
>>>> +               return -EBADF;
>>>> +       if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL))
>>>> +               return -EINVAL;
>>>> +       if (unlikely(sqe->ioprio || sqe->addr || sqe->buf_index
>>>> +               || sqe->rw_flags))
>>>> +               return -EINVAL;
>>>> +       if (force_nonblock)
>>>> +               return -EAGAIN;
>>>> +
>>>> +       ret = security_file_ioctl(req->file, cmd, arg);
>>>> +       if (!ret)
>>>> +               ret = (int)vfs_ioctl(req->file, cmd, arg);
>>>
>>> This isn't going to work. For several of the syscalls that were added,
>>> special care had to be taken to avoid bugs - like for RECVMSG, for the
>>> upcoming OPEN/CLOSE stuff, and so on.
>>>
>>> And in principle, ioctl handlers can do pretty much all of the things
>>> syscalls can do, and more. They can look at the caller's PID, they can
>>> open and close (well, technically that's slightly unsafe, but IIRC
>>> autofs does it anyway) things in the file descriptor table, they can
>>> give another process access to the calling process in some way, and so
>>> on. If you just allow calling arbitrary ioctls through io_uring, you
>>> will certainly get bugs, and probably security bugs, too.
>>>
>>> Therefore, I would prefer to see this not happen at all; and if you do
>>> have a use case where you think the complexity is worth it, then I
>>> think you'll have to add new infrastructure that allows each
>>> file_operations instance to opt in to having specific ioctls called
>>> via this mechanism, or something like that, and ensure that each of
>>> the exposed ioctls only performs operations that are safe from uring
>>> worker context.
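
To make the opt-in idea concrete, here is a minimal userspace sketch, not
kernel code. Everything in it is hypothetical: struct optin_fops, the
uring_safe_cmds table, mock_uring_ioctl() and the DEMO_* commands are
invented for illustration and exist neither in the kernel nor in the patch.
It only shows the shape of "each file_operations instance lists the ioctls
it considers safe, and everything else is refused".

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file_operations (assumption, not kernel API). */
struct optin_fops {
	/* Commands the driver has declared safe to run from ring worker context. */
	const unsigned int *uring_safe_cmds;
	size_t nr_uring_safe_cmds;
	long (*unlocked_ioctl)(unsigned int cmd, unsigned long arg);
};

/* Return 1 if the driver explicitly opted in to cmd, 0 otherwise. */
static int uring_ioctl_allowed(const struct optin_fops *fops, unsigned int cmd)
{
	size_t i;

	for (i = 0; i < fops->nr_uring_safe_cmds; i++)
		if (fops->uring_safe_cmds[i] == cmd)
			return 1;
	return 0;
}

/* Dispatch an ioctl on behalf of the ring; refuse anything not opted in. */
static long mock_uring_ioctl(const struct optin_fops *fops,
			     unsigned int cmd, unsigned long arg)
{
	if (!fops->unlocked_ioctl)
		return -ENOTTY;
	if (!uring_ioctl_allowed(fops, cmd))
		return -EOPNOTSUPP;
	return fops->unlocked_ioctl(cmd, arg);
}

/* Made-up driver: one harmless query and one command that touches caller state. */
enum { DEMO_GET_SIZE = 1, DEMO_CHANGE_FDTABLE = 2 };

static long demo_ioctl(unsigned int cmd, unsigned long arg)
{
	(void)arg;
	switch (cmd) {
	case DEMO_GET_SIZE:
		return 4096;
	case DEMO_CHANGE_FDTABLE:
		return 0;
	default:
		return -ENOTTY;
	}
}

/* Only the harmless query is exposed to the ring. */
static const unsigned int demo_safe[] = { DEMO_GET_SIZE };

static const struct optin_fops demo_fops = {
	.uring_safe_cmds = demo_safe,
	.nr_uring_safe_cmds = sizeof(demo_safe) / sizeof(demo_safe[0]),
	.unlocked_ioctl = demo_ioctl,
};
```

With this table, mock_uring_ioctl(&demo_fops, DEMO_GET_SIZE, 0) reaches the
handler, while DEMO_CHANGE_FDTABLE is rejected with -EOPNOTSUPP even though
the handler implements it. The default is deny, so a driver has to audit a
command before exposing it.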
>>
>> Sounds like a hell of a problem. Thanks for sorting this out!
> 
> While the ioctl approach is tempting, for the use cases where it makes
> sense, I think we should just add an ioctl-type opcode and have the
> sub-opcode be somewhere else in the sqe. Because I do think there's
> a large opportunity to expose a fast API that works with ioctl-like
> mechanisms. If we have
> 
> IORING_OP_IOCTL
> 
> and set aside an sqe field for the per-driver (or per-user) sub-opcode
> and add a file_operations method for sending these to the fd, then we'll
> have a much better (and faster + async) API than ioctls. We could
> add fops->uring_issue() or something, and that passes the io_kiocb.
> When it completes, the ->uring_issue() handler posts a completion by
> calling io_uring_complete_req() or something.
> 
> Outside of the issues that Jann outlined, ioctls are also such a
> decades-old mess that we have to do the -EAGAIN punt for all of them
> like you did in your patch. If it's opt-in like ->uring_issue(), then
> care could be taken to do this right and just have it return -EAGAIN
> if it does need async context.

Right. But io_uring itself adds some overhead, small but real. IMHO,
there won't be much merit unless the user actually benefits from
batching or async execution. From my perspective, to justify the work
there should be such a user (or one should be created) with a prototype
and performance numbers. Any ideas where to look?

> 
> ret = fops->uring_issue(req, force_nonblock);
> if (ret == -EAGAIN) {
> 	... usual punt ...
> }
> 
> I think working on this would be great, and some of the more performance
> sensitive ioctl cases should flock to it.
> 
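
For reference, the issue-then-punt flow quoted above could be mocked in
userspace roughly as follows. This is a sketch under stated assumptions, not
the proposed kernel code: struct mock_req, struct issue_fops,
mock_complete_req(), mock_submit() and the DEMO_* sub-opcodes are all
hypothetical names, and the "punt" is just a second synchronous call where
real io_uring would queue the request to a worker.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request; stands in for io_kiocb plus the sqe sub-opcode field. */
struct mock_req {
	unsigned int sub_opcode;
	int result;
	bool completed;
	bool punted;
};

/* Hypothetical stand-in for a file_operations with an opt-in issue method. */
struct issue_fops {
	int (*uring_issue)(struct mock_req *req, bool force_nonblock);
};

/* Stand-in for posting a completion for the request. */
static void mock_complete_req(struct mock_req *req, int res)
{
	req->result = res;
	req->completed = true;
}

/* Try the nonblocking path first; on -EAGAIN retry where blocking is allowed. */
static int mock_submit(const struct issue_fops *fops, struct mock_req *req)
{
	int ret;

	if (!fops->uring_issue)
		return -EOPNOTSUPP;	/* the file never opted in */

	ret = fops->uring_issue(req, true);
	if (ret == -EAGAIN) {
		/* Real code would hand off to async context; we call inline. */
		req->punted = true;
		ret = fops->uring_issue(req, false);
	}
	return ret;
}

/* Made-up driver: one sub-opcode completes inline, one needs to block. */
enum { DEMO_FAST = 0, DEMO_SLOW = 1 };

static int demo_uring_issue(struct mock_req *req, bool force_nonblock)
{
	switch (req->sub_opcode) {
	case DEMO_FAST:
		mock_complete_req(req, 42);
		return 0;
	case DEMO_SLOW:
		if (force_nonblock)
			return -EAGAIN;	/* ask the caller for async context */
		mock_complete_req(req, 7);
		return 0;
	default:
		return -EINVAL;
	}
}

static const struct issue_fops demo_fops = { .uring_issue = demo_uring_issue };
```

The point of the sketch is that only the driver decides when -EAGAIN is
needed: DEMO_FAST completes on the first call and is never punted, while
DEMO_SLOW is punted once and completes on the second call. With a blanket
ioctl wrapper, every request would have to take the punt path.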

-- 
Pavel Begunkov


