Message-ID: <CAG48ez2PeV8WiMdHakqTrVHtbO=N8RKtqAq1v6bNbNOMdMJ9QA@mail.gmail.com>
Date: Tue, 28 Nov 2023 17:36:34 +0100
From: Jann Horn <jannh@...gle.com>
To: Jens Axboe <axboe@...nel.dk>
Cc: Pavel Begunkov <asml.silence@...il.com>,
    io-uring <io-uring@...r.kernel.org>,
    kernel list <linux-kernel@...r.kernel.org>
Subject: Re: io_uring: risky use of task work, especially wrt fdget()

On Tue, Nov 28, 2023 at 5:19 PM Jens Axboe <axboe@...nel.dk> wrote:
> On 11/28/23 8:58 AM, Jens Axboe wrote:
> > On 11/27/23 2:53 PM, Jann Horn wrote:
> >> Hi!
> >>
> >> I noticed something that I think does not currently cause any
> >> significant security issues, but could be problematic in the future:
> >>
> >> io_uring sometimes processes task work in the middle of syscalls,
> >> including between fdget() and fdput(). My understanding of task work
> >> is that it is expected to run in a context similar to directly at
> >> syscall entry/exit: task context, no locks held, sleeping is okay, and
> >> it doesn't execute in the middle of some syscall that expects private
> >> state of the task_struct to stay the same.
> >>
> >> An example of another user of task work is the keyring subsystem,
> >> which does task_work_add() in keyctl_session_to_parent() to change the
> >> cred pointers of another task.
> >>
> >> Several places in io_uring process task work while holding an fdget()
> >> reference to some file descriptor. For example, the io_uring_enter
> >> syscall handler calls io_iopoll_check() while the io_ring_ctx is only
> >> referenced via fdget(). This means that if there were another kernel
> >> subsystem that uses task work to close file descriptors, io_uring
> >> would become unsafe. And io_uring does _almost_ that itself, I think:
> >> io_queue_worker_create() can be run on a workqueue, and uses task work
> >> to launch a worker thread from the context of a userspace thread; and
> >> this worker thread can then accept commands to close file descriptors.
> >> Except it doesn't accept commands to close io_uring file descriptors.
> >>
> >> A closer miss might be io_sync_cancel(), which holds a reference to
> >> some normal file with fdget()/fdput() while calling into
> >> io_run_task_work_sig(). However, from what I can tell, the only things
> >> that are actually done with this file pointer are pointer comparisons,
> >> so this also shouldn't have significant security impact.
> >>
> >> Would it make sense to use fget()/fput() instead of fdget()/fdput() in
> >> io_sync_cancel(), io_uring_enter and io_uring_register? These
> >> functions probably usually run in multithreaded environments anyway
> >> (thanks to the io_uring worker threads), so I would think fdget()
> >> shouldn't bring significant performance savings here?
> >
> > Let me run some testing on that. It's a mistake to think that it's
> > usually multithreaded, generally if you end up using io-wq then it's not
> > a fast path. A fast networked setup, for example, would never touch the
> > threads and hence no threading would be implied by using io_uring. Ditto
> > on the storage front, if you're just reading/writing or eg doing polled
> > IO. That said, those workloads are generally threaded _anyway_ - not
> > because of io_uring, but because that's how these kinds of workloads are
> > written to begin with.

Aah, because with polled I/O, when the fd is signalled as ready, the
actual execution of work is done via task_work? Thanks for the
explanation, I missed that.

> > So probably won't be much of a concern to do the swap. The only
> > "interesting" part of the above mix of cancel/register/enter is
> > obviously the enter part. The rest are not really fast path.
>
> Did all three and ran the usual testing, which just so happens to be
> multithreaded to begin with anyway. No discernable change from using
> fget/fput over fdget/fdput.

Oh, nice!

> IOW, we may as well do this. Do you want to send a patch? Or I can send
> out mine, up to you.

Ah, if you already cooked up a patch to do the testing, I guess it's
easier if you commit that?
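
For anyone following along, here is a rough userspace toy model of the
concern quoted above. It is only a sketch under simplifying assumptions,
not kernel code: every toy_* name is hypothetical, the fd table has a
single slot, and "task work" is modelled as a plain function call that
closes the fd. It assumes that fdget() borrows the fd table's reference
when the table is not shared with another task, while fget() always
takes its own reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct file: a reference count and a "freed" flag. */
struct toy_file {
    int refcount;
    bool freed;
};

/* Toy fd table: one slot, plus the number of tasks sharing the table. */
struct toy_files {
    struct toy_file *slot;
    int users;
};

/* Toy stand-in for struct fd: the file plus "do I own a reference?". */
struct toy_fd {
    struct toy_file *file;
    bool need_put;
};

/* Drop one reference; the file counts as freed when the last one goes. */
void toy_fput(struct toy_file *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0)
        f->freed = true;
}

/* Models fget(): unconditionally takes a reference of its own. */
struct toy_file *toy_fget(struct toy_files *files)
{
    struct toy_file *f = files->slot;

    if (f)
        f->refcount++;
    return f;
}

/* Models fdget(): only takes a reference if the table is shared. */
struct toy_fd toy_fdget(struct toy_files *files)
{
    struct toy_fd fd = { files->slot, false };

    if (fd.file && files->users > 1) {
        fd.file->refcount++;
        fd.need_put = true;
    }
    return fd;
}

void toy_fdput(struct toy_fd fd)
{
    if (fd.need_put)
        toy_fput(fd.file);
}

/* Hypothetical task work that closes the fd and drops the table's ref. */
void toy_task_work_close(struct toy_files *files)
{
    struct toy_file *f = files->slot;

    files->slot = NULL;
    if (f)
        toy_fput(f);
}

/* Is the file still alive after task work runs between fdget/fdput? */
bool survives_with_fdget(void)
{
    struct toy_file file = { 1, false };   /* table holds the only ref */
    struct toy_files files = { &file, 1 }; /* table is not shared */
    struct toy_fd fd = toy_fdget(&files);
    bool alive;

    toy_task_work_close(&files);           /* runs "mid-syscall" */
    alive = !fd.file->freed;
    toy_fdput(fd);
    return alive;
}

/* Same sequence, but holding the file via fget/fput. */
bool survives_with_fget(void)
{
    struct toy_file file = { 1, false };
    struct toy_files files = { &file, 1 };
    struct toy_file *f = toy_fget(&files);
    bool alive;

    toy_task_work_close(&files);
    alive = !f->freed;
    toy_fput(f);
    return alive;
}
```

In this model the file is gone by the time the fdget() holder looks at
it again, while the fget() holder keeps it alive until its own fput().
The toy file lives on the stack, so the "freed" state is only a flag and
nothing is actually dereferenced after free.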