Message-ID: <20200722141404.jfzfl3alpyw7o7dw@steredhat>
Date: Wed, 22 Jul 2020 16:14:04 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: Daurnimator <quae@...rnimator.com>
Cc: Jens Axboe <axboe@...nel.dk>,
Alexander Viro <viro@...iv.linux.org.uk>,
Kernel Hardening <kernel-hardening@...ts.openwall.com>,
Kees Cook <keescook@...omium.org>,
Aleksa Sarai <asarai@...e.de>,
Stefan Hajnoczi <stefanha@...hat.com>,
Christian Brauner <christian.brauner@...ntu.com>,
Sargun Dhillon <sargun@...gun.me>,
Jann Horn <jannh@...gle.com>,
io-uring <io-uring@...r.kernel.org>,
linux-fsdevel@...r.kernel.org, Jeff Moyer <jmoyer@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC v2 2/3] io_uring: add IOURING_REGISTER_RESTRICTIONS
opcode
On Wed, Jul 22, 2020 at 12:35:15PM +1000, Daurnimator wrote:
> On Wed, 22 Jul 2020 at 03:11, Jens Axboe <axboe@...nel.dk> wrote:
> >
> > On 7/21/20 4:40 AM, Stefano Garzarella wrote:
> > > On Thu, Jul 16, 2020 at 03:26:51PM -0600, Jens Axboe wrote:
> > >> On 7/16/20 6:48 AM, Stefano Garzarella wrote:
> > >>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > >>> index efc50bd0af34..0774d5382c65 100644
> > >>> --- a/include/uapi/linux/io_uring.h
> > >>> +++ b/include/uapi/linux/io_uring.h
> > >>> @@ -265,6 +265,7 @@ enum {
> > >>> IORING_REGISTER_PROBE,
> > >>> IORING_REGISTER_PERSONALITY,
> > >>> IORING_UNREGISTER_PERSONALITY,
> > >>> + IORING_REGISTER_RESTRICTIONS,
> > >>>
> > >>> /* this goes last */
> > >>> IORING_REGISTER_LAST
> > >>> @@ -293,4 +294,30 @@ struct io_uring_probe {
> > >>> struct io_uring_probe_op ops[0];
> > >>> };
> > >>>
> > >>> +struct io_uring_restriction {
> > >>> + __u16 opcode;
> > >>> + union {
> > >>> + __u8 register_op; /* IORING_RESTRICTION_REGISTER_OP */
> > >>> + __u8 sqe_op; /* IORING_RESTRICTION_SQE_OP */
> > >>> + };
> > >>> + __u8 resv;
> > >>> + __u32 resv2[3];
> > >>> +};
> > >>> +
> > >>> +/*
> > >>> + * io_uring_restriction->opcode values
> > >>> + */
> > >>> +enum {
> > >>> + /* Allow an io_uring_register(2) opcode */
> > >>> + IORING_RESTRICTION_REGISTER_OP,
> > >>> +
> > >>> + /* Allow an sqe opcode */
> > >>> + IORING_RESTRICTION_SQE_OP,
> > >>> +
> > >>> + /* Only allow fixed files */
> > >>> + IORING_RESTRICTION_FIXED_FILES_ONLY,
> > >>> +
> > >>> + IORING_RESTRICTION_LAST
> > >>> +};
> > >>> +
> > >>
> > >> Not sure I totally love this API. Maybe it'd be cleaner to have separate
> > >> ops for this, instead of muxing it like this. One for registering op
> > >> code restrictions, and one for disallowing other parts (like fixed
> > >> files, etc).
> > >>
> > >> I think that would look a lot cleaner than the above.
> > >>
> > >
> > > Talking with Stefan, an alternative, maybe more near to your suggestion,
> > > would be to remove the 'struct io_uring_restriction' and add the
> > > following register ops:
> > >
> > > /* Allow an sqe opcode */
> > > IORING_REGISTER_RESTRICTION_SQE_OP
> > >
> > > /* Allow an io_uring_register(2) opcode */
> > > IORING_REGISTER_RESTRICTION_REG_OP
> > >
> > > /* Register IORING_RESTRICTION_* */
> > > IORING_REGISTER_RESTRICTION_OP
> > >
> > >
> > > enum {
> > > /* Only allow fixed files */
> > > IORING_RESTRICTION_FIXED_FILES_ONLY,
> > >
> > > IORING_RESTRICTION_LAST
> > > }
> > >
> > >
> > > We can also enable restriction only when the rings started, to avoid to
> > > register IORING_REGISTER_ENABLE_RINGS opcode. Once rings are started,
> > > the restrictions cannot be changed or disabled.
> >
> > My concerns are largely:
> >
> > 1) An API that's straight forward to use
> > 2) Something that'll work with future changes
> >
> > The "allow these opcodes" is straightforward, and ditto for the register
> > opcodes. The fixed file I guess is the odd one out. So if we need to
> > disallow things in the future, we'll need to add a new restriction
> > sub-op. Should this perhaps be "these flags must be set", and that could
> > easily be augmented with "these flags must not be set"?
> >
> > --
> > Jens Axboe
> >
>
> This is starting to sound a lot like seccomp filtering.
> Perhaps we should go straight to adding a BPF hook that fires when
> reading off the submission queue?
>
You're right. I e-mailed with Kees Cook about that [1], and he agreed that
restrictions in io_uring should let us address some cases that are difficult
to handle with seccomp. For example:
- different restrictions for different io_uring instances in the same
  process
- limiting SQEs to registered fds and buffers only

Maybe seccomp could take advantage of the restrictions to filter SQE opcodes.
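
To make the per-instance point concrete, here is a rough userspace sketch
(not kernel code, and not the final uAPI). It models each ring carrying its
own allowlist, loosely following the opcode/op pair from the RFC struct quoted
above. All names here (ring_policy, policy_apply, policy_allows_sqe) are
hypothetical, and the opcode numbers are placeholders:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the restriction kinds in the quoted RFC. */
enum {
	RESTRICTION_REGISTER_OP,      /* allow an io_uring_register(2) opcode */
	RESTRICTION_SQE_OP,           /* allow an sqe opcode */
	RESTRICTION_FIXED_FILES_ONLY, /* only allow fixed files */
	RESTRICTION_LAST
};

/* Simplified stand-in for the RFC's struct io_uring_restriction. */
struct restriction {
	uint16_t opcode; /* one of RESTRICTION_* */
	uint8_t op;      /* register_op or sqe_op, depending on opcode */
};

/* Assumption: one policy object per ring, so two rings in the same
 * process can carry different restrictions. */
struct ring_policy {
	uint8_t sqe_allowed[32];      /* 256-bit allowlist of sqe opcodes */
	uint8_t register_allowed[32]; /* 256-bit allowlist of register opcodes */
	bool fixed_files_only;
	bool enabled;                 /* once set, the policy is frozen */
};

/* Apply a batch of restrictions. Returns 0 on success, -1 if the policy
 * is already frozen or an entry is unknown. Nothing is committed on error. */
static int policy_apply(struct ring_policy *p,
			const struct restriction *r, size_t n)
{
	struct ring_policy tmp = *p;
	size_t i;

	if (p->enabled)
		return -1;

	for (i = 0; i < n; i++) {
		switch (r[i].opcode) {
		case RESTRICTION_REGISTER_OP:
			tmp.register_allowed[r[i].op / 8] |=
				(uint8_t)(1u << (r[i].op % 8));
			break;
		case RESTRICTION_SQE_OP:
			tmp.sqe_allowed[r[i].op / 8] |=
				(uint8_t)(1u << (r[i].op % 8));
			break;
		case RESTRICTION_FIXED_FILES_ONLY:
			tmp.fixed_files_only = true;
			break;
		default:
			return -1;
		}
	}

	tmp.enabled = true;
	*p = tmp;
	return 0;
}

/* Check one submission against the ring's own policy. A ring with no
 * policy applied is unrestricted. */
static bool policy_allows_sqe(const struct ring_policy *p, uint8_t sqe_op,
			      bool uses_fixed_file)
{
	if (!p->enabled)
		return true;
	if (!(p->sqe_allowed[sqe_op / 8] & (1u << (sqe_op % 8))))
		return false;
	if (p->fixed_files_only && !uses_fixed_file)
		return false;
	return true;
}
```

With this model, a ring handed to an untrusted component can be limited to a
single sqe opcode on fixed files, while another ring in the same process stays
unrestricted. A process-wide seccomp filter cannot easily express that.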
Thanks,
Stefano
[1] https://lore.kernel.org/io-uring/202007160751.ED56C55@keescook/