Message-Id: <20230606223752.65dd725c04b11346b45e0546@kernel.org>
Date: Tue, 6 Jun 2023 22:37:52 +0900
From: Masami Hiramatsu (Google) <mhiramat@...nel.org>
To: Beau Belgrave <beaub@...ux.microsoft.com>
Cc: Christian Brauner <brauner@...nel.org>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
linux-trace-kernel@...r.kernel.org,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, bpf <bpf@...r.kernel.org>,
David Vernet <void@...ifault.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Thaler <dthaler@...rosoft.com>,
Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH] tracing/user_events: Run BPF program if attached
Hi Beau,
On Thu, 1 Jun 2023 09:29:21 -0700
Beau Belgrave <beaub@...ux.microsoft.com> wrote:
> > > These are stubs to integrate namespace support. I've been working on a
> > > series that adds a tracing namespace support similar to the IMA
> > > namespace work [1]. That series is ending up taking more time than I
> >
> > Look, this is all well and nice but you've integrated user events with
> > tracefs. This is currently a single-instance global filesystem. So what
> > you're effectively implying is that you're namespacing tracefs by
> > hanging it off of struct user namespace making it mountable by
> > unprivileged users. Or what's the plan?
> >
>
> We don't have plans for unprivileged users currently. I think that is a
> great goal and requires a proper tracing namespace, which we currently
> don't have. I've done some thinking on this, but I would like to hear
> your thoughts and others on how to do this properly. We do talk about
> this in the tracefs meetings (those might be out of your time zone
> unfortunately).
>
> > That alone is massive work with _wild_ security implications. My
> > appetite for exposing more stuff under user namespaces is very low given
> > the amount of CVEs we've had over the years.
> >
>
> Ok, I based that approach on the feedback given in LPC 2022 - Containers
> and Checkpoint/Restore MC [1]. I believe you gave feedback to use user
> namespaces to provide the encapsulation that was required :)
Even with the user namespace, I think we still need to provide a separate
"eventname-space" for each application, since the names may depend on the
context of who launched it and where. I think the easiest solution is
(perhaps) providing a new PID-based group for each instance (the PID prefix
or suffix would be hidden from the application).
I think it may not be good to allow unprivileged user processes to detect
each other's registered event names by default.
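To make the idea concrete, here is a rough sketch of such PID-based groups.
It is a toy user-space model in Python, not kernel code: the class
EventGroups, its methods, and the "user_events_<pid>" naming scheme are all
assumptions made for illustration only.

```python
class EventGroups:
    """Toy model of per-instance event groups keyed by PID (hypothetical)."""

    def __init__(self):
        # Maps a full system name to the set of event names registered in it.
        self._events = {}

    @staticmethod
    def system_name(pid):
        # Assumed naming scheme; the PID suffix is internal to the group.
        return f"user_events_{pid}"

    def register(self, pid, event):
        # The application passes only the bare event name.
        self._events.setdefault(self.system_name(pid), set()).add(event)

    def visible_events(self, pid):
        # An instance sees only its own event names, without the PID suffix.
        return sorted(self._events.get(self.system_name(pid), set()))

    def all_events(self):
        # What a privileged tracer would see: full "system/event" names.
        return sorted(
            f"{system}/{event}"
            for system, events in self._events.items()
            for event in events
        )
```

In this model two instances can both register an event called "start"
without colliding, and neither can list the other's names through
visible_events().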
>
> > > anticipated.
> >
> > Yet you were confident enough to leave the namespacing stubs for this
> > functionality in the code. ;)
> >
> > What is the overall goal here? Letting arbitrary unprivileged containers
> > define their own custom user event type by mounting tracefs inside
> > unprivileged containers? If so, what security story is going to
> > guarantee that writing arbitrary tracepoints from random unprivileged
> > containers is safe?
> >
>
> Unprivileged containers is not a goal, however, having a per-pod
> user_event system name, such as user_event_<pod_name>, would be ideal
> for certain diagnostic scenarios, such as monitoring the entire pod.
That can be done in the user-space tools, not in the kernel.
> When you have a lot of containers, you also want to limit how many
> tracepoints each container can create, even if they are given access to
> the tracefs file. The per-group can limit how many events/tracepoints
> that container can go create, since we currently only have 16-bit
> identifiers for trace_event's we need to be cautious we don't run out.
I agree, we need a knob to limit it in order to avoid a DoS attack.
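As a rough illustration of what such a knob could enforce, the sketch below
models a per-group limit on top of the shared 16-bit identifier space. It is
a user-space Python toy: GroupQuota, per_group_limit and try_register are
hypothetical names, not an existing kernel interface.

```python
MAX_EVENT_IDS = 1 << 16  # size of the 16-bit identifier space (65536)


class GroupQuota:
    """Toy per-group event quota (hypothetical, for illustration only)."""

    def __init__(self, per_group_limit):
        self.per_group_limit = per_group_limit
        self.counts = {}  # group name -> number of events registered
        self.total = 0    # events registered across all groups

    def try_register(self, group):
        used = self.counts.get(group, 0)
        # Refuse when the group hits its own limit or the global
        # identifier space is exhausted.
        if used >= self.per_group_limit or self.total >= MAX_EVENT_IDS:
            return False
        self.counts[group] = used + 1
        self.total += 1
        return True
```

With a limit of 2, a third registration from the same group is refused while
other groups can still register their own events.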
> user_events in general has tracepoint validators to ensure the payloads
> coming in are "safe" from what the kernel might do with them, such as
> filtering out data.
[...]
> > > changing the system name of user_events on a per-namespace basis.
> >
> > What is the "system name" and how does it protect against namespaces
> > messing with each other?
>
> trace_events in the tracing facility require both a system name and an
> event name. IE: sched/sched_waking, sched is the system name,
> sched_waking is the event name. For user_events in the root group, the
> system name is "user_events". When groups are introduced, the system
> name can be "user_events_<GUID>" for example.
So my suggestion is to use the PID in the root PID namespace instead of a
GUID by default.
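For illustration, the two naming options can be compared with a small Python
sketch. The helper names split_trace_event and group_system_name are
hypothetical, and the exact name formats are assumptions based on the
"user_events_<GUID>" example quoted above.

```python
import uuid


def split_trace_event(path):
    """Split "system/event" (e.g. "sched/sched_waking") into its two parts."""
    system, _, event = path.partition("/")
    if not system or not event:
        raise ValueError(f"expected 'system/event', got {path!r}")
    return system, event


def group_system_name(root_pid=None):
    """Build a per-group system name (assumed format, not a kernel ABI)."""
    if root_pid is not None:
        # Suggested default: the PID as seen in the root PID namespace.
        return f"user_events_{root_pid}"
    # Alternative from the quoted example: a GUID-based name.
    return f"user_events_{uuid.uuid4().hex}"
```

A PID-based name such as "user_events_1234" can be matched against the
process list on the host, while a GUID-based name needs a separate mapping
to find the owning instance.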
Thank you,
--
Masami Hiramatsu (Google) <mhiramat@...nel.org>