Message-ID: <Y9KCkuGqyr5T13XN@example.org>
Date: Thu, 26 Jan 2023 14:39:30 +0100
From: Alexey Gladkov <legion@...nel.org>
To: Christian Brauner <brauner@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>, containers@...ts.linux.dev,
linux-fsdevel@...r.kernel.org,
Alexey Dobriyan <adobriyan@...il.com>,
Al Viro <viro@...iv.linux.org.uk>,
Val Cowan <vcowan@...hat.com>
Subject: Re: [RFC PATCH v1 0/6] proc: Add allowlist for procfs files

On Thu, Jan 26, 2023 at 11:16:07AM +0100, Christian Brauner wrote:
> On Wed, Jan 25, 2023 at 03:36:28PM -0800, Andrew Morton wrote:
> > On Wed, 25 Jan 2023 16:28:47 +0100 Alexey Gladkov <legion@...nel.org> wrote:
> >
> > > The patch extends the subset= option. If procfs is mounted with the
> > > subset=allowlist option, the /proc/allowlist file will appear. This file
> > > contains the filenames and directories that are allowed for this
> > > mountpoint. By default, /proc/allowlist contains only its own name.
> > > Changing the allowlist is possible as long as it is present in the
> > > allowlist itself.
> > >
> > > This allowlist is applied in lookup/readdir, so files that modules
> > > create after mounting will not be visible.
> > >
> > > Compared to the previous patches [1][2], I switched to a special virtual
> > > file from listing filenames in the mount options.
> > >
> >
> > Changelog doesn't explain why you think Linux needs this feature. The
> > [2/6] changelog hints that containers might be involved. IOW, please
> > fully describe the requirement and use-case(s).
> >
> > Also, please describe why /proc/allowlist is made available via a mount
> > option, rather than being permanently present.
> >
> > And why add to subset=, instead of a separate mount option.
> >
> > Does /proc/allowlist work in subdirectories? Like, permit presence of
> > /proc/sys/vm/compact_memory?
> >
> > I think the whole thing is misnamed, really. "allowlist" implies
> > access permissions. Some of the text here uses "visibility" and other
> > places use "presence", which are better. "presentlist" and
> > /proc/presentlist might be better. But why not simply /proc/contents?
>
> Currently, a lot of container runtimes - even if they mount a new procfs
> instance - overmount various procfs files and directories to ensure that
> they're hidden from the container workload. (The motivations for this
> are mixed and usually it's only needed for containers that run with the
> same privilege level as the host.)
>
> The consequence of overmounting is that we need to refuse mounting
> procfs again somewhere else otherwise the procfs instance might reveal
> files and directories that were supposed to be hidden.
>
> So this patchset moves the ability to hide entries into the kernel
> through an allowlist. This way you can hide files and directories while
> being able to mount procfs again because it will inherit the same
> allowlist.
>
> I get the motivation. The question is whether this belongs into the
> kernel at all. I'm unfortunately not convinced.
>
> This adds a lot of string parsing to procfs and I think we would also
> need to decide what a reasonable maximum limit for such allowlists would
> be.
> The data structure likely shouldn't be a linked list but at least an
> rbtree especially if the size isn't limited.

There is a limit. So far I've limited the file size to 128k. I think this
is a reasonable limit.

> But fundamentally I think it moves something that should be and
> currently is a userspace policy into the kernel which I think is wrong.

We don't have mechanisms to implement this userspace policy. Overmounting
is not a solution; it only plugs holes in the absence of other ways to
control the visibility of files in procfs.
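
To make the difference from overmounting concrete, here is a rough
userspace illustration in Python. It is a toy model, not the patch's
code: the class and method names are hypothetical, and the handling of
subdirectories (an entry makes its parent directories traversable, and a
listed directory exposes what is beneath it) is an assumption that this
thread does not settle.

```python
from pathlib import PurePosixPath


class ProcfsModel:
    """Toy userspace model of a procfs mount with a visibility allowlist."""

    def __init__(self, entries):
        # Everything the (hypothetical) kernel side currently exposes.
        self.entries = set(entries) | {"allowlist"}
        # Per the cover letter, the list initially names only itself.
        self.allowed = {"allowlist"}

    def write_allowlist(self, names):
        # The list can only be changed while it still names itself.
        if "allowlist" not in self.allowed:
            raise PermissionError("allowlist is no longer writable")
        self.allowed = set(names)

    def _visible(self, path):
        # Assumption: a path is visible if it is listed, lies below a
        # listed directory, or is a parent directory of a listed path.
        p = PurePosixPath(path)
        for name in self.allowed:
            a = PurePosixPath(name)
            if p == a or a in p.parents or p in a.parents:
                return True
        return False

    def lookup(self, path):
        # Filtering at lookup: hidden entries behave as if they did not exist.
        return path in self.entries and self._visible(path)

    def readdir(self):
        # Filtering at readdir: hidden names are not listed either.
        return sorted(e for e in self.entries if self._visible(e))


proc = ProcfsModel(["cpuinfo", "kcore", "sys/vm/compact_memory"])
proc.write_allowlist(["allowlist", "cpuinfo"])
proc.entries.add("newmod")  # appears after "mounting", e.g. from a module
print(proc.readdir())
```

Because the filter is consulted on every lookup and readdir, an entry
that shows up later stays hidden unless it is listed, which is what
overmounting individual paths cannot guarantee.
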
> Sure you can't predict what files show up in procfs over time but then
> subset=pid is already your friend - even if not as fine-grained.
>
> If this were another simple subset style mount option that allowlists a
> bunch of well-known global proc files then sure. But making this
> dynamically configurable from userspace doesn't make sense to me. I
> mean, users could write /gobble/dy/gook into /proc/allowlist or use it
> to stash secrets or hashes or whatever as we have no way of figuring out
> whether the entry they allowlist does or will actually ever exist.

BTW, I only allow printable data to be written to the file.
We could also make this file write-only, so that writing any extraneous
data there would serve no purpose.
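
The two write-side checks mentioned in this reply (the size limit and
the printable-only rule) can be sketched in userspace Python as below.
This is only an illustration under stated assumptions, not the patch's
code: the function name is hypothetical, "128k" is read as 128 KiB,
"printable" is taken to mean printable ASCII, and one entry per line is
assumed.

```python
import string

MAX_ALLOWLIST_SIZE = 128 * 1024  # assumption: "128k" read as 128 KiB

# Assumption: printable ASCII plus space, with newline as the separator.
_ALLOWED_BYTES = (frozenset(string.printable.encode("ascii"))
                  - frozenset(b"\t\r\x0b\x0c"))


def validate_allowlist_write(data):
    """Return the parsed entries, or raise ValueError if the write is rejected."""
    if len(data) > MAX_ALLOWLIST_SIZE:
        raise ValueError("allowlist too large")
    if not all(byte in _ALLOWED_BYTES for byte in data):
        raise ValueError("non-printable byte in allowlist")
    # One entry per line; empty lines are ignored in this sketch.
    return [line for line in data.decode("ascii").split("\n") if line]
```

Note that such checks still accept names that never exist in procfs,
which is the concern raised in the quoted text above; they only bound
the size and the character set of what can be stored.
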
> In general, such flexibility belongs into userspace imho.
>
> Frankly, if that is really required it would almost make more sense to
> be able to attach a new bpf program type to procfs that would allow to
> filter procfs entries. Then the filter could be done purely in
> userspace. If signed bpf lands one could then even ship signed programs
> that are attachable by userns root.

I'll ask the podman developers how much more comfortable they would be
using bpf to control file visibility in procfs. Thanks for the idea.

--
Rgrds, legion