Date: Tue, 3 Mar 2020 11:13:50 +0100
From: Miklos Szeredi <miklos@...redi.hu>
To: Christian Brauner <christian.brauner@...ntu.com>
Cc: David Howells <dhowells@...hat.com>, Ian Kent <raven@...maw.net>,
James Bottomley <James.Bottomley@...senpartnership.com>,
Steven Whitehouse <swhiteho@...hat.com>,
Miklos Szeredi <mszeredi@...hat.com>,
viro <viro@...iv.linux.org.uk>,
Christian Brauner <christian@...uner.io>,
Jann Horn <jannh@...gle.com>,
"Darrick J. Wong" <darrick.wong@...cle.com>,
Linux API <linux-api@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
lkml <linux-kernel@...r.kernel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH 00/17] VFS: Filesystem information and notifications [ver #17]
On Tue, Mar 3, 2020 at 11:00 AM Christian Brauner
<christian.brauner@...ntu.com> wrote:
>
> On Tue, Mar 03, 2020 at 10:26:21AM +0100, Miklos Szeredi wrote:
> > On Tue, Mar 3, 2020 at 10:13 AM David Howells <dhowells@...hat.com> wrote:
> > >
> > > Miklos Szeredi <miklos@...redi.hu> wrote:
> > >
> > > > I'm doing a patch. Let's see how it fares in the face of all these
> > > > preconceptions.
> > >
> > > Don't forget the efficiency criterion. One reason for going with fsinfo(2) is
> > > that scanning /proc/mounts when there are a lot of mounts in the system is
> > > slow (not to mention the global lock that is held during the read).
> > >
> > > Now, going with sysfs files on top of procfs links might avoid the global
> > > lock, and you can avoid rereading the options string if you export a change
> > > notification, but you're going to end up injecting a whole lot of pathwalk
> > > latency into the system.
> >
> > Completely irrelevant. Cached lookup is so heavily optimized that you
> > won't be able to see any of it.
> >
> > No, I don't think this is going to be a performance issue at all, but
> > if anything we could introduce a syscall
> >
> > ssize_t readfile(int dfd, const char *path, char *buf, size_t
> > bufsize, int flags);
> >
> > that is basically the equivalent of open + read + close, or even a
> > vectored variant that reads multiple files. But that's off topic
> > again, since I don't think there's going to be any performance issue
> > even with plain I/O syscalls.
> >
> > >
> > > On top of that, it isn't going to help with the case that I'm working towards
> > > implementing where a container manager can monitor for mounts taking place
> > > inside the container and supervise them. What I'm proposing is that during
> > > the action phase (eg. FSCONFIG_CMD_CREATE), fsconfig() would hand an fd
> > > referring to the context under construction to the manager, which would then
> > > be able to call fsinfo() to query it and fsconfig() to adjust it, reject it or
> > > permit it. Something like:
> > >
> > > fd = receive_context_to_supervise();
> > > struct fsinfo_params params = {
> > > 	.flags = FSINFO_FLAGS_QUERY_FSCONTEXT,
> > > 	.request = FSINFO_ATTR_SB_OPTIONS,
> > > };
> > > fsinfo(fd, NULL, &params, sizeof(params), buffer, sizeof(buffer));
> > > supervise_parameters(buffer);
> > > fsconfig(fd, FSCONFIG_SET_FLAG, "hard", NULL, 0);
> > > fsconfig(fd, FSCONFIG_SET_STRING, "vers", "4.2", 0);
> > > fsconfig(fd, FSCONFIG_CMD_SUPERVISE_CREATE, NULL, NULL, 0);
> > > struct fsinfo_params params = {
> > > 	.flags = FSINFO_FLAGS_QUERY_FSCONTEXT,
> > > 	.request = FSINFO_ATTR_SB_NOTIFICATIONS,
> > > };
> > > struct fsinfo_sb_notifications sbnotify;
> > > fsinfo(fd, NULL, &params, sizeof(params), &sbnotify, sizeof(sbnotify));
> > > watch_super(fd, "", AT_EMPTY_PATH, watch_fd, 0x03);
> > > fsconfig(fd, FSCONFIG_CMD_SUPERVISE_PERMIT, NULL, NULL, 0);
> > > close(fd);
> > >
> > > However, the supervised mount may be happening in a completely different set
> > > of namespaces, in which case the supervisor presumably wouldn't be able to see
> > > the links in procfs and the relevant portions of sysfs.
> >
> > It would be a "jump" link to the otherwise invisible directory.
>
> More magic links to beam you around sounds like a bad idea. We had a
> bunch of CVEs around them in containers and they were one of the major
> reasons behind us pushing for openat2(). That's why it has a
> RESOLVE_NO_MAGICLINKS flag.

No, that link wouldn't beam you around at all; it would end up in an
internally mounted instance of a mountfs, a safe place where no
dangerous CVEs roam.

Thanks,
Miklos
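
For reference, the readfile() idea quoted above is described as "basically
the equivalent of open + read + close". A minimal userspace sketch of that
equivalence follows. The name readfile_emul() is hypothetical, and treating
"flags" as extra open flags is an assumption; the thread does not define
what the flags argument would mean. This only illustrates the three-step
sequence that a single syscall would collapse, not a proposed
implementation.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace emulation of the readfile() prototype from the
 * thread. Not a real syscall or libc function.
 *
 * Assumption: "flags" is OR-ed into the openat() flags.
 * Returns the number of bytes read (at most bufsize), or -1 with errno set.
 */
static ssize_t readfile_emul(int dfd, const char *path, char *buf,
                             size_t bufsize, int flags)
{
    /* Step 1: open, relative to dfd (or AT_FDCWD). */
    int fd = openat(dfd, path, O_RDONLY | O_CLOEXEC | flags);
    if (fd < 0)
        return -1;

    /* Step 2: read until the buffer is full or EOF is reached. */
    size_t total = 0;
    while (total < bufsize) {
        ssize_t n = read(fd, buf + total, bufsize - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry */
            int saved = errno;      /* keep read()'s error across close() */
            close(fd);
            errno = saved;
            return -1;
        }
        if (n == 0)
            break;                  /* EOF */
        total += (size_t)n;
    }

    /* Step 3: close. */
    close(fd);
    return (ssize_t)total;
}
```

A caller would use it as readfile_emul(AT_FDCWD, "/proc/self/mounts", buf,
sizeof(buf), 0); each call costs at least three syscalls, which is the
overhead a combined readfile() would remove.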