[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20210322105325.313fcf1b0232612d9047db04@virtuozzo.com>
Date: Mon, 22 Mar 2021 10:53:25 +0300
From: Alexander Mikhalitsyn <alexander.mikhalitsyn@...tuozzo.com>
To: Alexander Mikhalitsyn <alexander.mikhalitsyn@...tuozzo.com>
Cc: Al Viro <viro@...iv.linux.org.uk>, Ian Kent <raven@...maw.net>,
Matthew Wilcox <willy@...radead.org>,
Pavel Tikhomirov <ptikhomirov@...tuozzo.com>,
Kirill Tkhai <ktkhai@...tuozzo.com>, autofs@...r.kernel.org,
linux-kernel@...r.kernel.org, Miklos Szeredi <mszeredi@...hat.com>,
Christian Brauner <christian.brauner@...ntu.com>,
Ross Zwisler <zwisler@...gle.com>,
Aleksa Sarai <cyphar@...har.com>,
Eric Biggers <ebiggers@...gle.com>,
Mattias Nissler <mnissler@...omium.org>,
linux-fsdevel@...r.kernel.org, alexander@...alicyn.com
Subject: Re: [RFC PATCH] autofs: find_autofs_mount overmounted parent
support
On Tue, 9 Mar 2021 14:31:05 +0300
Alexander Mikhalitsyn <alexander.mikhalitsyn@...tuozzo.com> wrote:
> On Mon, 8 Mar 2021 00:12:22 +0000
> Al Viro <viro@...iv.linux.org.uk> wrote:
>
> > On Sun, Mar 07, 2021 at 11:51:20PM +0000, Al Viro wrote:
> > > On Thu, Mar 04, 2021 at 01:11:33PM +0300, Alexander Mikhalitsyn wrote:
> > >
> > > > That problem connected with CRIU (Checkpoint-Restore in Userspace) project.
> > > > In CRIU we have support of autofs mounts C/R. To acheive that we need to use
> > > > ioctl's from /dev/autofs to get data about mounts, restore mount as catatonic
> > > > (if needed), change pipe fd and so on. But the problem is that during CRIU
> > > > dump we may meet situation when VFS subtree where autofs mount present was
> > > > overmounted as whole.
> > > >
> > > > Simpliest example is /proc/sys/fs/binfmt_misc. This mount present on most
> > > > GNU/Linux distributions by default. For instance on my Fedora 33:
> > > >
> > > > trigger automount of binfmt_misc
> > > > $ ls /proc/sys/fs/binfmt_misc
> > > >
> > > > $ cat /proc/1/mountinfo | grep binfmt
> > > > 35 24 0:36 / /proc/sys/fs/binfmt_misc rw,relatime shared:16 - autofs systemd-1 rw,...,direct,pipe_ino=223
> > > > 632 35 0:56 / /proc/sys/fs/binfmt_misc rw,...,relatime shared:315 - binfmt_misc binfmt_misc rw
> > > >
> > > > $ sudo unshare -m -p --fork --mount-proc sh
> > > > # cat /proc/self/mountinfo | grep "/proc"
> > > > 828 809 0:23 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
> > > > 829 828 0:36 / /proc/sys/fs/binfmt_misc rw,relatime - autofs systemd-1 rw,...,direct,pipe_ino=223
> > > > 943 829 0:56 / /proc/sys/fs/binfmt_misc rw,...,relatime - binfmt_misc binfmt_misc rw
> > > > 949 828 0:57 / /proc rw...,relatime - proc proc rw
> > > >
> > > > As we can see now autofs mount /proc/sys/fs/binfmt_misc is inaccessible.
> > > > If we do something like:
> > > >
> > > > struct autofs_dev_ioctl *param;
> > > > param = malloc(...);
> > > > devfd = open("/dev/autofs", O_RDONLY);
> > > > init_autofs_dev_ioctl(param);
> > > > param->size = size;
> > > > strcpy(param->path, "/proc/sys/fs/binfmt_misc");
> > > > param->openmount.devid = 36;
> > > > err = ioctl(devfd, AUTOFS_DEV_IOCTL_OPENMOUNT, param)
> > > >
> > > > now we get err = -ENOENT.
> > >
> > Where does that -ENOENT come from? AFAICS, pathwalk ought to succeed and
> > return you the root of overmounting binfmt_misc. Why doesn't the loop in
> > find_autofs_mount() locate anything it would accept?
> >
>
> Consider our mounts tree:
> > > > # cat /proc/self/mountinfo | grep "/proc"
> > > > 828 809 0:23 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
> > > > 829 828 0:36 / /proc/sys/fs/binfmt_misc rw,relatime - autofs systemd-1 rw,...,direct,pipe_ino=223
> > > > 943 829 0:56 / /proc/sys/fs/binfmt_misc rw,...,relatime - binfmt_misc binfmt_misc rw
> > > > 949 828 0:57 / /proc rw...,relatime - proc proc rw
>
> ENOENT comes from here (current kernel code):
> /* Find the topmost mount satisfying test() */
> static int find_autofs_mount(const char *pathname,
> struct path *res,
> int test(const struct path *path, void *data),
> void *data)
> {
> struct path path;
> int err;
>
> err = kern_path(pathname, LOOKUP_MOUNTPOINT, &path);
> if (err)
> return err;
> <-------- here we successfuly open root dentry (/proc/sys/fs/binfmt_misc) of /proc (mnt_id = 949)
>
> err = -ENOENT;
> <---- set err and start search autofs mount
> /*
> here we use follow_up to move through upper dentries and find overmounted autofs.
> But in our case we opened dentry from /proc (mnt_id = 949) and this concrete dentry is *NOT*
> overmounted (whole /proc overmounted).
> */
> while (path.dentry == path.mnt->mnt_root) {
> if (path.dentry->d_sb->s_magic == AUTOFS_SUPER_MAGIC) {
> if (test(&path, data)) {
> /*
> we never get there
> */
> path_get(&path);
> *res = path;
> err = 0;
> break;
> }
> }
> if (!follow_up(&path))
> break;
> }
> /*
> loop finished. err stays as it was err = -ENOENT
> */
> path_put(&path);
> return err;
> }
>
> Source: https://github.com/torvalds/linux/blob/master/fs/autofs/dev-ioctl.c#L194
>
> > I really dislike the patch - the whole "normalize path" thing is fundamentally
> > bogus, not to mention the iterator over all mounts, etc., so I would like to
> > understand what the hell is going on before even thinking of *not* NAKing
> > it on sight.
>
> I'm not trying to break current API or something similar. I'm just prepared
> RFC patch with my proposal. I'm ready to rework all of that to make it good.
> But without maintainers/community comments/suggestions it's unreal :)
>
> Please, explain what do you mean by "normalize path thing"?
> I'm not changing semantics of current ioctl() I've just trying to extend it to make
> it work in case when parent mount of autofs mount is overmounted.
>
> >
> > Wait, so you have /proc overmounted, without anything autofs-related on
> > /proc/sys/fs/binfmt_misc and still want to have the pathname resolved, just
> > because it would've resolved with that overmount of /proc removed?
>
> Something like that.
>
> 1. I don't expect that /proc/sys/fs/binfmt_misc path will be resolved
> (because, for instance we can overmount /proc by empty tmpfs and in this case after
> overmounting we can't even open /proc/sys/fs/binfmt_misc and that's okay).
>
> 2. We talking about autofs specific function which is used in several autofs-specific
> ioctls. One of that ioctl(AUTOFS_DEV_IOCTL_OPENMOUNT) which is designed to open
> overmounted autofs mounts. Because it's frequent case when autofs mount is overmounted
> (when we talk about direct mounts). This ioctl allows to open file desciptor
> of autofs root dentry and later, autofs daemon use it to manage mount (call another autofs
> ioctls on that fd).
>
> I've just meet problem, that this API not works when parent mount of autofs mount is overmounted.
> For example:
> tmpfs /some-dir
> autofs /some-dir/autofs1 <-autofs direct mount
> nfs /some-dir/autofs1 <-automounted fs on top of autofs
>
> ioctl(AUTOFS_DEV_IOCTL_OPENMOUNT) will work in this case. Because loop
> with follow_up() starts from /some-dir/autofs1 dentry of nfs, then follow_up()
> and move to /some-dir/autofs1 dentry of autofs.
>
> But if we change picture to:
> tmpfs1 /some-dir
> autofs /some-dir/autofs1 <-autofs direct mount
> nfs /some-dir/autofs1 <-automounted fs on top of autofs
> tmpfs2 /some-dir
>
> This will breaks API. Because know we can't even open /some-dir/autofs1
> dentry.
>
> Ok. We can create this dentry at first by mkdir /some-dir/autofs1.
> But it will not help because our loop:
> while (path.dentry == path.mnt->mnt_root) {
> if (path.dentry->d_sb->s_magic == AUTOFS_SUPER_MAGIC) {
> if (test(&path, data)) {
> ...
> if (!follow_up(&path))
> break;
> }
> will start from dentry /some-dir/autofs1 from tmpfs2. And after follow_up
> on that dentry we will move to / dentry => loop finishes => user get ENOENT.
>
> >
> > I hope I'm misreading you; in case I'm not, the ABI is extremely
> > tasteless and until you manage to document the exact semantics you want
> > for param->path, consider it NAKed.
> >
> > BTW, if that thing would be made to work, what's to stop somebody from
> > doing ...at() syscalls with the resulting fd as a starting point and pathnames
> > starting with ".."? "/foo is overmounted, but we can get to anything under
> > /foo/bar/ in the underlying tree since there's an autofs mount somewhere in
> > /foo/bar/splat/puke/*"?
>
> Interesting point. Thank you!
> I'm not sure. But is it serious problem for us? What stop somebody to open
> and hold fd to any directory before overmounting?
>
> >
> > IOW, the real question (aside of "WTF?") is what are you using the
> > resulting descriptor for and what do you need to be able to do with it.
> > Details, please.
>
> Sure. I've covered use cases of file descriptor returned by ioctl(AUTOFS_DEV_IOCTL_OPENMOUNT)
> above.
>
> Thanks for your reply!
> I'm sorry If my patch description is unclear. I'm newbie here :)
>
> Regards,
> Alex
Gentle ping.
Thanks,
Alex
Powered by blists - more mailing lists