[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZL+shMg5LJgYlsDd@example.org>
Date: Tue, 25 Jul 2023 13:05:40 +0200
From: Alexey Gladkov <legion@...nel.org>
To: Christian Brauner <brauner@...nel.org>
Cc: Florian Weimer <fweimer@...hat.com>,
LKML <linux-kernel@...r.kernel.org>,
Arnd Bergmann <arnd@...db.de>, linux-api@...r.kernel.org,
linux-fsdevel@...r.kernel.org, viro@...iv.linux.org.uk,
James.Bottomley@...senpartnership.com, acme@...nel.org,
alexander.shishkin@...ux.intel.com, axboe@...nel.dk,
benh@...nel.crashing.org, borntraeger@...ibm.com, bp@...en8.de,
catalin.marinas@....com, christian@...uner.io, dalias@...c.org,
davem@...emloft.net, deepa.kernel@...il.com, deller@....de,
dhowells@...hat.com, fenghua.yu@...el.com, geert@...ux-m68k.org,
glebfm@...linux.org, gor@...ux.ibm.com, hare@...e.com,
hpa@...or.com, ink@...assic.park.msu.ru, jhogan@...nel.org,
kim.phillips@....com, ldv@...linux.org,
linux-alpha@...r.kernel.org, linux-arch@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-ia64@...r.kernel.org,
linux-m68k@...ts.linux-m68k.org, linux-mips@...r.kernel.org,
linux-parisc@...r.kernel.org, linux-s390@...r.kernel.org,
linux-sh@...r.kernel.org, linux@...linux.org.uk,
linuxppc-dev@...ts.ozlabs.org, luto@...nel.org, mattst88@...il.com,
mingo@...hat.com, monstr@...str.eu, mpe@...erman.id.au,
namhyung@...nel.org, paul.burton@...s.com, paulus@...ba.org,
peterz@...radead.org, ralf@...ux-mips.org, rth@...ddle.net,
sparclinux@...r.kernel.org, stefan@...er.ch, tglx@...utronix.de,
tony.luck@...el.com, tycho@...ho.ws, will@...nel.org,
x86@...nel.org, ysato@...rs.sourceforge.jp
Subject: Re: [PATCH v3 0/5] Add a new fchmodat4() syscall
On Tue, Jul 11, 2023 at 05:14:24PM +0200, Christian Brauner wrote:
> On Tue, Jul 11, 2023 at 02:24:51PM +0200, Florian Weimer wrote:
> > * Alexey Gladkov:
> >
> > > This patch set adds fchmodat4(), a new syscall. The actual
> > > implementation is super simple: essentially it's just the same as
> > > fchmodat(), but LOOKUP_FOLLOW is conditionally set based on the flags.
> > > I've attempted to make this match "man 2 fchmodat" as closely as
> > > possible, which says EINVAL is returned for invalid flags (as opposed to
> > > ENOTSUPP, which is currently returned by glibc for AT_SYMLINK_NOFOLLOW).
> > > I have a sketch of a glibc patch that I haven't even compiled yet, but
> > > seems fairly straight-forward:
> > >
> > > diff --git a/sysdeps/unix/sysv/linux/fchmodat.c b/sysdeps/unix/sysv/linux/fchmodat.c
> > > index 6d9cbc1ce9e0..b1beab76d56c 100644
> > > --- a/sysdeps/unix/sysv/linux/fchmodat.c
> > > +++ b/sysdeps/unix/sysv/linux/fchmodat.c
> > > @@ -29,12 +29,36 @@
> > > int
> > > fchmodat (int fd, const char *file, mode_t mode, int flag)
> > > {
> > > - if (flag & ~AT_SYMLINK_NOFOLLOW)
> > > - return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL);
> > > -#ifndef __NR_lchmod /* Linux so far has no lchmod syscall. */
> > > + /* There are four paths through this code:
> > > + - The flags are zero. In this case it's fine to call fchmodat.
> > > + - The flags are non-zero and glibc doesn't have access to
> > > + __NR_fchmodat4. In this case all we can do is emulate the error codes
> > > + defined by the glibc interface from userspace.
> > > + - The flags are non-zero, glibc has __NR_fchmodat4, and the kernel has
> > > + fchmodat4. This is the simplest case, as the fchmodat4 syscall exactly
> > > + matches glibc's library interface so it can be called directly.
> > > + - The flags are non-zero, glibc has __NR_fchmodat4, but the kernel does
> >
> > If you define __NR_fchmodat4 on all architectures, we can use these
> > constants directly in glibc. We no longer depend on the UAPI
> > definitions of those constants, to cut down the number of code variants,
> > and to make glibc's system call profile independent of the kernel header
> > version at build time.
> >
> > Your version is based on 2.31, more recent versions have some reasonable
> > emulation for fchmodat based on /proc/self/fd. I even wrote a comment
> > describing the same buggy behavior that you witnessed:
> >
> > + /* Some Linux versions with some file systems can actually
> > + change symbolic link permissions via /proc, but this is not
> > + intentional, and it gives inconsistent results (e.g., error
> > + return despite mode change). The expected behavior is that
> > + symbolic link modes cannot be changed at all, and this check
> > + enforces that. */
> > + if (S_ISLNK (st.st_mode))
> > + {
> > + __close_nocancel (pathfd);
> > + __set_errno (EOPNOTSUPP);
> > + return -1;
> > + }
> >
> > I think there was some kernel discussion about that behavior before, but
> > apparently, it hasn't led to fixes.
>
> I think I've explained this somewhere else a couple of months ago but
> just in case you weren't on that thread or don't remember and apologies
> if you should already know.
>
> A lot of filesystem will happily update the mode of a symlink. The VFS
> doesn't do anything to prevent this from happening. This is filesystem
> specific.
>
> The EOPNOTSUPP you're seeing very likely comes from POSIX ACLs.
> Specifically it comes from filesystems that call posix_acl_chmod(),
> e.g., btrfs via
>
> if (!err && attr->ia_valid & ATTR_MODE)
> err = posix_acl_chmod(idmap, dentry, inode->i_mode);
>
> Most filesystems don't implement i_op->set_acl() for POSIX ACLs.
> So posix_acl_chmod() will report EOPNOTSUPP. By the time
> posix_acl_chmod() is called, most filesystems will have finished
> updating the inode. POSIX ACLs also often aren't integrated into
> transactions so a rollback wouldn't even be possible on some
> filesystems.
>
> Any filesystem that doesn't implement POSIX ACLs at all will obviously
> never fail unless it blocks mode changes on symlinks. Or filesystems
> that do have a way to rollback failures from posix_acl_chmod(), or
> filesystems that do return an error on chmod() on symlinks such as 9p,
> ntfs, ocfs2.
>
> >
> > I wonder if it makes sense to add a similar error return to the system
> > call implementation?
>
> Hm, blocking symlink mode changes is pretty regression prone. And just
> blocking it through one interface seems weird and makes things even more
> inconsistent.
>
> So two options I see:
> (1) minimally invasive:
> Filesystems that do call posix_acl_chmod() on symlinks need to be
> changed to stop doing that.
> (2) might hit us on the head invasive:
> Try and block symlink mode changes in chmod_common().
>
> Thoughts?
>
We have third option. We can choose not to call chmod_common and return an
error right away:
diff --git a/fs/open.c b/fs/open.c
index 39a7939f0d00..86a427a2a083 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -679,7 +679,9 @@ static int do_fchmodat(int dfd, const char __user *filename, umode_t mode, int l
retry:
error = user_path_at(dfd, filename, lookup_flags, &path);
if (!error) {
- error = chmod_common(&path, mode);
+ error = -EOPNOTSUPP;
+ if (!(flags & AT_SYMLINK_NOFOLLOW) || !S_ISLNK(path.dentry->d_inode->i_mode))
+ error = chmod_common(&path, mode);
path_put(&path);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
It doesn't seem to be invasive.
--
Rgrds, legion
Powered by blists - more mailing lists