linux-kernel - Re: [PATCH v20 1/6] exec: Add a new AT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241014.waet2se6Jeiw@digikod.net>
Date: Mon, 14 Oct 2024 09:39:30 +0200
From: Mickaël Salaün <mic@...ikod.net>
To: Amir Goldstein <amir73il@...il.com>
Cc: Al Viro <viro@...iv.linux.org.uk>, 
	Christian Brauner <brauner@...nel.org>, Kees Cook <keescook@...omium.org>, 
	Linus Torvalds <torvalds@...ux-foundation.org>, Paul Moore <paul@...l-moore.com>, 
	Serge Hallyn <serge@...lyn.com>, Theodore Ts'o <tytso@....edu>, 
	Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>, Alejandro Colomar <alx@...nel.org>, 
	Aleksa Sarai <cyphar@...har.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Andy Lutomirski <luto@...nel.org>, Arnd Bergmann <arnd@...db.de>, 
	Casey Schaufler <casey@...aufler-ca.com>, Christian Heimes <christian@...hon.org>, 
	Dmitry Vyukov <dvyukov@...gle.com>, Elliott Hughes <enh@...gle.com>, 
	Eric Biggers <ebiggers@...nel.org>, Eric Chiang <ericchiang@...gle.com>, 
	Fan Wu <wufan@...ux.microsoft.com>, Florian Weimer <fweimer@...hat.com>, 
	Geert Uytterhoeven <geert@...ux-m68k.org>, James Morris <jamorris@...ux.microsoft.com>, 
	Jan Kara <jack@...e.cz>, Jann Horn <jannh@...gle.com>, Jeff Xu <jeffxu@...gle.com>, 
	Jonathan Corbet <corbet@....net>, Jordan R Abrahams <ajordanr@...gle.com>, 
	Lakshmi Ramasubramanian <nramas@...ux.microsoft.com>, Luca Boccassi <bluca@...ian.org>, 
	Luis Chamberlain <mcgrof@...nel.org>, "Madhavan T . Venkataraman" <madvenka@...ux.microsoft.com>, 
	Matt Bobrowski <mattbobrowski@...gle.com>, Matthew Garrett <mjg59@...f.ucam.org>, 
	Matthew Wilcox <willy@...radead.org>, Miklos Szeredi <mszeredi@...hat.com>, 
	Mimi Zohar <zohar@...ux.ibm.com>, Nicolas Bouchinet <nicolas.bouchinet@....gouv.fr>, 
	Scott Shell <scottsh@...rosoft.com>, Shuah Khan <shuah@...nel.org>, 
	Stephen Rothwell <sfr@...b.auug.org.au>, Steve Dower <steve.dower@...hon.org>, 
	Steve Grubb <sgrubb@...hat.com>, Thibaut Sautereau <thibaut.sautereau@....gouv.fr>, 
	Vincent Strubel <vincent.strubel@....gouv.fr>, Xiaoming Ni <nixiaoming@...wei.com>, 
	Yin Fengwei <fengwei.yin@...el.com>, kernel-hardening@...ts.openwall.com, linux-api@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-integrity@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-security-module@...r.kernel.org
Subject: Re: [PATCH v20 1/6] exec: Add a new AT_CHECK flag to execveat(2)

On Sun, Oct 13, 2024 at 11:25:11AM +0200, Amir Goldstein wrote:
> On Fri, Oct 11, 2024 at 8:45 PM Mickaël Salaün <mic@...ikod.net> wrote:
> >
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
> >
> > This is different from faccessat(2) + X_OK which only checks a subset of
> > access rights (i.e. inode permission and mount options for regular
> > files), but not the full context (e.g. all LSM access checks).  The main
> > use case for access(2) is for SUID processes to (partially) check access
> > on behalf of their caller.  The main use case for execveat(2) + AT_CHECK
> > is to check if a script execution would be allowed, according to all the
> > different restrictions in place.  Because the use of AT_CHECK follows
> > the exact kernel semantic as for a real execution, user space gets the
> > same error codes.
> >
> > An interesting point of using execveat(2) instead of openat2(2) is that
> > it decouples the check from the enforcement.  Indeed, the security check
> > can be logged (e.g. with audit) without blocking an execution
> > environment not yet ready to enforce a strict security policy.
> >
> > LSMs can control or log execution requests with
> > security_bprm_creds_for_exec().  However, to enforce a consistent and
> > complete access control (e.g. on binary's dependencies) LSMs should
> > restrict file executability, or mesure executed files, with
> > security_file_open() by checking file->f_flags & __FMODE_EXEC.
> >
> > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > make sense for the kernel to parse the checked files, look for
> > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > if the format is unknown.  Because of that, security_bprm_check() is
> > never called when AT_CHECK is used.
> >
> > It should be noted that script interpreters cannot directly use
> > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > executed to interpret the script.  Unlike the kernel, script
> > interpreters may just interpret the shebang as a simple comment, which
> > should not change for backward compatibility reasons.
> >
> > Because scripts or libraries files might not currently have the
> > executable permission set, or because we might want specific users to be
> > allowed to run arbitrary scripts, the following patch provides a dynamic
> > configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
> > SECBIT_EXEC_DENY_INTERACTIVE securebits.
> >
> > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > This patch has been used for more than a decade with customized script
> > interpreters.  Some examples can be found here:
> > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> >
> > Cc: Al Viro <viro@...iv.linux.org.uk>
> > Cc: Christian Brauner <brauner@...nel.org>
> > Cc: Kees Cook <keescook@...omium.org>
> > Cc: Paul Moore <paul@...l-moore.com>
> > Cc: Serge Hallyn <serge@...lyn.com>
> > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > Signed-off-by: Mickaël Salaün <mic@...ikod.net>
> > Link: https://lore.kernel.org/r/20241011184422.977903-2-mic@digikod.net
> > ---
> >
> > Changes since v19:
> > * Remove mention of "role transition" as suggested by Andy.
> > * Highlight the difference between security_bprm_creds_for_exec() and
> >   the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as
> >   discussed with Jeff.
> > * Improve documentation both in UAPI comments and kernel comments
> >   (requested by Kees).
> >
> > New design since v18:
> > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > ---
> >  fs/exec.c                  | 18 ++++++++++++++++--
> >  include/linux/binfmts.h    |  7 ++++++-
> >  include/uapi/linux/fcntl.h | 31 +++++++++++++++++++++++++++++++
> >  kernel/audit.h             |  1 +
> >  kernel/auditsc.c           |  1 +
> >  security/security.c        | 10 ++++++++++
> >  6 files changed, 65 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 6c53920795c2..163c659d9ae6 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -891,7 +891,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> >                 .lookup_flags = LOOKUP_FOLLOW,
> >         };
> >
> > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> >                 return ERR_PTR(-EINVAL);
> >         if (flags & AT_SYMLINK_NOFOLLOW)
> >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > @@ -1545,6 +1545,20 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> >         }
> >         bprm->interp = bprm->filename;
> >
> > +       /*
> > +        * At this point, security_file_open() has already been called (with
> > +        * __FMODE_EXEC) and access control checks for AT_CHECK will stop just
> > +        * after the security_bprm_creds_for_exec() call in bprm_execve().
> > +        * Indeed, the kernel should not try to parse the content of the file
> > +        * with exec_binprm() nor change the calling thread, which means that
> > +        * the following security functions will be not called:
> > +        * - security_bprm_check()
> > +        * - security_bprm_creds_from_file()
> > +        * - security_bprm_committing_creds()
> > +        * - security_bprm_committed_creds()
> > +        */
> > +       bprm->is_check = !!(flags & AT_CHECK);
> > +
> >         retval = bprm_mm_init(bprm);
> >         if (!retval)
> >                 return bprm;
> > @@ -1839,7 +1853,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> >
> >         /* Set the unchanging part of bprm->cred */
> >         retval = security_bprm_creds_for_exec(bprm);
> > -       if (retval)
> > +       if (retval || bprm->is_check)
> >                 goto out;
> >
> >         retval = exec_binprm(bprm);
> > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > index e6c00e860951..8ff0eb3644a1 100644
> > --- a/include/linux/binfmts.h
> > +++ b/include/linux/binfmts.h
> > @@ -42,7 +42,12 @@ struct linux_binprm {
> >                  * Set when errors can no longer be returned to the
> >                  * original userspace.
> >                  */
> > -               point_of_no_return:1;
> > +               point_of_no_return:1,
> > +               /*
> > +                * Set by user space to check executability according to the
> > +                * caller's environment.
> > +                */
> > +               is_check:1;
> >         struct file *executable; /* Executable to pass to the interpreter */
> >         struct file *interpreter;
> >         struct file *file;
> > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > index 87e2dec79fea..e606815b1c5a 100644
> > --- a/include/uapi/linux/fcntl.h
> > +++ b/include/uapi/linux/fcntl.h
> > @@ -154,6 +154,37 @@
> >                                            usable with open_by_handle_at(2). */
> >  #define AT_HANDLE_MNT_ID_UNIQUE        0x001   /* Return the u64 unique mount ID. */
> >
> > +/*
> > + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> > + * of this file would be allowed, ignoring the file format and then the related
> > + * interpreter dependencies (e.g. ELF libraries, script's shebang).
> > + *
> > + * Programs should always perform this check to apply kernel-level checks
> > + * against files that are not directly executed by the kernel but passed to a
> > + * user space interpreter instead.  All files that contain executable code,
> > + * from the point of view of the interpreter, should be checked.  However the
> > + * result of this check should only be enforced according to
> > + * SECBIT_EXEC_RESTRICT_FILE or SECBIT_EXEC_DENY_INTERACTIVE.  See securebits.h
> > + * documentation and the samples/check-exec/inc.c example.
> > + *
> > + * The main purpose of this flag is to improve the security and consistency of
> > + * an execution environment to ensure that direct file execution (e.g.
> > + * `./script.sh`) and indirect file execution (e.g. `sh script.sh`) lead to the
> > + * same result.  For instance, this can be used to check if a file is
> > + * trustworthy according to the caller's environment.
> > + *
> > + * In a secure environment, libraries and any executable dependencies should
> > + * also be checked.  For instance, dynamic linking should make sure that all
> > + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> > + * LD_PRELOAD).  For such secure execution environment to make sense, only
> > + * trusted code should be executable, which also requires integrity guarantees.
> > + *
> > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > + * descriptor instead of a path.
> > + */
> 
> If you ask me, the very elaborate comment above belongs to execveat(2)
> man page and is way too verbose for a uapi header.

OK, but since this new flags raised a lot of questions, I guess a
dedicated Documentation/userspace-api/check-exec.rst file with thit
AT*_CHECK and the related securebits would be useful instead of the
related inlined documentation.

> 
> > +#define AT_CHECK               0x10000
> 
> Please see the comment "Per-syscall flags for the *at(2) family of syscalls."
> above. If this is a per-syscall flag please use one of the per-syscall
> flags, e.g.:
> 
> /* Flags for execveat2(2) */
> #define AT_EXECVE_CHECK     0x0001   /* Only perform a check if
> execution would be allowed */

I missed this part, this prefix makes sense, thanks.

> 
> 
> Thanks,
> Amir.