[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABi2SkUfXrRLg7+ZrLjMEQzh4FtU0EAfckdiDuza1mFKzi0SAg@mail.gmail.com>
Date: Wed, 16 Oct 2024 20:43:02 -0700
From: Jeff Xu <jeffxu@...omium.org>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>, jeffxu@...omium.org, akpm@...ux-foundation.org,
keescook@...omium.org, jannh@...gle.com, torvalds@...ux-foundation.org,
adhemerval.zanella@...aro.org, oleg@...hat.com, linux-kernel@...r.kernel.org,
linux-hardening@...r.kernel.org, linux-mm@...ck.org, jorgelo@...omium.org,
sroettger@...gle.com, ojeda@...nel.org, adobriyan@...il.com,
anna-maria@...utronix.de, mark.rutland@....com, linus.walleij@...aro.org,
Jason@...c4.com, deller@....de, rdunlap@...radead.org, davem@...emloft.net,
hch@....de, peterx@...hat.com, hca@...ux.ibm.com, f.fainelli@...il.com,
gerg@...nel.org, dave.hansen@...ux.intel.com, mingo@...nel.org,
ardb@...nel.org, mhocko@...e.com, 42.hyeyoo@...il.com, peterz@...radead.org,
ardb@...gle.com, enh@...gle.com, rientjes@...gle.com, groeck@...omium.org,
lorenzo.stoakes@...cle.com
Subject: Re: [RFC PATCH v2 1/1] exec: seal system mappings
On Wed, Oct 16, 2024 at 6:10 PM Liam R. Howlett <Liam.Howlett@...cle.com> wrote:
>
> > + exec.seal_system_mappings = [KNL]
> > + Format: { never | always }
> > + Seal system mappings: vdso, vvar, sigpage, uprobes,
> > + vsyscall.
> > + This overwrites KCONFIG CONFIG_SEAL_SYSTEM_MAPPINGS_*
> > + - 'never': never seal system mappings.
>
> Not true, uprobes are sealed when 'never' is selected.
>
Thanks. I forgot to uprobes from the description in Kconfig and
kernel-parameters.txt, will update.
> > + - 'always': always seal system mappings.
> > + If not specified or invalid, default is the KCONFIG value.
> > + This option has no effect if CONFIG_64BIT=n
> > +
> > early_page_ext [KNL,EARLY] Enforces page_ext initialization to earlier
> > stages so cover more early boot allocations.
> > Please note that as side effect some optimizations
> > diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
> > index 2fb7d53cf333..20a3000550d2 100644
> > --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> > +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> > @@ -32,6 +32,7 @@
> > #include <linux/mm_types.h>
> > #include <linux/syscalls.h>
> > #include <linux/ratelimit.h>
> > +#include <linux/fs.h>
> >
> > #include <asm/vsyscall.h>
> > #include <asm/unistd.h>
> > @@ -366,8 +367,12 @@ void __init map_vsyscall(void)
> > set_vsyscall_pgtable_user_bits(swapper_pg_dir);
> > }
> >
> > - if (vsyscall_mode == XONLY)
> > - vm_flags_init(&gate_vma, VM_EXEC);
> > + if (vsyscall_mode == XONLY) {
> > + unsigned long vm_flags = VM_EXEC;
> > +
> > + update_seal_exec_system_mappings(&vm_flags);
> > + vm_flags_init(&gate_vma, vm_flags);
> > + }
> >
> > BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
> > (unsigned long)VSYSCALL_ADDR);
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 77364806b48d..5030879cda47 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
>
> Does it make sense for this to live in exec? Couldn't you put it in the
> mm/mseal.c file? It's vma flags for mappings and you've put it in
> fs/exec?
>
If you are referring to utilities related to kernel cmdline, they
should be in this file.
> > @@ -68,6 +68,7 @@
> > #include <linux/user_events.h>
> > #include <linux/rseq.h>
> > #include <linux/ksm.h>
> > +#include <linux/fs_parser.h>
> >
> > #include <linux/uaccess.h>
> > #include <asm/mmu_context.h>
> > @@ -2159,3 +2160,55 @@ fs_initcall(init_fs_exec_sysctls);
> > #ifdef CONFIG_EXEC_KUNIT_TEST
> > #include "tests/exec_kunit.c"
> > #endif
> > +
> > +#ifdef CONFIG_64BIT
> > +/*
> > + * Kernel cmdline overwrite for CONFIG_SEAL_SYSTEM_MAPPINGS_X
> > + */
> > +enum seal_system_mappings_type {
> > + SEAL_SYSTEM_MAPPINGS_NEVER,
> > + SEAL_SYSTEM_MAPPINGS_ALWAYS
> > +};
> > +
> > +static enum seal_system_mappings_type seal_system_mappings __ro_after_init =
> > + IS_ENABLED(CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS) ? SEAL_SYSTEM_MAPPINGS_ALWAYS :
> > + SEAL_SYSTEM_MAPPINGS_NEVER;
> > +
> > +static const struct constant_table value_table_sys_mapping[] __initconst = {
> > + { "never", SEAL_SYSTEM_MAPPINGS_NEVER},
> > + { "always", SEAL_SYSTEM_MAPPINGS_ALWAYS},
> > + { }
> > +};
> > +
> > +static int __init early_seal_system_mappings_override(char *buf)
> > +{
> > + if (!buf)
> > + return -EINVAL;
> > +
> > + seal_system_mappings = lookup_constant(value_table_sys_mapping,
> > + buf, seal_system_mappings);
> > +
> > + return 0;
> > +}
> > +
> > +early_param("exec.seal_system_mappings", early_seal_system_mappings_override);
> > +
> > +static bool seal_system_mappings_enabled(void)
> > +{
> > + if (seal_system_mappings == SEAL_SYSTEM_MAPPINGS_ALWAYS)
> > + return true;
> > +
> > + return false;
> > +}
>
> This function seems unnecessary, it is called from another 3-4 line
> function only.
>
It is more readable this way.
> > +
> > +void update_seal_exec_system_mappings(unsigned long *vm_flags)
> > +{
> > + if (!(*vm_flags & VM_SEALED) && seal_system_mappings_enabled())
>
> Why !(*vm_flags & VM_SEALED) here?
>
If vm_flags is already sealed, then there is no need to check
seal_system_mappings_enabled.
> > + *vm_flags |= VM_SEALED;
> > +
> > +}
>
> Instead of passing a pointer around and checking enabled, why don't you
> have a function that just returns the VM_SEALED or 0 and just or it into
> the flags? This seems very heavy for what it does, why did you do it
> this way?
>
Why is that heavy ? passing a pointer for updating variables is natural.
> The name is also very long and a bit odd, it could be used for other
> reasons, but you have _system_mappings on the end, and you use seal but
> it's mseal (or vm_seal)? Would mseal_flag() work?
>
It could be longer :-)
it means update_sealing_flag_for_executable_system_mappings.
mseal_flag is too short and not descriptive.
> > +#else
> > +void update_seal_exec_system_mappings(unsigned long *vm_flags)
> > +{
> > +}
> > +#endif /* CONFIG_64BIT */
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 42444ec95c9b..6e44aca4b24b 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
>
> Again, I don't understand why fs.h is the place for mseal definitions?
>
include/linux/fs.h contains other exec related function signatures. So
it is better to keep them at the same header.
> > @@ -3079,6 +3079,7 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos);
> > extern ssize_t kernel_write(struct file *, const void *, size_t, loff_t *);
> > extern ssize_t __kernel_write(struct file *, const void *, size_t, loff_t *);
> > extern struct file * open_exec(const char *);
> > +extern void update_seal_exec_system_mappings(unsigned long *vm_flags);
>
> We are dropping extern where possible now.
>
extern can be dropped, it appears not causing link error.
> >
> > /* fs/dcache.c -- generic fs support functions */
> > extern bool is_subdir(struct dentry *, struct dentry *);
> > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > index c47a0bf25e58..e9876fae8887 100644
> > --- a/kernel/events/uprobes.c
> > +++ b/kernel/events/uprobes.c
> > @@ -1506,7 +1506,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
> > }
> >
> > vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE,
> > - VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO,
> > + VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|VM_SEALED,
> > &xol_mapping);
>
> Changing all uprobes seems like something that should probably be
> mentioned more than just the note at the end of the change log, even if
> you think it won't have any impact. The note is even hidden at the end
> of a paragraph.
>
> I would go as far as splitting this patch out as its own so that the
> subject line specifies that all uprobes will be VM_SEALED now.
>
> Maybe it's fine but maybe it isn't and you've buried it so that it will
> be missed by virtually everyone.
>
I will add "It is sealed from creation." in the above "uprobe" section.
>
> > if (IS_ERR(vma)) {
> > ret = PTR_ERR(vma);
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 57fd5ab2abe7..d4717e34a60d 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2133,6 +2133,7 @@ struct vm_area_struct *_install_special_mapping(
> > unsigned long addr, unsigned long len,
> > unsigned long vm_flags, const struct vm_special_mapping *spec)
> > {
> > + update_seal_exec_system_mappings(&vm_flags);
> > return __install_special_mapping(mm, addr, len, vm_flags, (void *)spec,
> > &special_mapping_vmops);
>
> If you were to return a flag, you could change the vm_flags argument to
> vm_flags | mseal_flag()
>
passing pointer seems to be the most efficient way.
Thanks
-Jeff
Powered by blists - more mailing lists