[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABi2SkW_CqwNXFu9BUDfTX07aq5T8OAZern9hL=WumEWqK=ZFA@mail.gmail.com>
Date: Fri, 14 Feb 2025 06:39:48 -0800
From: Jeff Xu <jeffxu@...omium.org>
To: "Liam R. Howlett" <Liam.Howlett@...cle.com>, Jeff Xu <jeffxu@...omium.org>,
Kees Cook <kees@...nel.org>, akpm@...ux-foundation.org, jannh@...gle.com,
torvalds@...ux-foundation.org, vbabka@...e.cz, lorenzo.stoakes@...cle.com,
adhemerval.zanella@...aro.org, oleg@...hat.com, avagin@...il.com,
benjamin@...solutions.net, linux-kernel@...r.kernel.org,
linux-hardening@...r.kernel.org, linux-mm@...ck.org, jorgelo@...omium.org,
sroettger@...gle.com, hch@....de, ojeda@...nel.org,
thomas.weissschuh@...utronix.de, adobriyan@...il.com,
johannes@...solutions.net, pedro.falcato@...il.com, hca@...ux.ibm.com,
willy@...radead.org, anna-maria@...utronix.de, mark.rutland@....com,
linus.walleij@...aro.org, Jason@...c4.com, deller@....de,
rdunlap@...radead.org, davem@...emloft.net, peterx@...hat.com,
f.fainelli@...il.com, gerg@...nel.org, dave.hansen@...ux.intel.com,
mingo@...nel.org, ardb@...nel.org, mhocko@...e.com, 42.hyeyoo@...il.com,
peterz@...radead.org, ardb@...gle.com, enh@...gle.com, rientjes@...gle.com,
groeck@...omium.org, mpe@...erman.id.au, aleksandr.mikhalitsyn@...onical.com,
mike.rapoport@...il.com
Subject: Re: [RFC PATCH v5 1/7] mseal, system mappings: kernel config and
header change
On Thu, Feb 13, 2025 at 5:10 PM Liam R. Howlett <Liam.Howlett@...cle.com> wrote:
>
> * Liam R. Howlett <Liam.Howlett@...cle.com> [250213 19:14]:
> > * Jeff Xu <jeffxu@...omium.org> [250213 17:00]:
> > > On Thu, Feb 13, 2025 at 12:54 PM Liam R. Howlett
> > > <Liam.Howlett@...cle.com> wrote:
> > >
> > > > > > >
> > > > > > > VM_SEALED isn't defined in 32-bit systems, and mseal.c isn't part of
> > > > > > > the build. This is intentional. Any 32-bit code trying to use the
> > > > > > > sealing function or the VM_SEALED flag will immediately fail
> > > > > > > compilation. This makes it easier to identify incorrect usage.
> > > > > >
> > > > > > So you are against using the #define because the VM_SYSTEM_SEAL will be
> > > > > > defined, even though it will be VM_NONE? This is no worse than a
> > > > > > function that returns 0, and it aligns better with what we have today in
> > > > > > that it can be put in the list of other flags.
> > > > >
> > > > > When I was reading through all of this and considering the history of
> > > > > its development goals, it strikes me that a function is easier for the
> > > > > future if/when this can be made a boot-time setting.
> > > > >
> > > >
> > > > Reworking this change to function as a boot-time parameter, or whatever,
> > > > would not be a significant amount of work, if/when the time comes.
> > > > Since we don't know what the future holds, it it unnecessary to engineer
> > > > in a potential change for a future version when the change is easy
> > > > enough to make later and keep the code cleaner now.
> > > >
> > > Sure, I will put the function in mm.h for this patch. We can find a
> > > proper place when it is time to implement the boot-time parameter
> > > change.
> > >
> > > The call stack for sealing system mapping is something like below:
> > >
> > > install_special_mapping (mm/mmap.c)
> > > map_vdso (arch/x86/entry/vdso/vma.c)
> > > load_elf_binary (fs/binfmt_elf.c)
> > > load_misc_binary (fs/binfmt_misc.c)
> > > bprm_execve (fs/exec.c)
> > > do_execveat_common
> > > __x64_sys_execve
> > > do_syscall_64
> > >
> > > IMO, there's a clear divide between the API implementer and the API user.
> > > mm and mm.h are the providers, offering the core mm functionality
> > > through APIs/data structures like install_special_mapping().
> > >
> > > The exe layer (bprm_execve, map_vdso, etc) is the consumer of the
> > > install_special_mapping.
> > > The logic related to checking if sealing system mapping is enabled
> > > belongs to the exe layer.
> >
> > Since this is an all or nothing enabling, there is no reason to have
> > each caller check the same thing and do the same action. You should put
> > the logic into the provider - they all end up doing the same thing.
> >
> > Also, this is a compile time option so it doesn't even need to be
> > checked on execution - just apply it in the first place, at the source.
> > Your static inline function was already doing this...?
> >
> > I'm confused as to what you are arguing here because it goes against
> > what you had and what I suggested. The alternative you are suggesting
> > is more code, more instructions, and the best outcome is the same
> > result.
>
> I think I understand what you are saying now: the interface to
> install_special_mapping() needs to take the vma flag, as it does today.
> What I don't understand is that what you proposed and what I proposed
> both do that.
>
> What I'm saying is that, since system mappings are enabled, we can
> already know implicitly by having VM_SYSTEM_SEAL either VM_NONE or
> VM_SEAL.
>
> Turning this:
>
> @@ -264,11 +266,12 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
> /*
> * MAYWRITE to allow gdb to COW and set breakpoints
> */
> + vm_flags = VM_READ|VM_EXEC|VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC;
> + vm_flags |= mseal_system_mappings();
> vma = _install_special_mapping(mm,
> text_start,
> image->size,
> - VM_READ|VM_EXEC|
> - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> + vm_flags,
> &vdso_mapping);
>
> to this:
>
> /*
> * MAYWRITE to allow gdb to COW and set breakpoints
> */
> vma = _install_special_mapping(mm,
> text_start,
> image->size,
> VM_READ|VM_EXEC|
> - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
> + VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC|
> + VM_SYSTEM_SEAL,
> &vdso_mapping);
>
> No unsigned long vm_flags needed. It's easier to read and I don't think
> it's any more hidden than the vm_flags |= function call option.
>
The arch code needs a mseal_system_mappings() function. Otherwise,
I'll have to change this line (in arch) again when I implement the kernel
command line or pre-process opt-in/opt-out, which requires a function
call. This isn't overthinking; based on our discussion so far, there
are clear needs for a subsequent patch series. This patch is just the
first step.
For software layering, I'd like to see a clear separation between
layers. mm implements _install_special_mapping, which accepts any
combination of vm_flags as input. Then I'd like the caller (in arch
code) to have all the necessary code to compose the vm_flags in one
place. This helps readability. In the past, we discussed merging the
vdso/vvar code across all architectures. When that happens, the new
code in arch will likely have its own .c and .h files, but it will still sit
above mm. That means mm won't need to change, and the
_install_special_mapping API from mm will remain unchanged.
The mseal_system_mappings() function can already return VM_SEALED,
and future patches will add more logic into mseal_system_mappings(), e.g.
check for kernel cmdline or opt-in/opt-out. we don't need a separate
VM_SYSTEM_SEAL, which is a *** redirect macro ***, I prefer to keep
all the important code logic relevant to configuration of enabling system
mapping sealing in one place, for easy reading.
mseal_system_mappings() can be placed in mm.h in this patch, as you
suggested. But in the near future, it will be moved out of mm.h and find
a right header. The functionality belongs to exe namespace, because of
the reasons I put in the commit message and discussions.
Thanks
-Jeff
-Jeff
Powered by blists - more mailing lists