lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABi2SkXgZfTvyPX_rcb8KTKyeHxpZrL9_2Wr+vJ1q3K3_1rwoQ@mail.gmail.com>
Date: Wed, 11 Dec 2024 14:46:46 -0800
From: Jeff Xu <jeffxu@...omium.org>
To: Andrei Vagin <avagin@...il.com>
Cc: akpm@...ux-foundation.org, keescook@...omium.org, jannh@...gle.com, 
	torvalds@...ux-foundation.org, adhemerval.zanella@...aro.org, oleg@...hat.com, 
	linux-kernel@...r.kernel.org, linux-hardening@...r.kernel.org, 
	linux-mm@...ck.org, jorgelo@...omium.org, sroettger@...gle.com, 
	ojeda@...nel.org, adobriyan@...il.com, anna-maria@...utronix.de, 
	mark.rutland@....com, linus.walleij@...aro.org, Jason@...c4.com, 
	deller@....de, rdunlap@...radead.org, davem@...emloft.net, hch@....de, 
	peterx@...hat.com, hca@...ux.ibm.com, f.fainelli@...il.com, gerg@...nel.org, 
	dave.hansen@...ux.intel.com, mingo@...nel.org, ardb@...nel.org, 
	Liam.Howlett@...cle.com, mhocko@...e.com, 42.hyeyoo@...il.com, 
	peterz@...radead.org, ardb@...gle.com, enh@...gle.com, rientjes@...gle.com, 
	groeck@...omium.org, mpe@...erman.id.au, 
	Dmitry Safonov <0x7f454c46@...il.com>, Mike Rapoport <mike.rapoport@...il.com>, 
	Alexander Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com>, Andrei Vagin <avagin@...gle.com>
Subject: Re: [PATCH v4 1/1] exec: seal system mappings

Hi Andrei

Thanks for your email.
I was hoping to get some feedback from CRIU devs, and happy to see you
reaching out..

On Mon, Dec 9, 2024 at 8:12 PM Andrei Vagin <avagin@...il.com> wrote:
>
> On Mon, Nov 25, 2024 at 12:49 PM <jeffxu@...omium.org> wrote:
> >
> > From: Jeff Xu <jeffxu@...omium.org>
> >
> > Seal vdso, vvar, sigpage, uprobes and vsyscall.
> >
> > Those mappings are readonly or executable only, sealing can protect
> > them from ever changing or unmapped during the life time of the process.
> > For complete descriptions of memory sealing, please see mseal.rst [1].
> >
> > System mappings such as vdso, vvar, and sigpage (for arm) are
> > generated by the kernel during program initialization, and are
> > sealed after creation.
> >
> > Unlike the aforementioned mappings, the uprobe mapping is not
> > established during program startup. However, its lifetime is the same
> > as the process's lifetime [2]. It is sealed from creation.
> >
> > The vdso, vvar, sigpage, and uprobe mappings all invoke the
> > _install_special_mapping() function. As no other mappings utilize this
> > function, it is logical to incorporate sealing logic within
> > _install_special_mapping(). This approach avoids the necessity of
> > modifying code across various architecture-specific implementations.
> >
> > The vsyscall mapping, which has its own initialization function, is
> > sealed in the XONLY case, it seems to be the most common and secure
> > case of using vsyscall.
> >
> > It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> > alter the mapping of vdso, vvar, and sigpage during restore
> > operations. Consequently, this feature cannot be universally enabled
> > across all systems.
> >
> ...
> >
> > +config SEAL_SYSTEM_MAPPINGS
> > +       bool "seal system mappings"
> > +       default n
> > +       depends on 64BIT
> > +       depends on ARCH_HAS_SEAL_SYSTEM_MAPPINGS
> > +       depends on !CHECKPOINT_RESTORE
>
> Hi Jeff,
>
> I like the idea of this patchset, but I don’t like the idea of
> forcing users to choose between this security feature and
> checkpoint/restore functionality. We need to explore ways to make this
> feature work with checkpoint/restore. Relying on CAP_CHECKPOINT_RESTORE
> is the obvious approach.
>
I agree that forcing users to choose isn't ideal. I'd prefer a
solution where both approaches can be used in some way, depending on
the situation and distributions. Hopefully, with input from CRIU
developers, this can be achieved.

However, it makes sense to unconditionally seal vdso/vvar for systems
like ChromeOS and Android that don't currently use CRIU, so we will
need a KCONFIG for that.

> CRIU just needs to move these mappings, and it doesn't need to change
> their properties or modify their contents. With that in mind, here are
That is an important detail to know, thanks for bringing it up.

> two options:
> * Allow moving sealed mappings for processes with CAP_CHECKPOINT_RESTORE.
We could try to propose this under a new KCONFIG, e.g.
CONFIG_SEAL_SYSTEM_MAPPING_WITH_CAP_CHECK

IIUC, You propose allowing userspace mremap vdso if the process has
CAP_CHECKPOINT_RESTORE. However, I believe this approach raises
security concerns. During the RFC  for mseal, initially, I suggested
sealing mmap, mremap, and munmap individually, but Linus rejected this
proposal, for a good reason, mremap could leave an empty hole in the
address space, thus allowing the attacker to fill it with attacker
controlled content.

Furthermore, CAP_SYSTEM_ADMIN allows setting any capacity,  so this
would become a by-pass for sealing.

> * Allow temporarily "unsealing" mappings for processes with
>   CAP_CHECKPOINT_RESTORE. CRIU could unseal mappings, move them, and
>   then seal them back.
>
We could also try to propose this under CONFIG_SEAL_SYSTEM_MAPPING_FOR_CRIU.

It's important to note that temporarily unsealing a mapping from
userspace is not permitted. If a mapping has the capability to be
unsealed, it fundamentally does not provide the sealing property.

Perhaps the intention was for these steps to be carried out within the
kernel? e.g. the userspace could instruct the kernel to relocate the
vdso mapping. Since the kernel can ensure the vdso contents are not
manipulated by an attacker, this approach could offer a viable
solution.

I have been thinking of other alternatives, but those would require
more understanding on CRIU use cases.
One of my questions is: Would CRIU target an individual process? or
entire systems?

If it is an individual process, we could use prctl to opt-in/opt-out
certain processes. There could be two alternatives.
1> Opt-in solution: process must set prctl.seal_criu_mapping, this
needs to be set before execve() because sealing is applied at execve()
call.
2> opt-out solution: The system will by default seal all of the system
mappings, but individual processes can opt-out by setting
prctl.not_seal_criu_mappings. This also needs to be set before
execve() call.

For both cases, we will want to identify what type of mapping CRIU
cares about, i.e. maybe CRIU doesn't care about uprobe and vsyscall ?
and only care about vdso/vvar/sigpage ?

> Another approach might be to make this feature configurable on a
> per-process basis (e.g., via prctl). Once enabled for a process, it
> would be inherited by all its children. It can't be disabled unless a
> process has CAP_CHECKPOINT_RESTORE.
>
> I've added Mike, Dima, and Alex to the thread. They might have
> other ideas.
>
Thanks. Please feel free to chime in, I will also add Mike,Dima and
Alex to the new version of this series as well.

Thanks!
-Jeff



> Thanks,
> Andrei

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ