linux-kernel - Re: [PATCH 26/26] x86, pkeys: Documentation

Message-ID: <20151003072755.GA23524@gmail.com>
Date:	Sat, 3 Oct 2015 09:27:55 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Dave Hansen <dave@...1.net>
Cc:	Andy Lutomirski <luto@...capital.net>,
	Kees Cook <keescook@...gle.com>,
	"x86@...nel.org" <x86@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Linux-MM <linux-mm@...ck.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andy Lutomirski <luto@...nel.org>,
	Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH 26/26] x86, pkeys: Documentation


* Dave Hansen <dave@...1.net> wrote:

> On 10/01/2015 11:23 PM, Ingo Molnar wrote:
> >> > Also, how do we do mprotect_pkey and say "don't change the key"?
> >
> > So if we start managing keys as a resource (i.e. alloc/free up to 16 of them), 
> > and provide APIs for user-space to do all that, then user-space is not 
> > supposed to touch keys it has not allocated for itself - just like it's not 
> > supposed to write to fds it has not opened.
> 
> I like that.  It gives us at least a "soft" indicator to userspace about what 
> keys it should or shouldn't be using.

Yes. A 16-bit allocation bitmap would solve this nicely.
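
For illustration only, a minimal user-space sketch of such a 16-bit allocation
bitmap could look like the following. The names (struct pkey_map,
pkey_map_alloc() and friends) and the choice of error codes are hypothetical
and are not taken from the patch set; the sketch only shows the bookkeeping,
including the "was this key allocated?" check that would gate mprotect_pkey().

```c
#include <errno.h>
#include <stdint.h>

#define NR_PKEYS 16

/* Hypothetical per-mm metadata: one bit per protection key. */
struct pkey_map {
	uint16_t allocated;
};

/* Return the lowest free key, or -ENOSPC when all 16 are in use. */
int pkey_map_alloc(struct pkey_map *m)
{
	for (int k = 0; k < NR_PKEYS; k++) {
		uint16_t bit = (uint16_t)(1u << k);

		if (!(m->allocated & bit)) {
			m->allocated |= bit;
			return k;
		}
	}
	return -ENOSPC;
}

/* Non-zero if pkey is in range and currently allocated. */
int pkey_map_is_allocated(const struct pkey_map *m, int pkey)
{
	return pkey >= 0 && pkey < NR_PKEYS && ((m->allocated >> pkey) & 1u);
}

/* Release a key; -EINVAL (an assumed error code) if it was not allocated. */
int pkey_map_free(struct pkey_map *m, int pkey)
{
	if (!pkey_map_is_allocated(m, pkey))
		return -EINVAL;
	m->allocated &= (uint16_t)~(1u << pkey);
	return 0;
}
```

In this sketch a freed key becomes the next one handed out, much like the
lowest-free-fd rule; whether the real interface should behave that way is an
open design choice.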

> > Such an allocation method can still 'mess up', and if the kernel allocates a key 
> > for its purposes it should not assume that user-space cannot change it, but at 
> > least for non-buggy code there's no interaction and it would work out fine.
> 
> Yeah.  It also provides a clean interface so that future hardware could
> enforce kernel "ownership" of a key which could protect against
> even buggy code.
> 
> So, we add a pair of syscalls,
> 
> 	unsigned long sys_alloc_pkey(unsigned long flags??)
> 	unsigned long sys_free_pkey(unsigned long pkey)
> 
> keep the metadata in the mm, and then make sure that userspace allocated
> it before it is allowed to do an mprotect_pkey() with it.

Yeah, so such an interface would allow the clean, transparent usage of pkeys for 
pure PROT_EXEC mappings.

I'd expect the --x/PROT_EXEC mappings to be _by far_ more frequently used than 
pure pkeys - but we still need the management interface to keep the kernel's use 
of pkeys separate from user-space's use.

If all the necessary tooling changes are propagated through, then in fact I'd 
expect every pkeys-capable Linux system to use pkeys, for almost every user-space 
task.

To have maximum future flexibility for pkeys I'd suggest the following additional 
changes to the syscall ABI:

 - Please name them with a pkey_ prefix, along the sys_pkey_* nomenclature, so 
   that it becomes an easily identified 'family' of system calls.

 - I'd also suggest providing an initial value with the 'alloc' call. It's true 
   that user-space can do this itself in assembly, OTOH there's no reason not to 
   provide a C interface for this.

 - Make the pkey identifier 'int', not 'long', like fds are. There's very little
   expectation to ever have more than 4 billion pkeys per mm, right?

 - How far do we want the kernel to manage this? Any reason we don't want a
   'set pkey' operation, if user-space wants to use pure C interfaces? That could 
   be vDSO accelerated as well, to use the unprivileged op. An advantage of such
   an interface would be that it would enable the kernel to more actively manage
   the actual mappings as well in the future: for example to automatically not
   allow accidental RWX mappings. Such an interface would also allow the future
   introduction of privileged pkey mappings on the hardware side, without having
   to change user-space, since everything goes via the kernel interface.

 - Along similar considerations, also add a sys_pkey_get() system call to query 
   the mapping of a specific pkey. (Returns -EBADF or so if the key is not mapped
   at the moment.) This too could be vDSO accelerated in the future.

I.e. something like:

     unsigned long sys_pkey_alloc (unsigned long flags, unsigned long init_val)
     unsigned long sys_pkey_set   (int pkey, unsigned long new_val)
     unsigned long sys_pkey_get   (int pkey)
     unsigned long sys_pkey_free  (int pkey)
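
To make the intended semantics concrete, here is a rough user-space simulation
of that four-call family. Everything in it is an assumption made for
illustration: the sim_ names, the signed 'long' return type (so that negative
errno values can be returned), the rule that 'flags' must be zero, and the
encoding of a key's value as two bits per key (access-disable and
write-disable), which mirrors how the hardware rights register packs 16 keys
into 32 bits. It is not the kernel implementation.

```c
#include <errno.h>
#include <stdint.h>

#define SIM_NR_PKEYS       16
#define SIM_DISABLE_ACCESS 0x1UL	/* hypothetical value bits */
#define SIM_DISABLE_WRITE  0x2UL
#define SIM_RIGHTS_MASK    0x3UL

/* Stand-in for per-mm state: allocation bitmap plus 2 rights bits per key. */
struct sim_mm {
	uint16_t allocated;
	uint32_t rights;
};

static int sim_key_valid(const struct sim_mm *mm, int pkey)
{
	return pkey >= 0 && pkey < SIM_NR_PKEYS &&
	       ((mm->allocated >> pkey) & 1u);
}

/* Set a key's value; -EBADF if the key was never allocated. */
long sim_pkey_set(struct sim_mm *mm, int pkey, unsigned long new_val)
{
	if (!sim_key_valid(mm, pkey))
		return -EBADF;
	if (new_val & ~SIM_RIGHTS_MASK)
		return -EINVAL;
	mm->rights &= ~((uint32_t)SIM_RIGHTS_MASK << (2 * pkey));
	mm->rights |= (uint32_t)new_val << (2 * pkey);
	return 0;
}

/* Query a key's value; -EBADF if the key is not currently allocated. */
long sim_pkey_get(const struct sim_mm *mm, int pkey)
{
	if (!sim_key_valid(mm, pkey))
		return -EBADF;
	return (long)((mm->rights >> (2 * pkey)) & SIM_RIGHTS_MASK);
}

/* Allocate the lowest free key and give it an initial value. */
long sim_pkey_alloc(struct sim_mm *mm, unsigned long flags,
		    unsigned long init_val)
{
	if (flags || (init_val & ~SIM_RIGHTS_MASK))
		return -EINVAL;
	for (int k = 0; k < SIM_NR_PKEYS; k++) {
		if (!((mm->allocated >> k) & 1u)) {
			mm->allocated |= (uint16_t)(1u << k);
			sim_pkey_set(mm, k, init_val);
			return k;
		}
	}
	return -ENOSPC;
}

/* Free a key and reset its value so a later owner starts clean. */
long sim_pkey_free(struct sim_mm *mm, int pkey)
{
	if (!sim_key_valid(mm, pkey))
		return -EBADF;
	sim_pkey_set(mm, pkey, 0);
	mm->allocated &= (uint16_t)~(1u << pkey);
	return 0;
}
```

Because every access goes through the four entry points, the backing store
(here a plain struct) could be swapped for a fuller table without touching
callers, which is the flexibility argument made above.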

Optional suggestion:

 - _Maybe_ also allow the 'remote managed' setup of pkeys, i.e. for non-local 
   tasks - but I'm not sure about that: it looks expensive and complex, and a TID 
   argument can always be added later if there's some real need.

> That should be pretty easy to implement.  The only real overhead is the 16 bits 
> we need to keep in the mm somewhere.

Yes.

Note that if we use the C syscall interface suggestions I outlined above, we could 
in the future also change to have a full table, and manage it explicitly - without 
user-space changes - if the hardware side is tweaked to allow kernel side pkeys.

Thanks,

	Ingo