linux-kernel - Re: [LSF/MM TOPIC] VM containers

Message-ID: <56A65AA2.6040307@redhat.com>
Date:	Mon, 25 Jan 2016 12:25:54 -0500
From:	Rik van Riel <riel@...hat.com>
To:	One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>,
	"Nakajima, Jun" <jun.nakajima@...el.com>
Cc:	"lsf-pc@...ts.linuxfoundation.org" <lsf-pc@...ts.linuxfoundation.org>,
	Linux Memory Management List <linux-mm@...ck.org>,
	Linux kernel Mailing List <linux-kernel@...r.kernel.org>,
	KVM list <kvm@...r.kernel.org>
Subject: Re: [LSF/MM TOPIC] VM containers

On 01/24/2016 12:06 PM, One Thousand Gnomes wrote:
>>> That changes some of the goals the memory management subsystem has,
>>> from "use all the resources effectively" to "use as few resources as
>>> necessary, in case the host needs the memory for something else".
> 
> Also "and take guidance/provide telemetry" - because you want to tune the
> VM behaviours based upon policy and to learn from them for when you re-run
> that container.
> 
>> Beyond memory consumption, I would be interested in whether we can
>> harden the kernel using paravirt interfaces for memory protection in
>> VMs (if any). For example, the hypervisor could write-protect part of
>> the page tables or kernel data structures in VMs. Would that help?
> 
> There are four behaviours I can think of, some of which you see in
> various hypervisors and security hardening systems
> 
> - die on write (a write here causes a security trap and termination after
>   the guest has marked the page range die on write, and it cannot be
>   unmarked). The guest OS at boot can for example mark all its code as
>   die-on-write.
> - irrevocably read only (VM never allows page to be rewritten by guest
>   after the guest marks the page range irrevocably r/o)

For these we get the question "how do we make it harder for the
guest to remap the page tables to point at read/write memory,
and modify that instead of the read-only memory?"

On "smaller" guests (less than 1TB in size), it may be enough to
ensure that the kernel PUD pointer points to the (read-only) kernel
PUD at context switch time, placing the main kernel page tables,
kernel text, and some other things in read-only memory.
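
To make the semantics of the two write-protection behaviours quoted
above concrete, here is a toy user-space model. Everything in it
(ToyHypervisor, mark, write, the Prot names) is hypothetical and is not
an existing KVM or paravirt interface. It tracks protection per guest
physical page, so it does not address the remapping question raised
above; it only shows that marking is one-way and what a later write
does.

```python
from enum import Enum


class Prot(Enum):
    RW = "read-write"
    RO_IRREVOCABLE = "irrevocably read-only"
    DIE_ON_WRITE = "die-on-write"


class GuestKilled(Exception):
    """Raised when the toy hypervisor terminates the guest."""


class ToyHypervisor:
    """Hypothetical model of one-way page protection requested by a guest."""

    def __init__(self, npages):
        self.prot = [Prot.RW] * npages
        self.data = [0] * npages

    def mark(self, first, last, prot):
        """Protect pages first..last (inclusive). Protection is never dropped."""
        if prot is Prot.RW:
            raise PermissionError("protection cannot be removed")
        pages = range(first, last + 1)
        # Refuse the whole request if any page already carries a different mark.
        for pfn in pages:
            if self.prot[pfn] not in (Prot.RW, prot):
                raise PermissionError(f"page {pfn} is already {self.prot[pfn].value}")
        for pfn in pages:
            self.prot[pfn] = prot

    def write(self, pfn, value):
        """Guest write. Returns True if the write landed, False if refused."""
        if self.prot[pfn] is Prot.DIE_ON_WRITE:
            raise GuestKilled(f"write to die-on-write page {pfn}")
        if self.prot[pfn] is Prot.RO_IRREVOCABLE:
            return False  # write is dropped; a real guest would see a fault
        self.data[pfn] = value
        return True
```

In this sketch a guest would call mark() once at boot for its kernel
text, and any later attempt to call mark() with Prot.RW on those pages
is rejected.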

> - asynchronous faulting (pages the guest thinks are in its memory but
>   are in fact on the host's swap cause a subscribable fault in the guest
>   so that it can (where possible) be context switched)

KVM (and s390) already do the asynchronous page fault trick.
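
As an illustration of the idea only (this is not how KVM or s390
implement it), the following toy scheduler shows the effect of an
asynchronous page fault: a task that touches a page on host swap is
parked, other tasks keep running, and the parked task resumes once the
page has arrived. The function name run_guest, its parameters, and the
tick-based timing are all assumptions made for the sketch.

```python
from collections import deque


def run_guest(tasks, swapped_out, swap_in_ticks=2):
    """Run toy tasks round-robin; return completed accesses as (task, page).

    tasks: dict mapping task name -> list of page numbers touched in order.
    swapped_out: pages that currently live on host swap.
    """
    swapped_out = set(swapped_out)
    runnable = deque((n, deque(p)) for n, p in tasks.items() if p)
    waiting = []   # (page, task name, remaining pages) parked on a swap-in
    pending = {}   # page -> tick at which the host finishes the swap-in
    trace = []
    tick = 0
    while runnable or waiting:
        # "Page ready" notifications from the host.
        for page, ready in list(pending.items()):
            if ready <= tick:
                swapped_out.discard(page)
                del pending[page]
        still_waiting = []
        for page, name, pages in waiting:
            if page in swapped_out:
                still_waiting.append((page, name, pages))
            else:
                runnable.append((name, pages))
        waiting = still_waiting
        if runnable:
            name, pages = runnable.popleft()
            page = pages[0]
            if page in swapped_out:
                # "Page not present": park only this task, start the swap-in.
                pending.setdefault(page, tick + swap_in_ticks)
                waiting.append((page, name, pages))
            else:
                pages.popleft()
                trace.append((name, page))
                if pages:
                    runnable.append((name, pages))
        tick += 1
    return trace
```

With tasks {"A": [1, 2], "B": [3, 4]} and page 1 swapped out, task B
completes both of its accesses while A waits, and the returned trace is
[("B", 3), ("B", 4), ("A", 1), ("A", 2)].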

> - free if needed - marking pages as freed up and either you get a page
>   back as it was or a fault and a zeroed page

People have worked on this for KVM. I do not remember what
happened to the code.
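
For reference, the "free if needed" contract described in the quoted
text can be written down as a small toy model. This is not the KVM code
mentioned above; the class and method names are hypothetical. The guest
marks a page as reclaimable, the host may or may not take it, and on
reuse the guest gets either the old contents or a zeroed page plus an
indication that the contents were lost.

```python
class VolatilePage:
    """Hypothetical model of a guest page offered to the host as free-if-needed."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.volatile = False    # guest has offered the page to the host
        self.reclaimed = False   # host actually took it

    def mark_free_if_needed(self):
        """Guest side: the page may be taken, but keep it if memory is plentiful."""
        self.volatile = True

    def host_reclaim(self) -> bool:
        """Host side, under memory pressure. Only offered pages may be taken."""
        if not self.volatile:
            return False
        self.reclaimed = True
        self.data = bytearray(len(self.data))  # old contents are gone
        return True

    def guest_reuse(self):
        """Guest wants the page back. Returns (contents, lost)."""
        lost = self.reclaimed    # stands in for the fault the guest would take
        self.volatile = False
        self.reclaimed = False
        return bytes(self.data), lost
```

The point of the lost flag is that the guest never has to guess: it
either sees exactly what it left in the page, or it is told that the
page was dropped and is now zero-filled.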

-- 
All rights reversed