lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sun, 12 Nov 2023 21:23:25 -0500
From: Mickaël Salaün <mic@...ikod.net>
To: Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"H . Peter Anvin" <hpa@...or.com>,
	Ingo Molnar <mingo@...hat.com>,
	Kees Cook <keescook@...omium.org>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Sean Christopherson <seanjc@...gle.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Vitaly Kuznetsov <vkuznets@...hat.com>,
	Wanpeng Li <wanpengli@...cent.com>
Cc: Mickaël Salaün <mic@...ikod.net>,
	Alexander Graf <graf@...zon.com>,
	Chao Peng <chao.p.peng@...ux.intel.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
	Forrest Yuan Yu <yuanyu@...gle.com>,
	James Gowans <jgowans@...zon.com>,
	James Morris <jamorris@...ux.microsoft.com>,
	John Andersen <john.s.andersen@...el.com>,
	"Madhavan T . Venkataraman" <madvenka@...ux.microsoft.com>,
	Marian Rotariu <marian.c.rotariu@...il.com>,
	Mihai Donțu <mdontu@...defender.com>,
	Nicușor Cîțu <nicu.citu@...oud.com>,
	Thara Gopinath <tgopinath@...rosoft.com>,
	Trilok Soni <quic_tsoni@...cinc.com>,
	Wei Liu <wei.liu@...nel.org>,
	Will Deacon <will@...nel.org>,
	Yu Zhang <yu.c.zhang@...ux.intel.com>,
	Zahra Tarkhani <ztarkhani@...rosoft.com>,
	Ștefan Șicleru <ssicleru@...defender.com>,
	dev@...ts.cloudhypervisor.org,
	kvm@...r.kernel.org,
	linux-hardening@...r.kernel.org,
	linux-hyperv@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	linux-security-module@...r.kernel.org,
	qemu-devel@...gnu.org,
	virtualization@...ts.linux-foundation.org,
	x86@...nel.org,
	xen-devel@...ts.xenproject.org
Subject: [RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor

From: Madhavan T. Venkataraman <madvenka@...ux.microsoft.com>

Implement a hypervisor function, kvm_protect_memory() that calls the
KVM_HC_PROTECT_MEMORY hypercall to request the KVM hypervisor to
set specified permissions on a list of guest pages.

Using the protect_memory() function, set proper EPT permissions for all
guest pages.

Use the MEM_ATTR_IMMUTABLE property to protect the kernel static
sections and the boot-time read-only sections. This enables to make sure
a compromised guest will not be able to change its main physical memory
page permissions. However, this also disable any feature that may change
the kernel's text section (e.g., ftrace, Kprobes), but they can still be
used on kernel modules.

Module loading/unloading, and eBPF JIT is allowed without restrictions
for now, but we'll need a way to authenticate these code changes to
really improve the guests' security. We plan to use module signatures,
but there is no solution yet to authenticate eBPF programs.

Being able to use ftrace and Kprobes in a secure way is a challenge not
solved yet. We're looking for ideas to make this work.

Likewise, the JUMP_LABEL feature cannot work because the kernel's text
section is read-only.

Cc: Borislav Petkov <bp@...en8.de>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: H. Peter Anvin <hpa@...or.com>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Kees Cook <keescook@...omium.org>
Cc: Madhavan T. Venkataraman <madvenka@...ux.microsoft.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>
Cc: Sean Christopherson <seanjc@...gle.com>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: Vitaly Kuznetsov <vkuznets@...hat.com>
Cc: Wanpeng Li <wanpengli@...cent.com>
Co-developed-by: Mickaël Salaün <mic@...ikod.net>
Signed-off-by: Mickaël Salaün <mic@...ikod.net>
Signed-off-by: Madhavan T. Venkataraman <madvenka@...ux.microsoft.com>
---

Changes since v1:
* New patch
---
 arch/x86/kernel/kvm.c  | 11 ++++++
 arch/x86/kvm/mmu/mmu.c |  2 +-
 arch/x86/mm/heki.c     | 21 ++++++++++
 include/linux/heki.h   | 26 ++++++++++++
 virt/heki/Kconfig      |  1 +
 virt/heki/counters.c   | 90 ++++++++++++++++++++++++++++++++++++++++--
 virt/heki/main.c       | 83 +++++++++++++++++++++++++++++++++++++-
 7 files changed, 229 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8349f4ad3bbd..343615b0e3bf 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -1021,8 +1021,19 @@ static int kvm_lock_crs(void)
 	return err;
 }
 
+static int kvm_protect_memory(gpa_t pa)
+{
+	long err;
+
+	WARN_ON_ONCE(in_interrupt());
+
+	err = kvm_hypercall1(KVM_HC_PROTECT_MEMORY, pa);
+	return err;
+}
+
 static struct heki_hypervisor kvm_heki_hypervisor = {
 	.lock_crs = kvm_lock_crs,
+	.protect_memory = kvm_protect_memory,
 };
 
 static void kvm_init_heki(void)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2d09bcc35462..13be05e9ccf1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7374,7 +7374,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
 	int level;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
-	lockdep_assert_held(&kvm->slots_lock);
+	lockdep_assert_held(&kvm->slots_arch_lock);
 
 	/*
 	 * The sequence matters here: upper levels consume the result of lower
diff --git a/arch/x86/mm/heki.c b/arch/x86/mm/heki.c
index e4c60d8b4f2d..6c3fa9defada 100644
--- a/arch/x86/mm/heki.c
+++ b/arch/x86/mm/heki.c
@@ -45,6 +45,19 @@ __init void heki_arch_early_init(void)
 	heki_map(direct_map_end, kernel_end);
 }
 
+void heki_arch_late_init(void)
+{
+	/*
+	 * The permission counters for all existing kernel mappings have
+	 * already been updated. Now, walk all the pages, compute their
+	 * permissions from the counters and apply the permissions in the
+	 * host page table. To accomplish this, we walk the direct map
+	 * range.
+	 */
+	heki_protect(direct_map_va, direct_map_end);
+	pr_warn("Guest memory protected\n");
+}
+
 unsigned long heki_flags_to_permissions(unsigned long flags)
 {
 	unsigned long permissions;
@@ -67,6 +80,11 @@ void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set,
 		*clear |= MEM_ATTR_EXEC;
 }
 
+unsigned long heki_default_permissions(void)
+{
+	return MEM_ATTR_READ | MEM_ATTR_WRITE;
+}
+
 static unsigned long heki_pgprot_to_flags(pgprot_t prot)
 {
 	unsigned long flags = 0;
@@ -100,6 +118,9 @@ static void heki_text_poke_common(struct page **pages, int npages,
 		heki_callback(&args);
 	}
 
+	if (args.head)
+		heki_apply_permissions(&args);
+
 	mutex_unlock(&heki_lock);
 }
 
diff --git a/include/linux/heki.h b/include/linux/heki.h
index 6f2cfddc6dac..306bcec7ae92 100644
--- a/include/linux/heki.h
+++ b/include/linux/heki.h
@@ -15,6 +15,8 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/printk.h>
+#include <linux/mm.h>
+#include <linux/memblock.h>
 #include <linux/slab.h>
 
 #ifdef CONFIG_HEKI
@@ -61,6 +63,7 @@ struct heki_page_list {
  */
 struct heki_hypervisor {
 	int (*lock_crs)(void); /* Lock control registers. */
+	int (*protect_memory)(gpa_t pa); /* Protect guest memory */
 };
 
 /*
@@ -74,16 +77,28 @@ struct heki_hypervisor {
  *	- a page is mapped into the kernel address space
  *	- a page is unmapped from the kernel address space
  *	- permissions are changed for a mapped page
+ *
+ * At the end of kernel boot (before kicking off the init process), the
+ * permissions for guest pages are applied to the host page table.
+ *
+ * Beyond that point, the counters and host page table permissions are updated
+ * whenever:
+ *
+ *	- a guest page is mapped into the kernel address space
+ *	- a guest page is unmapped from the kernel address space
+ *	- permissions are changed for a mapped guest page
  */
 struct heki {
 	struct heki_hypervisor *hypervisor;
 	struct mem_table *counters;
+	bool protect_memory;
 };
 
 enum heki_cmd {
 	HEKI_MAP,
 	HEKI_UPDATE,
 	HEKI_UNMAP,
+	HEKI_PROTECT_MEMORY,
 };
 
 /*
@@ -103,7 +118,12 @@ struct heki_args {
 
 	/* Permissions passed by heki_update(). */
 	unsigned long set;
+	unsigned long set_global;
 	unsigned long clear;
+
+	/* Page list is built by the callback. */
+	struct heki_page_list *head;
+	phys_addr_t head_pa;
 };
 
 /* Callback function called by the table walker. */
@@ -125,14 +145,20 @@ void heki_update(unsigned long va, unsigned long end, unsigned long set,
 		 unsigned long clear);
 void heki_unmap(unsigned long va, unsigned long end);
 void heki_callback(struct heki_args *args);
+void heki_protect(unsigned long va, unsigned long end);
+void heki_add_pa(struct heki_args *args, phys_addr_t pa,
+		 unsigned long permissions);
+void heki_apply_permissions(struct heki_args *args);
 
 /* Arch-specific functions. */
 void heki_arch_early_init(void);
+void heki_arch_late_init(void);
 unsigned long heki_flags_to_permissions(unsigned long flags);
 void heki_pgprot_to_permissions(pgprot_t prot, unsigned long *set,
 				unsigned long *clear);
 void heki_text_poke_start(struct page **pages, int npages, pgprot_t prot);
 void heki_text_poke_end(struct page **pages, int npages, pgprot_t prot);
+unsigned long heki_default_permissions(void);
 
 #else /* !CONFIG_HEKI */
 
diff --git a/virt/heki/Kconfig b/virt/heki/Kconfig
index 6d956eb9d04b..9bde84cd759e 100644
--- a/virt/heki/Kconfig
+++ b/virt/heki/Kconfig
@@ -8,6 +8,7 @@ config HEKI
 	select KVM_GENERIC_MEMORY_ATTRIBUTES
 	depends on !X86_16BIT
 	select SPARSEMEM
+	depends on !JUMP_LABEL
 	help
 	  This feature enhances guest virtual machine security by taking
 	  advantage of security features provided by the hypervisor for guests.
diff --git a/virt/heki/counters.c b/virt/heki/counters.c
index d0f830b0775a..302113bee6ff 100644
--- a/virt/heki/counters.c
+++ b/virt/heki/counters.c
@@ -13,6 +13,25 @@
 
 DEFINE_MUTEX(heki_lock);
 
+static inline unsigned long heki_permissions(struct heki_counters *counters)
+{
+	unsigned long permissions;
+
+	if (!counters)
+		return heki_default_permissions();
+
+	permissions = 0;
+	if (counters->read)
+		permissions |= MEM_ATTR_READ;
+	if (counters->write)
+		permissions |= MEM_ATTR_WRITE;
+	if (counters->execute)
+		permissions |= MEM_ATTR_EXEC;
+	if (!permissions)
+		permissions = heki_default_permissions();
+	return permissions;
+}
+
 static void heki_update_counters(struct heki_counters *counters,
 				 unsigned long perm, unsigned long set,
 				 unsigned long clear)
@@ -53,20 +72,38 @@ static struct heki_counters *heki_create_counters(struct mem_table *table,
 	return counters;
 }
 
+static void heki_check_counters(struct heki_counters *counters,
+				unsigned long permissions)
+{
+	/*
+	 * If a permission has been added to a PTE directly, it will not be
+	 * reflected in the counters. Adjust for that. This is a bit of a
+	 * hack, really.
+	 */
+	if ((permissions & MEM_ATTR_READ) && !counters->read)
+		counters->read++;
+	if ((permissions & MEM_ATTR_WRITE) && !counters->write)
+		counters->write++;
+	if ((permissions & MEM_ATTR_EXEC) && !counters->execute)
+		counters->execute++;
+}
+
 void heki_callback(struct heki_args *args)
 {
 	/* The VA is only for debug. It is not really used in this function. */
 	unsigned long va;
 	phys_addr_t pa, pa_end;
-	unsigned long permissions;
+	unsigned long permissions, existing, new;
 	void **entry;
 	struct heki_counters *counters;
 	unsigned int ignore;
+	bool protect_memory;
 
 	if (!pfn_valid(args->pa >> PAGE_SHIFT))
 		return;
 
 	permissions = heki_flags_to_permissions(args->flags);
+	protect_memory = heki.protect_memory;
 
 	/*
 	 * Handle counters for a leaf entry in the kernel page table.
@@ -80,6 +117,8 @@ void heki_callback(struct heki_args *args)
 		else
 			counters = NULL;
 
+		existing = heki_permissions(counters);
+
 		switch (args->cmd) {
 		case HEKI_MAP:
 			if (!counters)
@@ -102,10 +141,30 @@ void heki_callback(struct heki_args *args)
 					     permissions);
 			break;
 
+		case HEKI_PROTECT_MEMORY:
+			if (counters)
+				heki_check_counters(counters, permissions);
+			existing = 0;
+			break;
+
 		default:
 			WARN_ON_ONCE(1);
 			break;
 		}
+
+		new = heki_permissions(counters) | args->set_global;
+
+		/*
+		 * To be able to use a pool of allocated memory for new
+		 * executable or read-only mappings (e.g., kernel module
+		 * loading), ignores immutable attribute if memory can be
+		 * changed.
+		 */
+		if (new & MEM_ATTR_WRITE)
+			new &= ~MEM_ATTR_IMMUTABLE;
+
+		if (protect_memory && existing != new)
+			heki_add_pa(args, pa, new);
 	}
 }
 
@@ -120,14 +179,20 @@ static void heki_func(unsigned long va, unsigned long end,
 
 	mutex_lock(&heki_lock);
 
+	if (args->cmd == HEKI_PROTECT_MEMORY)
+		heki.protect_memory = true;
+
 	heki_walk(va, end, heki_callback, args);
 
+	if (args->head)
+		heki_apply_permissions(args);
+
 	mutex_unlock(&heki_lock);
 }
 
 /*
  * Find the mappings in the given range and initialize permission counters for
- * them.
+ * them. Apply permissions in the host page table.
  */
 void heki_map(unsigned long va, unsigned long end)
 {
@@ -138,6 +203,25 @@ void heki_map(unsigned long va, unsigned long end)
 	heki_func(va, end, &args);
 }
 
+/*
+ * The architecture calls this to protect all guest pages at the end of
+ * kernel init. Up to this point, only the counters for guest pages have been
+ * updated. No permissions have been applied on the host page table.
+ * Now, the permissions will be applied.
+ *
+ * Beyond this point, the host page table permissions will always be updated
+ * whenever the counters are updated.
+ */
+void heki_protect(unsigned long va, unsigned long end)
+{
+	struct heki_args args = {
+		.cmd = HEKI_PROTECT_MEMORY,
+		.set_global = MEM_ATTR_IMMUTABLE,
+	};
+
+	heki_func(va, end, &args);
+}
+
 /*
  * Find the mappings in the given range and update permission counters for
  * them. Apply permissions in the host page table.
@@ -156,7 +240,7 @@ void heki_update(unsigned long va, unsigned long end, unsigned long set,
 
 /*
  * Find the mappings in the given range and revert the permission counters for
- * them.
+ * them. Apply permissions in the host page table.
  */
 void heki_unmap(unsigned long va, unsigned long end)
 {
diff --git a/virt/heki/main.c b/virt/heki/main.c
index 0ab7de659e6f..5629334112e7 100644
--- a/virt/heki/main.c
+++ b/virt/heki/main.c
@@ -51,7 +51,7 @@ void heki_late_init(void)
 {
 	struct heki_hypervisor *hypervisor = heki.hypervisor;
 
-	if (!heki_enabled || !heki.hypervisor)
+	if (!heki.counters)
 		return;
 
 	/* Locks control registers so a compromised guest cannot change them. */
@@ -59,6 +59,87 @@ void heki_late_init(void)
 		return;
 
 	pr_warn("Control registers locked\n");
+
+	heki_arch_late_init();
+}
+
+/*
+ * Build a list of guest pages with their permissions. This list will be
+ * passed to the VMM/Hypervisor to set these permissions in the host page
+ * table.
+ */
+void heki_add_pa(struct heki_args *args, phys_addr_t pa,
+		 unsigned long permissions)
+{
+	struct heki_page_list *list = args->head;
+	struct heki_pages *hpage;
+	u64 max_pages;
+	struct page *page;
+	bool new = false;
+
+	max_pages = (PAGE_SIZE - sizeof(*list)) / sizeof(*hpage);
+again:
+	if (!list || list->npages == max_pages) {
+		page = alloc_page(GFP_KERNEL);
+		if (WARN_ON_ONCE(!page))
+			return;
+
+		list = page_address(page);
+		list->npages = 0;
+		list->next_pa = args->head_pa;
+		list->next = args->head;
+
+		args->head = list;
+		args->head_pa = page_to_pfn(page) << PAGE_SHIFT;
+		new = true;
+	}
+
+	hpage = &list->pages[list->npages];
+	if (new) {
+		hpage->pa = pa;
+		hpage->epa = pa + PAGE_SIZE;
+		hpage->permissions = permissions;
+		return;
+	}
+
+	if (pa == hpage->epa && permissions == hpage->permissions) {
+		hpage->epa += PAGE_SIZE;
+		return;
+	}
+
+	list->npages++;
+	new = true;
+	goto again;
+}
+
+void heki_apply_permissions(struct heki_args *args)
+{
+	struct heki_hypervisor *hypervisor = heki.hypervisor;
+	struct heki_page_list *list = args->head;
+	phys_addr_t list_pa = args->head_pa;
+	struct page *page;
+	int ret;
+
+	if (!list)
+		return;
+
+	/* The very last one must be included. */
+	list->npages++;
+
+	/* Protect guest memory in the host page table. */
+	ret = hypervisor->protect_memory(list_pa);
+	if (ret) {
+		pr_warn("Failed to set memory permission\n");
+		return;
+	}
+
+	/* Free all the pages in the page list. */
+	while (list) {
+		page = pfn_to_page(list_pa >> PAGE_SHIFT);
+		list_pa = list->next_pa;
+		list = list->next;
+		__free_pages(page, 0);
+	}
 }
 
 static int __init heki_parse_config(char *str)
-- 
2.42.1


Powered by blists - more mailing lists