Message-Id: <1333139850-28456-6-git-send-email-konrad.wilk@oracle.com>
Date:	Fri, 30 Mar 2012 16:37:28 -0400
From:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To:	linux-kernel@...r.kernel.org, xen-devel@...ts.xensource.com
Cc:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
Subject: [PATCH 5/7] xen/setup: Transfer MFNs from non-RAM E820 entries and gaps to E820 RAM

When the Xen hypervisor boots a PV kernel it hands it two pieces
of information: nr_pages and a made-up E820 map.

The nr_pages value defines the range of PFNs, from zero to nr_pages,
which have a valid Machine Frame Number (MFN) underneath them. The
E820 mirrors that (with the VGA hole):
BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 00000000000a0000 (usable)
 Xen: 00000000000a0000 - 0000000000100000 (reserved)
 Xen: 0000000000100000 - 0000000080800000 (usable)
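
Put differently (a minimal sketch, not part of this patch), every
PFN below nr_pages is expected to be backed by a real MFN:

	unsigned long pfn;

	/* The guarantee described above: every PFN below
	 * xen_start_info->nr_pages has a valid MFN behind it.
	 */
	for (pfn = 0; pfn < xen_start_info->nr_pages; pfn++)
		BUG_ON(pfn_to_mfn(pfn) == INVALID_P2M_ENTRY);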

The fun comes when a PV guest is run with a system E820 - that can
be either the initial domain or a PCI PV guest - where the E820
looks like the normal thing:

BIOS-provided physical RAM map:
 Xen: 0000000000000000 - 000000000009e000 (usable)
 Xen: 000000000009ec00 - 0000000000100000 (reserved)
 Xen: 0000000000100000 - 0000000020000000 (usable)
 Xen: 0000000020000000 - 0000000020200000 (reserved)
 Xen: 0000000020200000 - 0000000040000000 (usable)
 Xen: 0000000040000000 - 0000000040200000 (reserved)
 Xen: 0000000040200000 - 00000000bad80000 (usable)
 Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS)
..
With that layout, overlaying the nr_pages directly on the E820 does
not work, as there are gaps and non-RAM regions that won't be used
by the memory allocator. The 'xen_release_chunk' function helps with
that by punching holes in the P2M (the PFN to MFN lookup tree) for
those regions and reports:

Freeing  20000-20200 pfn range: 512 pages freed
Freeing  40000-40200 pfn range: 512 pages freed
Freeing  bad80-badf4 pfn range: 116 pages freed
Freeing  badf6-bae7f pfn range: 137 pages freed
Freeing  bb000-100000 pfn range: 282624 pages freed
Released 283999 pages of unused memory
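
For reference, releasing a page boils down to handing its MFN back
to the hypervisor and invalidating the P2M slot; a simplified sketch
of that step (the loop and error handling omitted) is:

	/* Release the frame backing 'pfn' and punch a hole in the P2M. */
	unsigned long mfn = pfn_to_mfn(pfn);
	struct xen_memory_reservation reservation = {
		.nr_extents   = 1,
		.extent_order = 0,
		.domid        = DOMID_SELF,
	};

	set_xen_guest_handle(reservation.extent_start, &mfn);
	if (HYPERVISOR_memory_op(XENMEM_decrease_reservation,
				 &reservation) == 1)
		__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);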

Those 283999 pages are subtracted from nr_pages and are returned
to the hypervisor. The end result is that the initial domain
boots with 1GB less memory, as nr_pages has been reduced by
the number of pages residing within the PCI hole. It can balloon up
to that if desired using 'xl mem-set 0 8092', but the balloon driver
is not always compiled in for the initial domain.

The 'xen_exchange_chunk' function solves this by transferring the
MFNs that would have been freed to the E820_RAM entries that
lie past nr_pages, using the early_set_phys_to_machine
mechanism, which allows the P2M tree to allocate new leaves during
early bootup.

It does that by copying the MFNs to the E820_RAM regions that have
not been used and setting the old PFNs to INVALID_P2M_ENTRY.
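
Schematically, the per-page step looks like the sketch below (old_pfn
and new_pfn are placeholder names; the real loop is in
xen_exchange_chunk further down):

	/* Move one backing frame from a PFN in a hole to a PFN above
	 * nr_pages; early_set_phys_to_machine() may grow a new P2M leaf.
	 */
	unsigned long mfn = pfn_to_mfn(old_pfn);

	if (mfn != INVALID_P2M_ENTRY && mfn_to_pfn(mfn) == old_pfn &&
	    early_set_phys_to_machine(new_pfn, mfn))
		__set_phys_to_machine(old_pfn, INVALID_P2M_ENTRY);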

The end result is that the kernel can now boot with the
nr_pages without having to subtract the 283999 pages.

We will now get:

-Released 283999 pages of unused memory
+Exchanged 283999 pages
.. snip..
-Memory: 6487732k/9208688k available (5817k kernel code, 1136060k absent, 1584896k reserved, 2900k data, 692k init)
+Memory: 6503888k/8072692k available (5817k kernel code, 1136060k absent, 432744k reserved, 2900k data, 692k init)

which is more in line with classic XenOLinux.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
---
 arch/x86/xen/setup.c |   85 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 files changed, 82 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 1ba8dff..2a12143 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -120,12 +120,89 @@ static unsigned long __init xen_release_chunk(unsigned long start,
 	return len;
 }
 
+static unsigned long __init xen_exchange_chunk(unsigned long start_pfn,
+	unsigned long end_pfn, unsigned long nr_pages, unsigned long exchanged,
+	unsigned long *pages_left, const struct e820entry *list,
+	size_t map_size)
+{
+	const struct e820entry *entry;
+	unsigned int i;
+	unsigned long credits = (end_pfn - start_pfn) + *pages_left;
+	unsigned long done = 0;
+
+	for (i = 0, entry = list; i < map_size; i++, entry++) {
+		unsigned long s_pfn;
+		unsigned long e_pfn;
+		unsigned long pfn;
+		unsigned long dest_pfn;
+		long nr;
+
+		if (credits == 0)
+			break;
+
+		if (entry->type != E820_RAM)
+			continue;
+
+		e_pfn = PFN_UP(entry->addr + entry->size);
+
+		/* We only care about E820 _after_ the xen_start_info->nr_pages */
+		if (e_pfn <= nr_pages)
+			continue;
+
+		s_pfn = PFN_DOWN(entry->addr);
+		/* If the E820 falls within the nr_pages, we want to start
+		 * at the nr_pages PFN (plus whatever we already had exchanged)
+		 * If that would mean going past the E820 entry, skip it
+		 */
+		if (s_pfn <= nr_pages) {
+			nr = e_pfn - exchanged - nr_pages;
+			dest_pfn = nr_pages + exchanged;
+		} else {
+			nr = e_pfn - exchanged - s_pfn;
+			dest_pfn = s_pfn + exchanged;
+		}
+		/* If we had filled this E820_RAM entry, go to the next one. */
+		if (nr <= 0)
+			continue;
+
+		pr_debug("[%lx->%lx] (starting at %lx and have space for %ld pages) will move %ld pages from [%lx->%lx]\n",
+			 s_pfn, e_pfn, dest_pfn, nr, credits, start_pfn, end_pfn);
+
+		for (pfn = start_pfn; pfn < start_pfn + nr; pfn++) {
+			unsigned long mfn = pfn_to_mfn(pfn);
+
+			if (mfn == INVALID_P2M_ENTRY || mfn_to_pfn(mfn) != pfn)
+				break;
+
+			if (!early_set_phys_to_machine(dest_pfn, mfn))
+				break;
+
+			/* You would think we should do HYPERVISOR_update_va_mapping
+			 * but we don't need to as the hypervisor only sets up the
+			 * initial pagetables up to nr_pages, and we stick the MFNs
+			 * past that.
+			 */
+			__set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+			++dest_pfn;
+			++done;
+			if (--credits == 0)
+				break;
+		}
+	}
+	if (done)
+		printk(KERN_INFO "Transferred from %lx->%lx range %ld pages\n", start_pfn, end_pfn, done);
+	/* How many left on the next iteration */
+	*pages_left = credits;
+	return done;
+}
 static unsigned long __init xen_set_identity_and_release(
 	const struct e820entry *list, size_t map_size, unsigned long nr_pages)
 {
 	phys_addr_t start = 0;
 	unsigned long released = 0;
 	unsigned long identity = 0;
+	unsigned long exchanged = 0;
+	unsigned long credits = 0;
 	const struct e820entry *entry;
 	int i;
 
@@ -151,17 +228,19 @@ static unsigned long __init xen_set_identity_and_release(
 				end_pfn = PFN_UP(entry->addr);
 
 			if (start_pfn < end_pfn) {
-				if (start_pfn < nr_pages)
+				exchanged += xen_exchange_chunk(start_pfn, end_pfn, nr_pages,
+						exchanged, &credits, list, map_size);
+				if (start_pfn < nr_pages) {
 					released += xen_release_chunk(
 						start_pfn, min(end_pfn, nr_pages));
-
+				}
 				identity += set_phys_range_identity(
 					start_pfn, end_pfn);
 			}
 			start = end;
 		}
 	}
-
+	printk(KERN_INFO "Exchanged %lu pages\n", exchanged);
 	printk(KERN_INFO "Released %lu pages of unused memory\n", released);
 	printk(KERN_INFO "Set %ld page(s) to 1-1 mapping\n", identity);
 
-- 
1.7.7.5
