lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130111213550.GA31149@phenom.dumpdata.com>
Date:	Fri, 11 Jan 2013 16:35:50 -0500
From:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To:	Matt Wilson <msw@...zon.com>
Cc:	stable@...r.kernel.org, Ian Campbell <Ian.Campbell@...rix.com>,
	Annie Li <annie.li@...cle.com>, xen-devel@...ts.xen.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] xen/grant-table: correctly initialize grant table
 version 1

On Sun, Jan 06, 2013 at 11:14:42AM +0000, Matt Wilson wrote:
> Commit 85ff6acb075a484780b3d763fdf41596d8fc0970 (xen/granttable: Grant
> tables V2 implementation) changed the GREFS_PER_GRANT_FRAME macro from
> a constant to a conditional expression. The expression depends on
> grant_table_version being appropriately set. Unfortunately, at init
> time grant_table_version will be 0. The GREFS_PER_GRANT_FRAME
> conditional expression checks for "grant_table_version == 1", and
> therefore returns the number of grant references per frame for v2.
> 
> This causes gnttab_init() to allocate fewer pages for gnttab_list, as
> a frame can old half the number of v2 entries than v1 entries. After
> gnttab_resume() is called, grant_table_version is appropriately
> set. nr_init_grefs will then be miscalculated and gnttab_free_count
> will hold a value larger than the actual number of free gref entries.
> 
> If a guest is heavily utilizing improperly initialized v1 grant
> tables, memory corruption can occur. One common manifestation is
> corruption of the vmalloc list, resulting in a poisoned pointer
> derefrence when accessing /proc/meminfo or /proc/vmallocinfo:
> 
> [   40.770064] BUG: unable to handle kernel paging request at 0000200200001407
> [   40.770083] IP: [<ffffffff811a6fb0>] get_vmalloc_info+0x70/0x110
> [   40.770102] PGD 0
> [   40.770107] Oops: 0000 [#1] SMP
> [   40.770114] CPU 10

With this patch I get:

[    2.555087] GHES: HEST is not enabled!
[    2.560522] ioatdma: Intel(R) QuickData Technology Driver 4.00
[    2.569651] ------------[ cut here ]------------
[    2.570529] kernel BUG at /home/konrad/linux/drivers/xen/grant-table.c:1175!
[    2.570529] invalid opcode: 0000 [#1] SMP 
[    2.570529] Modules linked in:
[    2.570529] Pid: 1, comm: swapper/0 Not tainted 3.8.0-rc3upstream-00236-g8a3fed1 #1 Xen HVM domU
[    2.570529] EIP: 0060:[<c1330668>] EFLAGS: 00010246 CPU: 1
[    2.570529] EIP is at gnttab_init+0x1b8/0x1c0
[    2.570529] EAX: 00000000 EBX: 00000006 ECX: f2479e24 EDX: 00000020
[    2.570529] ESI: f25a5800 EDI: 00000000 EBP: f2479e40 ESP: f2479e1c
[    2.570529]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[    2.570529] CR0: 8005003b CR2: 00000000 CR3: 017fb000 CR4: 000006f0
[    2.570529] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    2.570529] DR6: ffff0ff0 DR7: 00000400
[    2.570529] Process swapper/0 (pid: 1, ti=f2478000 task=f2462a70 task.ti=f2478000)
[    2.570529] Stack:
[    2.570529]  000000ff c132f0e7 f25a7ff0 00000004 00000020 00000000 00000000 f25a5800
[    2.570529]  00000000 f2479e6c c133bcec 00000000 f2479e70 c10a945f 01000000 f2479e60
[    2.570529]  f2000000 f25a5800 c1744700 c1744734 f2479e78 c12b5284 f25a5800 f2479e94
[    2.570529] Call Trace:
[    2.570529]  [<c132f0e7>] ? gnttab_max_grant_frames+0x27/0x60
[    2.570529]  [<c133bcec>] platform_pci_init+0x1ac/0x260
[    2.570529]  [<c10a945f>] ? __blocking_notifier_call_chain+0x4f/0x60
[    2.570529]  [<c12b5284>] local_pci_probe+0x14/0x30
[    2.570529]  [<c12b63b8>] pci_device_probe+0x58/0x70
[    2.570529]  [<c139c8a9>] driver_probe_device+0x69/0x1e0
[    2.570529]  [<c12b555e>] ? pci_match_device+0xbe/0xd0
[    2.570529]  [<c139ca99>] __driver_attach+0x79/0x80
[    2.570529]  [<c139b348>] bus_for_each_dev+0x48/0x70
[    2.570529]  [<c139c739>] driver_attach+0x19/0x20
[    2.570529]  [<c139ca20>] ? driver_probe_device+0x1e0/0x1e0
[    2.570529]  [<c139c1b4>] bus_add_driver+0xa4/0x220
[    2.570529]  [<c12b6300>] ? pci_dev_put+0x20/0x20
[    2.570529]  [<c12b6300>] ? pci_dev_put+0x20/0x20
[    2.570529]  [<c139d025>] driver_register+0x65/0x130
[    2.570529]  [<c12b646e>] __pci_register_driver+0x2e/0x40
[    2.570529]  [<c1796d92>] platform_pci_module_init+0x14/0x16
[    2.570529]  [<c100302f>] do_one_initcall+0x2f/0x170
[    2.570529]  [<c1796d7e>] ? hyper_sysfs_init+0xc4/0xc4
[    2.570529]  [<c1576e12>] kernel_init+0x142/0x2a0
[    2.570529]  [<c176b442>] ? parse_early_options+0x35/0x35
[    2.570529]  [<c158e477>] ret_from_kernel_thread+0x1b/0x28
[    2.570529]  [<c1576cd0>] ? rest_init+0x80/0x80
[    2.570529] Code: ff ff ff 89 15 98 54 89 c1 c7 05 9c 54 89 c1 08 00 00 00 c7 04 24 4e 22 6c c1 e8 b9 48 25 00 c7 45 e0 00 00 00 00 e9 58 ff ff ff <0f> 0b eb fe 8d 74 26 00 8b 15 00 e1 7f c1 31 c0 55 89 e5 83 fa
[    2.570529] EIP: [<c1330668>] gnttab_init+0x1b8/0x1c0 SS:ESP 0068:f2479e1c
[    2.922510] ---[ end trace 14cd1ffa03240f7d ]---
[    2.928972] swapper/0 (1) used greatest stack depth: 5196 bytes left
[    2.935947] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    2.935947] 

This with a 64-bit dom0 and 64-bit and 32-bit PVHVM.

I had to fix a merge conflict with this patch so perhaps I missed something:

(The tree is #linux-next-test-3.8-rc3 on my xen.git tree);

commit 31d1172faa705443aec03915bdc809ec6a8ebdd7
Author: Matt Wilson <msw@...zon.com>
Date:   Sun Jan 6 11:14:42 2013 +0000

    xen/grant-table: correctly initialize grant table version 1
    
    Commit 85ff6acb075a484780b3d763fdf41596d8fc0970 (xen/granttable: Grant
    tables V2 implementation) changed the GREFS_PER_GRANT_FRAME macro from
    a constant to a conditional expression. The expression depends on
    grant_table_version being appropriately set. Unfortunately, at init
    time grant_table_version will be 0. The GREFS_PER_GRANT_FRAME
    conditional expression checks for "grant_table_version == 1", and
    therefore returns the number of grant references per frame for v2.
    
    This causes gnttab_init() to allocate fewer pages for gnttab_list, as
    a frame can old half the number of v2 entries than v1 entries. After
    gnttab_resume() is called, grant_table_version is appropriately
    set. nr_init_grefs will then be miscalculated and gnttab_free_count
    will hold a value larger than the actual number of free gref entries.
    
    If a guest is heavily utilizing improperly initialized v1 grant
    tables, memory corruption can occur. One common manifestation is
    corruption of the vmalloc list, resulting in a poisoned pointer
    derefrence when accessing /proc/meminfo or /proc/vmallocinfo:
    
    [   40.770064] BUG: unable to handle kernel paging request at 0000200200001407
    [   40.770083] IP: [<ffffffff811a6fb0>] get_vmalloc_info+0x70/0x110
    [   40.770102] PGD 0
    [   40.770107] Oops: 0000 [#1] SMP
    [   40.770114] CPU 10
    
    This patch introduces a static variable, grefs_per_grant_frame, to
    cache the calculated value. gnttab_init() now calls
    gnttab_request_version() early so that grant_table_version and
    grefs_per_grant_frame can be appropriately set. A few BUG_ON()s have
    been added to prevent this type of bug from reoccurring in the future.
    
    Signed-off-by: Matt Wilson <msw@...zon.com>
    Reviewed-and-Tested-by: Steven Noonan <snoonan@...zon.com>
    Acked-by: Ian Campbell <Ian.Campbell@...rix.com>
    Cc: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
    Cc: Annie Li <annie.li@...cle.com>
    Cc: xen-devel@...ts.xen.org
    Cc: linux-kernel@...r.kernel.org
    Cc: stable@...r.kernel.org # v3.3 and newer
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
    [v1: Fixed merge conflict]

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index b91f14e..d49c9fb 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -56,10 +56,6 @@
 /* External tools reserve first few grant table entries. */
 #define NR_RESERVED_ENTRIES 8
 #define GNTTAB_LIST_END 0xffffffff
-#define GREFS_PER_GRANT_FRAME \
-(grant_table_version == 1 ?                      \
-(PAGE_SIZE / sizeof(struct grant_entry_v1)) :   \
-(PAGE_SIZE / sizeof(union grant_entry_v2)))
 
 static grant_ref_t **gnttab_list;
 static unsigned int nr_grant_frames;
@@ -154,6 +150,7 @@ static struct gnttab_ops *gnttab_interface;
 static grant_status_t *grstatus;
 
 static int grant_table_version;
+static int grefs_per_grant_frame;
 
 static struct gnttab_free_callback *gnttab_free_callback_list;
 
@@ -767,12 +764,14 @@ static int grow_gnttab_list(unsigned int more_frames)
 	unsigned int new_nr_grant_frames, extra_entries, i;
 	unsigned int nr_glist_frames, new_nr_glist_frames;
 
+	BUG_ON(grefs_per_grant_frame == 0);
+
 	new_nr_grant_frames = nr_grant_frames + more_frames;
-	extra_entries       = more_frames * GREFS_PER_GRANT_FRAME;
+	extra_entries       = more_frames * grefs_per_grant_frame;
 
-	nr_glist_frames = (nr_grant_frames * GREFS_PER_GRANT_FRAME + RPP - 1) / RPP;
+	nr_glist_frames = (nr_grant_frames * grefs_per_grant_frame + RPP - 1) / RPP;
 	new_nr_glist_frames =
-		(new_nr_grant_frames * GREFS_PER_GRANT_FRAME + RPP - 1) / RPP;
+		(new_nr_grant_frames * grefs_per_grant_frame + RPP - 1) / RPP;
 	for (i = nr_glist_frames; i < new_nr_glist_frames; i++) {
 		gnttab_list[i] = (grant_ref_t *)__get_free_page(GFP_ATOMIC);
 		if (!gnttab_list[i])
@@ -780,12 +779,12 @@ static int grow_gnttab_list(unsigned int more_frames)
 	}
 
 
-	for (i = GREFS_PER_GRANT_FRAME * nr_grant_frames;
-	     i < GREFS_PER_GRANT_FRAME * new_nr_grant_frames - 1; i++)
+	for (i = grefs_per_grant_frame * nr_grant_frames;
+	     i < grefs_per_grant_frame * new_nr_grant_frames - 1; i++)
 		gnttab_entry(i) = i + 1;
 
 	gnttab_entry(i) = gnttab_free_head;
-	gnttab_free_head = GREFS_PER_GRANT_FRAME * nr_grant_frames;
+	gnttab_free_head = grefs_per_grant_frame * nr_grant_frames;
 	gnttab_free_count += extra_entries;
 
 	nr_grant_frames = new_nr_grant_frames;
@@ -957,7 +956,8 @@ EXPORT_SYMBOL_GPL(gnttab_unmap_refs);
 
 static unsigned nr_status_frames(unsigned nr_grant_frames)
 {
-	return (nr_grant_frames * GREFS_PER_GRANT_FRAME + SPP - 1) / SPP;
+	BUG_ON(grefs_per_grant_frame == 0);
+	return (nr_grant_frames * grefs_per_grant_frame + SPP - 1) / SPP;
 }
 
 static int gnttab_map_frames_v1(xen_pfn_t *frames, unsigned int nr_gframes)
@@ -1115,6 +1115,7 @@ static void gnttab_request_version(void)
 	rc = HYPERVISOR_grant_table_op(GNTTABOP_set_version, &gsv, 1);
 	if (rc == 0 && gsv.version == 2) {
 		grant_table_version = 2;
+		grefs_per_grant_frame = PAGE_SIZE / sizeof(union grant_entry_v2);
 		gnttab_interface = &gnttab_v2_ops;
 	} else if (grant_table_version == 2) {
 		/*
@@ -1129,15 +1130,15 @@ static void gnttab_request_version(void)
 		grant_table_version = 1;
 		gnttab_interface = &gnttab_v1_ops;
 	}
-	printk(KERN_INFO "Grant tables using version %d layout.\n",
-		grant_table_version);
 }
 
-int gnttab_resume(void)
+static int gnttab_setup(void)
 {
 	unsigned int max_nr_gframes;
 
-	gnttab_request_version();
+	printk(KERN_INFO "Grant tables using version %d layout.\n",
+		grant_table_version);
+
 	max_nr_gframes = gnttab_max_grant_frames();
 	if (max_nr_gframes < nr_grant_frames)
 		return -ENOSYS;
@@ -1160,6 +1161,12 @@ int gnttab_resume(void)
 	return 0;
 }
 
+int gnttab_resume(void)
+{
+	gnttab_request_version();
+	return gnttab_setup();
+}
+
 int gnttab_suspend(void)
 {
 	gnttab_interface->unmap_frames();
@@ -1171,9 +1178,10 @@ static int gnttab_expand(unsigned int req_entries)
 	int rc;
 	unsigned int cur, extra;
 
+	BUG_ON(grefs_per_grant_frame == 0);
 	cur = nr_grant_frames;
-	extra = ((req_entries + (GREFS_PER_GRANT_FRAME-1)) /
-		 GREFS_PER_GRANT_FRAME);
+	extra = ((req_entries + (grefs_per_grant_frame-1)) /
+		 grefs_per_grant_frame);
 	if (cur + extra > gnttab_max_grant_frames())
 		return -ENOSPC;
 
@@ -1191,21 +1199,23 @@ int gnttab_init(void)
 	unsigned int nr_init_grefs;
 	int ret;
 
+	gnttab_request_version();
 	nr_grant_frames = 1;
 	boot_max_nr_grant_frames = __max_nr_grant_frames();
 
 	/* Determine the maximum number of frames required for the
 	 * grant reference free list on the current hypervisor.
 	 */
+	BUG_ON(grefs_per_grant_frame == 0);
 	max_nr_glist_frames = (boot_max_nr_grant_frames *
-			       GREFS_PER_GRANT_FRAME / RPP);
+			       grefs_per_grant_frame / RPP);
 
 	gnttab_list = kmalloc(max_nr_glist_frames * sizeof(grant_ref_t *),
 			      GFP_KERNEL);
 	if (gnttab_list == NULL)
 		return -ENOMEM;
 
-	nr_glist_frames = (nr_grant_frames * GREFS_PER_GRANT_FRAME + RPP - 1) / RPP;
+	nr_glist_frames = (nr_grant_frames * grefs_per_grant_frame + RPP - 1) / RPP;
 	for (i = 0; i < nr_glist_frames; i++) {
 		gnttab_list[i] = (grant_ref_t *)__get_free_page(GFP_KERNEL);
 		if (gnttab_list[i] == NULL) {
@@ -1214,12 +1224,12 @@ int gnttab_init(void)
 		}
 	}
 
-	if (gnttab_resume() < 0) {
+	if (gnttab_setup() < 0) {
 		ret = -ENODEV;
 		goto ini_nomem;
 	}
 
-	nr_init_grefs = nr_grant_frames * GREFS_PER_GRANT_FRAME;
+	nr_init_grefs = nr_grant_frames * grefs_per_grant_frame;
 
 	for (i = NR_RESERVED_ENTRIES; i < nr_init_grefs - 1; i++)
 		gnttab_entry(i) = i + 1;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ