lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 07 Dec 2008 03:14:17 +0100
From:	Brice Goglin <Brice.Goglin@...ia.fr>
To:	Andrew Morton <akpm@...l.org>,
	Christoph Lameter <cl@...ux-foundation.org>
CC:	LKML <linux-kernel@...r.kernel.org>
Subject: [RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat?

Hello,

I have been seeing some deadlocks that seem to be related to do_pages_stat()
page-faulting while holding the mmap_sem. Part of my sys_move_pages() rework
has been applied to 2.6.28-rc. So do_pages_stat() now gets page addresses
from user-space (and puts the result back to user-space) while holding the
mmap_sem for read. If there's a page-fault there, the page-fault handler
grabs the mmap_sem for read again. But if another thread took it for write
in the meantime (for instance in mprotect), it deadlocks since rwsem readers
are blocked if a writer is already waiting.

Reading the archives, I see some similar deadlocks a couple years ago but
I can't find the final answer/fix. From what I understand, the mmap_sem
fairness could not be changed easily. So I am not sure whether accessing
user-space while holding mmap_sem for read is still valid/recommended today.
But the behavior of do_pages_stat() changed between 2.6.27 and 2.6.28-rc
because of my patch, and this deadlock seems to be happening for real.
So I would like to fix this small regression in 2.6.28. The patch below seems
to make my do_pages_stat() deadlock disappear here.

Brice



[PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat
   
Since commit 2f007e74bb85b9fc4eab28524052161703300f1a, do_pages_stat()
gets the page address from user-space and puts the corresponding status
back while holding the mmap_sem for read. There is no need to hold
mmap_sem there while some page-faults may occur.

This patch adds a temporary address and status buffer so as to only hold
mmap_sem while working on these kernel buffers. This is implemented by
extracting do_pages_stat_array() out of do_pages_stat().

Signed-off-by: Brice Goglin <Brice.Goglin@...ia.fr>

diff --git a/mm/migrate.c b/mm/migrate.c
index 1e0d6b2..4350101 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -987,25 +987,18 @@ out:
 /*
  * Determine the nodes of an array of pages and store it in an array of status.
  */
-static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
-			 const void __user * __user *pages,
-			 int __user *status)
+static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
+				const void * __user *pages, int *status)
 {
 	unsigned long i;
-	int err;
 
 	down_read(&mm->mmap_sem);
 
-	for (i = 0; i < nr_pages; i++) {
-		const void __user *p;
-		unsigned long addr;
+	for (i = 0; i < nr_pages; i++, pages++, status++) {
+		unsigned long addr = (unsigned long)(*pages);
 		struct vm_area_struct *vma;
 		struct page *page;
-
-		err = -EFAULT;
-		if (get_user(p, pages+i))
-			goto out;
-		addr = (unsigned long) p;
+		int err;
 
 		vma = find_vma(mm, addr);
 		if (!vma)
@@ -1024,12 +1017,59 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
 
 		err = page_to_nid(page);
 set_status:
-		put_user(err, status+i);
+		*status = err;
+	}
+
+	up_read(&mm->mmap_sem);
+}
+
+/*
+ * Determine the nodes of a user array of pages and store it in
+ * a user array of status.
+ */
+static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
+			 const void __user * __user *pages,
+			 int __user *status)
+{
+	const void * __user *chunk_pages;
+	int *chunk_status;
+	unsigned long i,chunk_nr;
+	int err;
+
+	err = -ENOMEM;
+	chunk_pages = (const void * __user *)__get_free_page(GFP_KERNEL);
+	if (!chunk_pages)
+		goto out;
+	chunk_status = (int *)__get_free_page(GFP_KERNEL);
+	if (!chunk_status)
+		goto out_with_chunk_pages;
+
+	chunk_nr = PAGE_SIZE/max(sizeof(*chunk_pages), sizeof(*chunk_status));
+	for(i = 0; i < nr_pages; i += chunk_nr, pages += chunk_nr, status += chunk_nr) {
+		if (chunk_nr + i > nr_pages)
+			chunk_nr = nr_pages - i;
+
+		err = copy_from_user(chunk_pages, pages, chunk_nr * sizeof(*chunk_pages));
+		if (err) {
+			err = -EFAULT;
+			goto out_with_chunk_status;
+		}
+
+		do_pages_stat_array(mm, chunk_nr, chunk_pages, chunk_status);
+
+		err = copy_to_user(status, chunk_status, chunk_nr * sizeof(*chunk_status));
+		if (err) {
+			err = -EFAULT;
+			goto out_with_chunk_status;
+		}
 	}
 	err = 0;
 
+out_with_chunk_status:
+	free_page((unsigned long)chunk_status);
+out_with_chunk_pages:
+	free_page((unsigned long)chunk_pages);
 out:
-	up_read(&mm->mmap_sem);
 	return err;
 }
 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists