lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Sun, 07 Dec 2008 03:14:17 +0100 From: Brice Goglin <Brice.Goglin@...ia.fr> To: Andrew Morton <akpm@...l.org>, Christoph Lameter <cl@...ux-foundation.org> CC: LKML <linux-kernel@...r.kernel.org> Subject: [RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat? Hello, I have been seeing some deadlocks that seem to be related to do_pages_stat() page-faulting while holding the mmap_sem. Part of my sys_move_pages() rework has been applied to 2.6.28-rc. So do_pages_stat() now gets page addresses from user-space (and puts the result back to user-space) while holding the mmap_sem for read. If there's a page-fault there, the page-fault handler grabs the mmap_sem for read again. But if another thread took it for write in the meantime (for instance in mprotect), it deadlocks since rwsem readers are blocked if a writer is already waiting. Reading the archives, I see some similar deadlocks a couple years ago but I can't find the final answer/fix. From what I understand, the mmap_sem fairness could not be changed easily. So I am not sure whether accessing user-space while holding mmap_sem for read is still valid/recommended today. But the behavior of do_pages_stat() changed between 2.6.27 and 2.6.28-rc because of my patch, and this deadlock seems to be happening for real. So I would like to fix this small regression in 2.6.28. The patch below seems to make my do_pages_stat() deadlock disappear here. Brice [PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat Since commit 2f007e74bb85b9fc4eab28524052161703300f1a, do_pages_stat() gets the page address from user-space and puts the corresponding status back while holding the mmap_sem for read. There is no need to hold mmap_sem there while some page-faults may occur. This patch adds a temporary address and status buffer so as to only hold mmap_sem while working on these kernel buffers. This is implemented by extracting do_pages_stat_array() out of do_pages_stat(). Signed-off-by: Brice Goglin <Brice.Goglin@...ia.fr> diff --git a/mm/migrate.c b/mm/migrate.c index 1e0d6b2..4350101 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -987,25 +987,18 @@ out: /* * Determine the nodes of an array of pages and store it in an array of status. */ -static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages, - const void __user * __user *pages, - int __user *status) +static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages, + const void * __user *pages, int *status) { unsigned long i; - int err; down_read(&mm->mmap_sem); - for (i = 0; i < nr_pages; i++) { - const void __user *p; - unsigned long addr; + for (i = 0; i < nr_pages; i++, pages++, status++) { + unsigned long addr = (unsigned long)(*pages); struct vm_area_struct *vma; struct page *page; - - err = -EFAULT; - if (get_user(p, pages+i)) - goto out; - addr = (unsigned long) p; + int err; vma = find_vma(mm, addr); if (!vma) @@ -1024,12 +1017,59 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages, err = page_to_nid(page); set_status: - put_user(err, status+i); + *status = err; + } + + up_read(&mm->mmap_sem); +} + +/* + * Determine the nodes of a user array of pages and store it in + * a user array of status. + */ +static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages, + const void __user * __user *pages, + int __user *status) +{ + const void * __user *chunk_pages; + int *chunk_status; + unsigned long i,chunk_nr; + int err; + + err = -ENOMEM; + chunk_pages = (const void * __user *)__get_free_page(GFP_KERNEL); + if (!chunk_pages) + goto out; + chunk_status = (int *)__get_free_page(GFP_KERNEL); + if (!chunk_status) + goto out_with_chunk_pages; + + chunk_nr = PAGE_SIZE/max(sizeof(*chunk_pages), sizeof(*chunk_status)); + for(i = 0; i < nr_pages; i += chunk_nr, pages += chunk_nr, status += chunk_nr) { + if (chunk_nr + i > nr_pages) + chunk_nr = nr_pages - i; + + err = copy_from_user(chunk_pages, pages, chunk_nr * sizeof(*chunk_pages)); + if (err) { + err = -EFAULT; + goto out_with_chunk_status; + } + + do_pages_stat_array(mm, chunk_nr, chunk_pages, chunk_status); + + err = copy_to_user(status, chunk_status, chunk_nr * sizeof(*chunk_status)); + if (err) { + err = -EFAULT; + goto out_with_chunk_status; + } } err = 0; +out_with_chunk_status: + free_page((unsigned long)chunk_status); +out_with_chunk_pages: + free_page((unsigned long)chunk_pages); out: - up_read(&mm->mmap_sem); return err; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists