linux-kernel - Re: [PATCH 04/10] NOMMU: Make VMAs per MM as for MMU-mode linux [ver #3]

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <10270.1231843888@redhat.com>
Date:	Tue, 13 Jan 2009 10:51:28 +0000
From:	David Howells <dhowells@...hat.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	dhowells@...hat.com, torvalds@...ux-foundation.org,
	bernds_cb1@...nline.de, rgetz@...ckfin.uclinux.org,
	vapier.adi@...il.com, lethal@...ux-sh.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 04/10] NOMMU: Make VMAs per MM as for MMU-mode linux [ver #3]

Andrew Morton <akpm@...ux-foundation.org> wrote:

> > +		K((unsigned long) atomic_read(&mmap_pages_allocated)),
> 
> This is slightly wrong, as atomic_read() returns "int".  It will show
> weird results on 64-bit nommu machines which allocate more than 2G pages :)

Okay...  I'll make mmap_pages_allocated an atomic_long.  That should be
exactly the same on a 32-bit system.

> > +		   vma->vm_pgoff << PAGE_SHIFT,
> 
> This will overflow at file offsets >4G?

True.  I'll cast it to unsigned long long before shifting and adjust the print
format to %08llx.

> > +	vm_area_cachep = kmem_cache_create("vm_area_struct",
> > +			sizeof(struct vm_area_struct), 0,
> > +			SLAB_PANIC, NULL);
> 
> We have the KMEM_CACHE() macro for this.

Not that fork.c seems to use it.

> (I see this was copy-n-pasted from fork.c.  Why was it moved?)

Because I wanted to do something different for vm_area_struct under nommu for
a time.  I'll move it back.

> open-coded BUG_ON()

All changed.

> `region' could be local to this block, if one feels so inclined.

But then you can't look in the variable with gdb.

> > +#define validate_nommu_regions() do {} while(0)
> 
> There's no need to implement this in cpp...

True.  There's also no need to implement it as a static inline either.

> > +static void free_page_series(unsigned long from, unsigned long to)
> > ...
> > +		atomic_dec(&mmap_pages_allocated);
> 
> This isn't counting "pages allocated".  It's counting page *refcounts*.
> Now, perhaps these pages will always have a refcount of 1 (why?), in
> which case things happen to work out.  But if so, the next line isn't
> needed:

It's actually keeping track of the number of pages we have recorded in
vm_region structs as being allocated.  Once we put the pages here, we can't
track them anymore.

Furthermore, they should have a refcount of 1 at this point, unless there's a
bug in ptrace or direct I/O.

> > +		if (page_count(page) != 1)
> 
> should be "== 1", surely?

Why?  That would be silly.  I want to warn _if_ the page count isn't what I
expect, not if it is.  I'll adjust the text of the warning to make it clear.

> > + * - the caller must hold the region semaphore, which this releases
> 
> "for writing"

Yep.

> > +		printk(KERN_WARNING
> > +		       "munmap of memory not mmapped by process %d (%s):"
> > +		       " 0x%lx-0x%lx\n",
> > +		       current->pid, current->comm, start, start + len - 1);
> 
> Unprivileged userspace can spam logs?  Perhaps not a concern on typical
> nommu systems.  But buggy userspace can spam logs, too, which _is_ a
> problem?

I'll limit it to 5 printks.  But this is NOMMU: if userspace wants to corrupt
the kernel, it can do so at will.


Patch attached.  Now I have to go and catch a plane to Australia.

David
---
From: David Howells <dhowells@...hat.com>
Subject: [PATCH] NOMMU: Fix a number of issues with the per-MM VMA patch

Fix a number of issues with the per-MM VMA patch:

 (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
     a NOMMU system with more than 2G pages.  Makes no difference on a 32-bit
     system.

 (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
     lest it overflow.

 (3) Move the allocation of the vm_area_struct slab back for fork.c.

 (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs.

 (5) Use BUG_ON() rather than if () BUG().

 (6) Make the default validate_nommu_regions() a static inline rather than a
     #define.

 (7) Make free_page_series()'s objection to pages with a refcount != 1 more
     informative.

 (8) Adjust the __put_nommu_region() banner comment to indicate that the
     semaphore must be held for writing.

 (9) Limit the number of warnings about munmaps of non-mmapped regions.

Reported-by: Andrew Morton <akpm@...ux-foundation.org>
Signed-off-by: David Howells <dhowells@...hat.com>
---

 fs/proc/meminfo.c    |    2 +-
 fs/proc/task_nommu.c |    4 ++--
 include/linux/mm.h   |    2 +-
 kernel/fork.c        |    1 +
 mm/mmap.c            |    3 ---
 mm/nommu.c           |   52 ++++++++++++++++++++++++--------------------------
 6 files changed, 30 insertions(+), 34 deletions(-)


diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 43d2394..74ea974 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -120,7 +120,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		K(i.freeram-i.freehigh),
 #endif
 #ifndef CONFIG_MMU
-		K((unsigned long) atomic_read(&mmap_pages_allocated)),
+		K((unsigned long) atomic_long_read(&mmap_pages_allocated)),
 #endif
 		K(i.totalswap),
 		K(i.freeswap),
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 343ea12..370be0a 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -136,14 +136,14 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma)
 	}
 
 	seq_printf(m,
-		   "%08lx-%08lx %c%c%c%c %08lx %02x:%02x %lu %n",
+		   "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n",
 		   vma->vm_start,
 		   vma->vm_end,
 		   flags & VM_READ ? 'r' : '-',
 		   flags & VM_WRITE ? 'w' : '-',
 		   flags & VM_EXEC ? 'x' : '-',
 		   flags & VM_MAYSHARE ? flags & VM_SHARED ? 'S' : 's' : 'p',
-		   vma->vm_pgoff << PAGE_SHIFT,
+		   (unsigned long long) vma->vm_pgoff << PAGE_SHIFT,
 		   MAJOR(dev), MINOR(dev), ino, &len);
 
 	if (file) {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b91a73f..d9a832c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1064,7 +1064,7 @@ static inline void setup_per_cpu_pageset(void) {}
 #endif
 
 /* nommu.c */
-extern atomic_t mmap_pages_allocated;
+extern atomic_long_t mmap_pages_allocated;
 
 /* prio_tree.c */
 void vma_prio_tree_add(struct vm_area_struct *, struct vm_area_struct *old);
diff --git a/kernel/fork.c b/kernel/fork.c
index 1d68f12..119b6ed 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1484,6 +1484,7 @@ void __init proc_caches_init(void)
 	mm_cachep = kmem_cache_create("mm_struct",
 			sizeof(struct mm_struct), ARCH_MIN_MMSTRUCT_ALIGN,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC);
 	mmap_init();
 }
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 7496231..46d126c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2478,7 +2478,4 @@ void mm_drop_all_locks(struct mm_struct *mm)
  */
 void __init mmap_init(void)
 {
-	vm_area_cachep = kmem_cache_create("vm_area_struct",
-			sizeof(struct vm_area_struct), 0,
-			SLAB_PANIC, NULL);
 }
diff --git a/mm/nommu.c b/mm/nommu.c
index 60ed837..427d8c4 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -69,7 +69,7 @@ int sysctl_max_map_count = DEFAULT_MAX_MAP_COUNT;
 int sysctl_nr_trim_pages = 1; /* page trimming behaviour */
 int heap_stack_gap = 0;
 
-atomic_t mmap_pages_allocated;
+atomic_long_t mmap_pages_allocated;
 
 EXPORT_SYMBOL(mem_map);
 EXPORT_SYMBOL(num_physpages);
@@ -445,12 +445,7 @@ asmlinkage unsigned long sys_brk(unsigned long brk)
  */
 void __init mmap_init(void)
 {
-	vm_region_jar = kmem_cache_create("vm_region_jar",
-					  sizeof(struct vm_region), 0,
-					  SLAB_PANIC, NULL);
-	vm_area_cachep = kmem_cache_create("vm_area_struct",
-					   sizeof(struct vm_area_struct), 0,
-					   SLAB_PANIC, NULL);
+	vm_region_jar = KMEM_CACHE(vm_region, SLAB_PANIC);
 }
 
 /*
@@ -468,27 +463,24 @@ static noinline void validate_nommu_regions(void)
 		return;
 
 	last = rb_entry(lastp, struct vm_region, vm_rb);
-	if (unlikely(last->vm_end <= last->vm_start))
-		BUG();
-	if (unlikely(last->vm_top < last->vm_end))
-		BUG();
+	BUG_ON(unlikely(last->vm_end <= last->vm_start));
+	BUG_ON(unlikely(last->vm_top < last->vm_end));
 
 	while ((p = rb_next(lastp))) {
 		region = rb_entry(p, struct vm_region, vm_rb);
 		last = rb_entry(lastp, struct vm_region, vm_rb);
 
-		if (unlikely(region->vm_end <= region->vm_start))
-			BUG();
-		if (unlikely(region->vm_top < region->vm_end))
-			BUG();
-		if (unlikely(region->vm_start < last->vm_top))
-			BUG();
+		BUG_ON(unlikely(region->vm_end <= region->vm_start));
+		BUG_ON(unlikely(region->vm_top < region->vm_end));
+		BUG_ON(unlikely(region->vm_start < last->vm_top));
 
 		lastp = p;
 	}
 }
 #else
-#define validate_nommu_regions() do {} while(0)
+static void validate_nommu_regions(void)
+{
+}
 #endif
 
 /*
@@ -545,16 +537,17 @@ static void free_page_series(unsigned long from, unsigned long to)
 		struct page *page = virt_to_page(from);
 
 		kdebug("- free %lx", from);
-		atomic_dec(&mmap_pages_allocated);
+		atomic_long_dec(&mmap_pages_allocated);
 		if (page_count(page) != 1)
-			kdebug("free page %p [%d]", page, page_count(page));
+			kdebug("free page %p: refcount not one: %d",
+			       page, page_count(page));
 		put_page(page);
 	}
 }
 
 /*
  * release a reference to a region
- * - the caller must hold the region semaphore, which this releases
+ * - the caller must hold the region semaphore for writing, which this releases
  * - the region may not have been added to the tree yet, in which case vm_top
  *   will equal vm_start
  */
@@ -1078,7 +1071,7 @@ static int do_mmap_private(struct vm_area_struct *vma,
 		goto enomem;
 
 	total = 1 << order;
-	atomic_add(total, &mmap_pages_allocated);
+	atomic_long_add(total, &mmap_pages_allocated);
 
 	point = rlen >> PAGE_SHIFT;
 
@@ -1089,7 +1082,7 @@ static int do_mmap_private(struct vm_area_struct *vma,
 			order = ilog2(total - point);
 			n = 1 << order;
 			kdebug("shave %lu/%lu @%lu", n, total - point, total);
-			atomic_sub(n, &mmap_pages_allocated);
+			atomic_long_sub(n, &mmap_pages_allocated);
 			total -= n;
 			set_page_refcounted(pages + total);
 			__free_pages(pages + total, order);
@@ -1518,10 +1511,15 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
 	/* find the first potentially overlapping VMA */
 	vma = find_vma(mm, start);
 	if (!vma) {
-		printk(KERN_WARNING
-		       "munmap of memory not mmapped by process %d (%s):"
-		       " 0x%lx-0x%lx\n",
-		       current->pid, current->comm, start, start + len - 1);
+		static int limit = 0;
+		if (limit < 5) {
+			printk(KERN_WARNING
+			       "munmap of memory not mmapped by process %d"
+			       " (%s): 0x%lx-0x%lx\n",
+			       current->pid, current->comm,
+			       start, start + len - 1);
+			limit++;
+		}
 		return -EINVAL;
 	}
 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/