Message-Id: <1426248777-19768-4-git-send-email-r.peniaev@gmail.com>
Date:	Fri, 13 Mar 2015 21:12:57 +0900
From:	Roman Pen <r.peniaev@...il.com>
To:	unlisted-recipients:; (no To-header on input)
Cc:	Roman Pen <r.peniaev@...il.com>, Nick Piggin <npiggin@...nel.dk>,
	Zhang Yanfei <zhangyanfei@...fujitsu.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Eric Dumazet <edumazet@...gle.com>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	David Rientjes <rientjes@...gle.com>,
	WANG Chao <chaowang@...hat.com>,
	Fabian Frederick <fabf@...net.be>,
	Christoph Lameter <cl@...ux.com>, Gioh Kim <gioh.kim@....com>,
	Rob Jones <rob.jones@...ethink.co.uk>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: [PATCH 3/3] mm/vmalloc: get rid of dirty bitmap inside vmap_block structure

In the original implementation of vm_map_ram made by Nick Piggin there
were two bitmaps: alloc_map and dirty_map.  Neither was used as intended,
i.e. for finding a suitable free hole for the next allocation in a block.
vm_map_ram allocates space sequentially in a block and, on free, only
marks pages as dirty, so freed space can never be reused.
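
As a rough illustration of that behaviour, here is a simplified sketch
(not the kernel code; the toy_* names are made up for the example, and
only the free/dirty accounting loosely follows mm/vmalloc.c):

  /*
   * Toy model of a vmap block: allocation only advances a cursor,
   * and freeing only accounts pages as dirty, so a freed region is
   * never handed out again until the whole block is purged.
   */
  struct toy_vmap_block {
          unsigned long free;     /* pages still available at the tail */
          unsigned long dirty;    /* pages freed, unusable until purge */
          unsigned long next;     /* allocation cursor, in pages */
  };

  static long toy_vb_alloc(struct toy_vmap_block *vb, unsigned long npages)
  {
          unsigned long off;

          if (vb->free < npages)
                  return -1;              /* no room left at the tail */
          off = vb->next;                 /* always allocate sequentially */
          vb->next += npages;
          vb->free -= npages;
          return off;
  }

  static void toy_vb_free(struct toy_vmap_block *vb, unsigned long npages)
  {
          /*
           * Freed pages are only counted, never returned to the free
           * pool, hence no hole-finding is ever needed.
           */
          vb->dirty += npages;
  }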

It would be interesting to know the intended purpose of those bitmaps;
perhaps the implementation was simply never completed.

Long ago, however, Zhang Yanfei removed alloc_map with these two commits:

  mm/vmalloc.c: remove dead code in vb_alloc
     3fcd76e8028e0be37b02a2002b4f56755daeda06
  mm/vmalloc.c: remove alloc_map from vmap_block
     b8e748b6c32999f221ea4786557b8e7e6c4e4e7a

This patch replaces dirty_map with two range variables: dirty_min and
dirty_max.  They store the minimum and maximum positions of dirty space
in a block, since we only need to know the dirty range, not the exact
positions of dirty pages.
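
For example (hypothetical page offsets within one block; on an empty
block dirty_min starts at VMAP_BBMAP_BITS and dirty_max at 0, i.e. the
range is empty):

  /* free pages [8, 12) -> dirty range becomes [8, 12) */
  vb->dirty_min = min(vb->dirty_min, 8UL);    /* 8  */
  vb->dirty_max = max(vb->dirty_max, 12UL);   /* 12 */

  /* later free pages [100, 101) -> dirty range grows to [8, 101) */
  vb->dirty_min = min(vb->dirty_min, 100UL);  /* stays 8 */
  vb->dirty_max = max(vb->dirty_max, 101UL);  /* 101 */

Note that the resulting range [8, 101) may also cover pages in between
that were never freed; flushing a somewhat larger range is a deliberate
trade of precision for simplicity.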

Why make this change?  Several reasons.  First, at a glance it seems
that the vm_map_ram allocator cares about fragmentation and uses the
bitmap to find free holes, but that is not true; to avoid that
complexity it is better to use something simple, like min/max range
values.  Second, the code becomes simpler: no iteration over a bitmap,
just comparisons via the min and max macros.  Third, the bitmap occupies
up to 1024 bits (4MB is the maximum size of a block), and here the whole
bitmap is replaced with two longs.
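
Back-of-the-envelope, assuming 4KB pages (so a 4MB block covers 1024
pages) and 64-bit longs:

  bitmap:  VMAP_BBMAP_BITS / 8       = 1024 / 8 = 128 bytes
  range:   2 * sizeof(unsigned long) = 2 * 8    =  16 bytes

i.e. 112 bytes saved per vmap_block.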

Finally, vm_unmap_aliases() should be slightly faster and the whole
vmap_block structure occupies less memory.

Signed-off-by: Roman Pen <r.peniaev@...il.com>
Cc: Nick Piggin <npiggin@...nel.dk>
Cc: Zhang Yanfei <zhangyanfei@...fujitsu.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: Eric Dumazet <edumazet@...gle.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@....com>
Cc: David Rientjes <rientjes@...gle.com>
Cc: WANG Chao <chaowang@...hat.com>
Cc: Fabian Frederick <fabf@...net.be>
Cc: Christoph Lameter <cl@...ux.com>
Cc: Gioh Kim <gioh.kim@....com>
Cc: Rob Jones <rob.jones@...ethink.co.uk>
Cc: linux-mm@...ck.org
Cc: linux-kernel@...r.kernel.org
---
 mm/vmalloc.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9759c92..77c4b8a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -760,7 +760,7 @@ struct vmap_block {
 	spinlock_t lock;
 	struct vmap_area *va;
 	unsigned long free, dirty;
-	DECLARE_BITMAP(dirty_map, VMAP_BBMAP_BITS);
+	unsigned long dirty_min, dirty_max; /*< dirty range */
 	struct list_head free_list;
 	struct rcu_head rcu_head;
 	struct list_head purge;
@@ -845,7 +845,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 	BUG_ON(VMAP_BBMAP_BITS <= (1UL << order));
 	vb->free = VMAP_BBMAP_BITS - (1UL << order);
 	vb->dirty = 0;
-	bitmap_zero(vb->dirty_map, VMAP_BBMAP_BITS);
+	vb->dirty_min = VMAP_BBMAP_BITS;
+	vb->dirty_max = 0;
 	INIT_LIST_HEAD(&vb->free_list);
 
 	vb_idx = addr_to_vb_idx(va->va_start);
@@ -896,7 +897,8 @@ static void purge_fragmented_blocks(int cpu)
 		if (vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS) {
 			vb->free = 0; /* prevent further allocs after releasing lock */
 			vb->dirty = VMAP_BBMAP_BITS; /* prevent purging it again */
-			bitmap_fill(vb->dirty_map, VMAP_BBMAP_BITS);
+			vb->dirty_min = 0;
+			vb->dirty_max = VMAP_BBMAP_BITS;
 			spin_lock(&vbq->lock);
 			list_del_rcu(&vb->free_list);
 			spin_unlock(&vbq->lock);
@@ -989,6 +991,7 @@ static void vb_free(const void *addr, unsigned long size)
 	order = get_order(size);
 
 	offset = (unsigned long)addr & (VMAP_BLOCK_SIZE - 1);
+	offset >>= PAGE_SHIFT;
 
 	vb_idx = addr_to_vb_idx((unsigned long)addr);
 	rcu_read_lock();
@@ -999,7 +1002,10 @@ static void vb_free(const void *addr, unsigned long size)
 	vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
 
 	spin_lock(&vb->lock);
-	BUG_ON(bitmap_allocate_region(vb->dirty_map, offset >> PAGE_SHIFT, order));
+
+	/* Expand dirty range */
+	vb->dirty_min = min(vb->dirty_min, offset);
+	vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));
 
 	vb->dirty += 1UL << order;
 	if (vb->dirty == VMAP_BBMAP_BITS) {
@@ -1038,25 +1044,18 @@ void vm_unmap_aliases(void)
 
 		rcu_read_lock();
 		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
-			int i, j;
-
 			spin_lock(&vb->lock);
-			i = find_first_bit(vb->dirty_map, VMAP_BBMAP_BITS);
-			if (i < VMAP_BBMAP_BITS) {
+			if (vb->dirty) {
+				unsigned long va_start = vb->va->va_start;
 				unsigned long s, e;
 
-				j = find_last_bit(vb->dirty_map,
-							VMAP_BBMAP_BITS);
-				j = j + 1; /* need exclusive index */
+				s = va_start + (vb->dirty_min << PAGE_SHIFT);
+				e = va_start + (vb->dirty_max << PAGE_SHIFT);
 
-				s = vb->va->va_start + (i << PAGE_SHIFT);
-				e = vb->va->va_start + (j << PAGE_SHIFT);
-				flush = 1;
+				start = min(s, start);
+				end   = max(e, end);
 
-				if (s < start)
-					start = s;
-				if (e > end)
-					end = e;
+				flush = 1;
 			}
 			spin_unlock(&vb->lock);
 		}
-- 
1.9.3
