Message-ID: <d8dd82df0811180606w503563ach8650ab07dcd0a35c@mail.gmail.com>
Date:	Tue, 18 Nov 2008 19:36:27 +0530
From:	"Naval Saini" <navalnovel@...il.com>
To:	linux-kernel@...r.kernel.org
Cc:	naval.saini@....com, navalnovel@...il.com
Subject: O_DIRECT patch for processors with VIPT cache for mainline kernel (specifically arm in our case)

Hi:


We were facing some issues with O_DIRECT when using XFS (mkfs.xfs) to
format USB disks. We traced them to the following note in the ARM
technical reference manual (which perhaps applies to other
architectures as well) :-


From the ARM 1176 technical reference manual :-

1. If multiple virtual addresses are mapped onto the same physical
address, then bits [13:12] of the virtual addresses must be equal for
all mappings and the same as bits [13:12] of the physical address.
The same physical address can be mapped by TLB entries of different
page sizes, including page sizes over 4KB. Imposing this requirement
on the virtual address is called page coloring.
2. Alternatively, if all mappings to a physical address are of a page
size equal to 4KB, then the restriction that bits [13:12] of the
virtual address must equal bits [13:12] of the physical address is not
necessary. Bits [13:12] of all virtual address aliases must still be
equal.
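
To make the constraint concrete, here is a small standalone
illustration (not part of the patch; it assumes 4KB pages and a 16KB
aliasing span, so the page color is bits [13:12]):

#include <stdio.h>

/* Illustration only: with 4KB pages and a 16KB aliasing span (SHMLBA),
 * the page color of an address is bits [13:12]. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_SHMLBA     (4 * 4096)

static unsigned int page_color(unsigned long addr)
{
	return (unsigned int)((addr & (DEMO_SHMLBA - 1)) >> DEMO_PAGE_SHIFT);
}

int main(void)
{
	unsigned long va = 0x40003000UL, pa = 0x8765a000UL;

	/* O_DIRECT through an aliasing cache is only safe if the colors match. */
	printf("VA color %u, PA color %u -> %s\n",
	       page_color(va), page_color(pa),
	       page_color(va) == page_color(pa) ? "ok" : "aliasing hazard");
	return 0;
}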


I am looking for help from the community in getting this patch into
the mainline kernel. I need help in reviewing it and in fixing a bug
in it (described below).



Our configuration :-

ARM11 architecture / linux-2.6.18.5 / uClibc-0.9.28.



Proposed Fix :-

Add a mechanism to allocate memory that is page-color aligned. Patches
are needed in uClibc and in the Linux kernel.


The idea behind the patch / explanation :-


Modify memalign in uClibc to pass a new flag (MAP_COLORALIGN) to the
kernel (via mmap only, not sbrk); this directs the kernel to allocate
memory in which all PAs and VAs are color aligned (see the excerpt
above for why we need it).
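
For illustration only (this is not part of the patches), a caller
could also request such a buffer straight from mmap; MAP_COLORALIGN is
assumed here to come from the patched <asm/mman.h> (0x0200 on ARM, see
patch 1):

#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_COLORALIGN
#define MAP_COLORALIGN 0x0200	/* value proposed for ARM in patch 1 */
#endif

/* Request an anonymous mapping whose physical pages get the same color
 * as the virtual address, so it can back an O_DIRECT buffer. */
void *alloc_coloraligned(size_t size)
{
	void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_COLORALIGN, -1, 0);
	return (p == MAP_FAILED) ? NULL : p;
}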

The kernel creates a vm_area for the above mmap operation (with the
MAP_COLORALIGN flag) and adds VM_COLORALIGN to its vm_area->vm_flags.

When __handle_mm_fault is called for this area, we allocate aligned
memory (as stated above) via modified functions in mm/page_alloc.c
(such as __alloc_pages, get_page_from_freelist, buffered_rmqueue). In
these functions, the parameter 'order' has been renamed to
'align_order'. This parameter carries the needed alignment in the
upper half of the word and the order (i.e. the size in pages) in the
lower half. The change is in effect only when VM_COLORALIGN is set in
the vma flags.
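
As a small standalone illustration of that encoding (mirroring the
EXTRACT_ORDER / EXTRACT_ALIGN macros in the patch, with the
__GFP_COLORALIGN test omitted and 16-bit halves assumed):

#include <assert.h>

/* Illustration only: 'align_order' packs the color alignment into the
 * upper 16 bits and the buddy order into the lower 16 bits. */
#define DEMO_PACK(align, order)   (((align) << 16) | (order))
#define DEMO_ORDER(num)           ((num) & 0xFFFF)
#define DEMO_ALIGN(num)           ((num) >> 16)

int main(void)
{
	unsigned int align_order = DEMO_PACK(3u, 2u);	/* color 3, 2^2 pages */

	assert(DEMO_ORDER(align_order) == 2u);
	assert(DEMO_ALIGN(align_order) == 3u);
	return 0;
}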

So, finally, the aligned pages are allocated in the function
__rmqueue_aligned_wrapper. It calls another function,
__rmqueue_aligned, which does the actual allocation. In these
functions, we try to allocate the memory from buddy areas of larger
order first (see the condition in __rmqueue_aligned_wrapper) and then
from smaller-order areas if that fails. This is done for performance
reasons.

The extra pages at the front and back of the allocated area are freed
back to the buddy lists in the function expand_num_pages.
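
A worked example of the arithmetic (my numbers, not taken from the
patch): suppose order = 0 (one page wanted) at color align = 2, and
the candidate free block has current_order = 2 with its first pfn at
color 0. Then frontpad = 2, padsize = 3 and lastpad = 1, so page[2] is
returned, while expand_num_pages puts the two front pages back as one
order-1 block and the single trailing page back as an order-0 block:

#include <stdio.h>

/* Illustration only: the padding arithmetic used by __rmqueue_aligned
 * for one concrete case (SHMLBA/PAGE_SIZE = 4). */
int main(void)
{
	unsigned int order = 0, align = 2, current_order = 2;
	unsigned int page_align = 0, extra = 4;

	unsigned int frontpad = (align > page_align)
				? (align - page_align)
				: (extra + align - page_align);
	unsigned int padsize = (1u << order) + frontpad;	/* 1 + 2 = 3 */
	unsigned int lastpad = (1u << current_order) - padsize;	/* 4 - 3 = 1 */

	printf("frontpad=%u padsize=%u lastpad=%u -> return page[%u]\n",
	       frontpad, padsize, lastpad, frontpad);
	return 0;
}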

Also, I have obtained permission from the company's legal department
to post the patch to the community.



Bug in the patch :-

I also need help with a bug in the patch. I stumbled across a crash
with the simple program below, when the USB disk was not connected
(i.e. /dev/sda did not have an associated device):

#include <stdlib.h>
#include <fcntl.h>

#define ALIGN (16*1024)
#define BS (512)

int main(void)
{
	void *b2 = NULL, *ob2;
	int sda;

	posix_memalign(&b2, ALIGN, BS); ob2 = b2;
	sda = open("/dev/sda", O_RDWR);   /* the oops below happens here */
	return 0;
}


If I change ALIGN to a smaller value, say 4096, I don't get the crash.
Perhaps there is some buddy allocator concept I need to understand in
the way I free pages in expand_num_pages (I am hoping someone can spot
it from the patch).

Please also suggest how I can do some regression testing here.


The dump from the above program :-


./test_odirect1
Unable to handle kernel NULL pointer dereference at virtual address 00000084
pgd = 97044000
[00000084] *pgd=00000000
Internal error: Oops: 17 [#1]
Modules linked in:
CPU: 0
PC is at do_page_fault+0x20/0x244
LR is at do_DataAbort+0x3c/0xa0
pc : [<9706b748>]    lr : [<9706bb2c>]    Not tainted
sp : 97720058  ip : 176ce431  fp : 9772009c
r10: 00000000  r9 : 40000113  r8 : 97720150
r7 : 00000000  r6 : 9738966c  r5 : 00000017  r4 : 973895fc
r3 : 97720000  r2 : 97720150  r1 : 00000017  r0 : 00000084
Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  Segment user
Control: C5387D  Table: 17720000  DAC: 00000017
Process modprobe (pid: 83, stack limit = 0x9771e250)
Stack: (0x97720058 to 0x97720000)
Backtrace:
[<9706b728>] (do_page_fault+0x0/0x244) from [<9706bb2c>]
(do_DataAbort+0x3c/0xa0)
[<9706baf0>] (do_DataAbort+0x0/0xa0) from [<9706482c>] (__dabt_svc+0x4c/0x60)
 r8 = 97720290  r7 = 00000000  r6 = 9738966C  r5 = 97720184
 r4 = FFFFFFFF
[<9706b728>] (do_page_fault+0x0/0x244) from [<9706bb2c>]
(do_DataAbort+0x3c/0xa0)
[<9706baf0>] (do_DataAbort+0x0/0xa0) from [<9706482c>] (__dabt_svc+0x4c/0x60)
 r8 = 977203D0  r7 = 00000000  r6 = 9738966C  r5 = 977202C4
 r4 = FFFFFFFF
[<9706b728>] (do_page_fault+0x0/0x244) from [<9706bb2c>]
(do_DataAbort+0x3c/0xa0)

........ (so on)




PATCH 1 -- applies to kernel
--------------------



--- linux-2.6.18.5/arch/arm/mm/mmap.c	2006-12-02 05:43:05.000000000 +0530
+++ linux-2.6.18.5/arch/arm/mm/mmap.c	2008-11-07 19:29:09.000000000 +0530
@@ -41,7 +41,7 @@ arch_get_unmapped_area(struct file *filp
 	if (cache_type != read_cpuid(CPUID_ID)) {
 		aliasing = (cache_type | cache_type >> 12) & (1 << 11);
 		if (aliasing)
-			do_align = filp || flags & MAP_SHARED;
+			do_align = filp || flags & MAP_SHARED || flags & MAP_COLORALIGN;
 	}
 #else
 #define do_align 0
--- linux-2.6.18.5/mm/page_alloc.c	2006-12-02 05:43:05.000000000 +0530
+++ linux-2.6.18.5/mm/page_alloc.c	2008-11-07 19:31:43.000000000 +0530
@@ -519,6 +519,32 @@ static inline void expand(struct zone *z
 	}
 }

+/* The order in which pages are being added to the areas should not be changed,
+ * ie. smaller pages (with smaller order) towards the front and larger towards the back.
+ */
+static inline void expand_num_pages(struct zone *zone, struct page *page, unsigned int size)
+{
+        unsigned int mask, order;
+        struct free_area *area;
+
+	mask = 1;
+	order = 0;
+        while (size) {
+                if ( mask & size )  {
+                        area = zone->free_area + order;
+                        BUG_ON(bad_range(zone, page));
+                        list_add(&page->lru, &area->free_list);
+                        area->nr_free++;
+                        set_page_order(page, order);
+
+			page = &page[mask];
+                        size &= (~mask);
+                }
+		mask <<= 1;
+		++order;
+        }
+}
+
 /*
  * This page is about to be returned from the page allocator
  */
@@ -591,6 +617,110 @@ static struct page *__rmqueue(struct zon
 	return NULL;
 }

+#define log2(n) ffz(~(n))
+
+/*
+ * __rmqueue_aligned :-
+ * Tries to allocate aligned pages of given order, from among pages of current_order.
+ */
+static struct page *__rmqueue_aligned (struct zone *zone, unsigned int align_order , unsigned int current_order )
+{
+	struct free_area *area;
+	struct list_head *pos;
+	struct page *page;
+
+	unsigned int frontpad, padsize, lastpad, page_align;
+	const unsigned int extra = (SHMLBA >> PAGE_SHIFT), extra_order = log2(extra);
+	unsigned int order = EXTRACT_ORDER(__GFP_COLORALIGN,align_order);
+	unsigned int align = EXTRACT_ALIGN(__GFP_COLORALIGN,align_order);
+
+	area = zone->free_area + current_order;
+	if (list_empty(&area->free_list))
+		return NULL;
+
+	list_for_each(pos,&area->free_list) {
+		page = list_entry(pos, struct page, lru);
+		page_align = page_to_pfn(page) & (extra-1);
+
+		if ( align == page_align ) { /* color alignment matches */
+			frontpad = 0;
+			break;
+		}
+
+		if ( current_order > order ) /* check that it is not a tight fit; cannot allocate from a tight fit */
+		{
+			frontpad = (align > page_align) ? (align - page_align) : (extra + align - page_align);
+									/* move below at 2 places, for efficiency */
+
+			if ( current_order > extra_order ) /* allocate from inside node */
+				break;
+
+			if ( (unsigned int) ((0x1UL<<current_order) - (0x1UL<<order)) >= frontpad)
+				break;
+		}
+	}
+
+	if ( pos == &area->free_list)  /* is true when we did not find any suitable match in the area */
+		return NULL;
+
+	list_del(&page->lru);
+	rmv_page_order(page);
+	area->nr_free--;
+	zone->free_pages -= 1UL << order;
+
+	/* Allocation looks as below :-
+	 * [current_order pages] = [frontpad pages] + [order pages] + [lastpad pages] */
+	padsize = (0x01UL<<order) + frontpad;
+	lastpad = (0x01UL<<current_order) - padsize;
+
+	printk (" current %lu = ", 0x1UL<<current_order);
+	printk (" frontpad %u + order %lu + lastpad %u \n", frontpad, 0x1UL<<order, lastpad);
+
+	/* we need to remove pages in front and back of allocation */
+	if (lastpad)
+		expand_num_pages (zone, &page[padsize], lastpad);
+	if (frontpad) {
+		expand_num_pages (zone, page, frontpad);
+		page = &page[frontpad];
+		rmv_page_order(page);
+	}
+
+	return page;
+}
+
+/* __rmqueue_aligned_wrapper :-
+ * calls __rmqueue_aligned in a manner optimized for speed. __rmqueue_aligned accepts a parameter
+ * current_order, which is the free area we check for free pages. We first check the larger orders,
+ * thus spending less time looping inside __rmqueue_aligned.
+ */
+static struct page *__rmqueue_aligned_wrapper (struct zone *zone, unsigned int align_order)
+{
+	struct page *page = NULL;
+	unsigned int optim_order, current_order;
+	const unsigned int extra = (SHMLBA >> PAGE_SHIFT), extra_order = log2(extra);
+	unsigned int order = EXTRACT_ORDER(__GFP_COLORALIGN,align_order);
+	unsigned int align = EXTRACT_ALIGN(__GFP_COLORALIGN,align_order);
+
+	BUG_ON(align >= extra);
+
+	/* starting with higher order areas, results in faster find */
+	if ( order == 0 )
+		optim_order = extra_order;
+	else if ( order <= extra_order )
+		optim_order = extra_order + 1;
+	else
+		optim_order = order + 1;
+
+	for (current_order = optim_order; current_order < MAX_ORDER && page == NULL; current_order++ )
+		page = __rmqueue_aligned ( zone , align_order , current_order );
+
+	for (current_order = optim_order-1; current_order >= order && page == NULL ; current_order-- )
+		page = __rmqueue_aligned ( zone , align_order , current_order );
+
+	return page;	
+}
+
+
 /*
  * Obtain a specified number of elements from the buddy allocator, all under
  * a single hold of the lock, for efficiency.  Add them to the supplied list.
@@ -773,16 +903,17 @@ void split_page(struct page *page, unsig
  * or two.
  */
 static struct page *buffered_rmqueue(struct zonelist *zonelist,
-			struct zone *zone, int order, gfp_t gfp_flags)
+			struct zone *zone, int align_order, gfp_t gfp_flags)
 {
 	unsigned long flags;
 	struct page *page;
 	int cold = !!(gfp_flags & __GFP_COLD);
 	int cpu;
+	unsigned int order = EXTRACT_ORDER(gfp_flags,align_order);

 again:
 	cpu  = get_cpu();
-	if (likely(order == 0)) {
+	if (likely(order == 0 && !(gfp_flags & __GFP_COLORALIGN))) {
 		struct per_cpu_pages *pcp;

 		pcp = &zone_pcp(zone, cpu)->pcp[cold];
@@ -798,9 +929,13 @@ again:
 		pcp->count--;
 	} else {
 		spin_lock_irqsave(&zone->lock, flags);
-		page = __rmqueue(zone, order);
+		if ( likely(!(gfp_flags & __GFP_COLORALIGN)) ) {
+			page = __rmqueue(zone, order);
+		} else {
+			page = __rmqueue_aligned_wrapper(zone, align_order);
+		}
 		spin_unlock(&zone->lock);
-		if (!page)
+		if (!page)
 			goto failed;
 	}

@@ -864,12 +999,13 @@ int zone_watermark_ok(struct zone *z, in
  * a page.
  */
 static struct page *
-get_page_from_freelist(gfp_t gfp_mask, unsigned int order,
+get_page_from_freelist(gfp_t gfp_mask, unsigned int align_order,
 		struct zonelist *zonelist, int alloc_flags)
 {
 	struct zone **z = zonelist->zones;
 	struct page *page = NULL;
 	int classzone_idx = zone_idx(*z);
+	unsigned int order = EXTRACT_ORDER(gfp_mask,align_order);

 	/*
 	 * Go through the zonelist once, looking for a zone with enough free.
@@ -895,7 +1031,7 @@ get_page_from_freelist(gfp_t gfp_mask, u
 					continue;
 		}

-		page = buffered_rmqueue(zonelist, *z, order, gfp_mask);
+		page = buffered_rmqueue(zonelist, *z, align_order, gfp_mask);
 		if (page) {
 			break;
 		}
@@ -905,9 +1041,13 @@ get_page_from_freelist(gfp_t gfp_mask, u

 /*
  * This is the 'heart' of the zoned buddy allocator.
+ *
+ * VIPT cache fix :-
+ * The upper half (ie. the upper 16 bits in the case of 32-bit integers) of align_order
+ * is used to pass the required page color alignment for the allocated page.
  */
 struct page * fastcall
-__alloc_pages(gfp_t gfp_mask, unsigned int order,
+__alloc_pages(gfp_t gfp_mask, unsigned int align_order,
 		struct zonelist *zonelist)
 {
 	const gfp_t wait = gfp_mask & __GFP_WAIT;
@@ -918,6 +1058,7 @@ __alloc_pages(gfp_t gfp_mask, unsigned i
 	int do_retry;
 	int alloc_flags;
 	int did_some_progress;
+	unsigned int order = EXTRACT_ORDER(gfp_mask,align_order);

 	might_sleep_if(wait);

@@ -929,7 +1070,7 @@ restart:
 		return NULL;
 	}

-	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order,
+	page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, align_order,
 				zonelist, ALLOC_WMARK_LOW|ALLOC_CPUSET);
 	if (page)
 		goto got_pg;
@@ -964,7 +1105,7 @@ restart:
 	 * Ignore cpuset if GFP_ATOMIC (!wait) rather than fail alloc.
 	 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
 	 */
-	page = get_page_from_freelist(gfp_mask, order, zonelist, alloc_flags);
+	page = get_page_from_freelist(gfp_mask, align_order, zonelist, alloc_flags);
 	if (page)
 		goto got_pg;

@@ -975,7 +1116,7 @@ restart:
 		if (!(gfp_mask & __GFP_NOMEMALLOC)) {
 nofail_alloc:
 			/* go through the zonelist yet again, ignoring mins */
-			page = get_page_from_freelist(gfp_mask, order,
+			page = get_page_from_freelist(gfp_mask, align_order,
 				zonelist, ALLOC_NO_WATERMARKS);
 			if (page)
 				goto got_pg;
@@ -1008,7 +1149,7 @@ rebalance:
 	cond_resched();

 	if (likely(did_some_progress)) {
-		page = get_page_from_freelist(gfp_mask, order,
+		page = get_page_from_freelist(gfp_mask, align_order,
 						zonelist, alloc_flags);
 		if (page)
 			goto got_pg;
@@ -1019,7 +1160,7 @@ rebalance:
 		 * a parallel oom killing, we must fail if we're still
 		 * under heavy pressure.
 		 */
-		page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order,
+		page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, align_order,
 				zonelist, ALLOC_WMARK_HIGH|ALLOC_CPUSET);
 		if (page)
 			goto got_pg;
--- linux-2.6.18.5/include/asm-arm/mman.h	2006-12-02 05:43:05.000000000 +0530
+++ linux-2.6.18.5/include/asm-arm/mman.h	2008-11-07 19:29:09.000000000 +0530
@@ -11,6 +11,11 @@
 #define MAP_POPULATE	0x8000		/* populate (prefault) page tables */
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */

+#ifdef CONFIG_CPU_V6
+# undef  MAP_COLORALIGN
+# define MAP_COLORALIGN  0x0200	/* For VIPT caches - the mapping should be color aligned */
+#endif
+
 #define MCL_CURRENT	1		/* lock all current mappings */
 #define MCL_FUTURE	2		/* lock all future mappings */

--- linux-2.6.18.5/include/asm-arm/page.h	2006-12-02 05:43:05.000000000 +0530
+++ linux-2.6.18.5/include/asm-arm/page.h	2008-11-07 19:29:09.000000000 +0530
@@ -134,6 +134,17 @@ extern void __cpu_copy_user_page(void *t
 #define clear_page(page)	memzero((void *)(page), PAGE_SIZE)
 extern void copy_page(void *to, const void *from);

+#ifdef CONFIG_CPU_V6
+#define BITLEN(x) (sizeof(x)<<3) /* for bytes to bits, multiply by 8 */
+#define EXTRACT_ORDER(gfp,num) ( (gfp & __GFP_COLORALIGN) ? (num & 0xFFFF) : (num) )
+#define EXTRACT_ALIGN(gfp,num) ( (gfp & __GFP_COLORALIGN) ? (num>>BITLEN(unsigned short)) : (0) )
+
+#define ALIGNMENT_BITS(addr) (((addr&(SHMLBA-1))>>PAGE_SHIFT) << BITLEN(unsigned short))
+#define ARCH_ALIGNORDER(flags,addr,order) ((flags & VM_COLORALIGN)? (ALIGNMENT_BITS(addr) | order) : (0))
+
+#define ARCH_ALIGNGFP(flags) ((flags & VM_COLORALIGN)? (__GFP_COLORALIGN):(0))
+#endif
+
 #undef STRICT_MM_TYPECHECKS

 #ifdef STRICT_MM_TYPECHECKS
--- linux-2.6.18.5/include/linux/mman.h	2006-12-02 05:43:05.000000000 +0530
+++ linux-2.6.18.5/include/linux/mman.h	2008-11-07 19:29:09.000000000 +0530
@@ -63,7 +63,8 @@ calc_vm_flag_bits(unsigned long flags)
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
 	       _calc_vm_trans(flags, MAP_DENYWRITE,  VM_DENYWRITE ) |
 	       _calc_vm_trans(flags, MAP_EXECUTABLE, VM_EXECUTABLE) |
-	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    );
+	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
+	       _calc_vm_trans(flags, MAP_COLORALIGN, VM_COLORALIGN);
 }
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MMAN_H */
--- linux-2.6.18.5/include/linux/mm.h	2006-12-02 05:43:05.000000000 +0530
+++ linux-2.6.18.5/include/linux/mm.h	2008-11-07 19:29:09.000000000 +0530
@@ -160,6 +160,7 @@ extern unsigned int kobjsize(const void
 #define VM_DONTEXPAND	0x00040000	/* Cannot expand with mremap() */
 #define VM_RESERVED	0x00080000	/* Count as reserved_vm like IO */
 #define VM_ACCOUNT	0x00100000	/* Is a VM accounted object */
+#define VM_COLORALIGN   0x00200000	/* The vma is aligned with colour bits for VIPT cache */
 #define VM_HUGETLB	0x00400000	/* Huge TLB Page VM */
 #define VM_NONLINEAR	0x00800000	/* Is non-linear (remap_file_pages) */
 #define VM_MAPPED_COPY	0x01000000	/* T if mapped copy of data (nommu mmap) */
--- linux-2.6.18.5/include/linux/gfp.h	2008-11-07 19:33:55.000000000 +0530
+++ linux-2.6.18.5/include/linux/gfp.h	2008-11-07 19:35:50.000000000 +0530
@@ -47,6 +47,8 @@ struct vm_area_struct;
 #define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
 #define __GFP_HARDWALL   ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */

+#define __GFP_COLORALIGN ((__force gfp_t)0x40000u) /* Used by processors with VIPT caches */
+
 #define __GFP_BITS_SHIFT 20	/* Room for 20 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))

@@ -106,8 +108,9 @@ extern struct page *
 FASTCALL(__alloc_pages(gfp_t, unsigned int, struct zonelist *));

 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
-						unsigned int order)
+						unsigned int align_order)
 {
+	unsigned int order = EXTRACT_ORDER(gfp_mask,align_order);
 	if (unlikely(order >= MAX_ORDER))
 		return NULL;

@@ -115,7 +118,7 @@ static inline struct page *alloc_pages_n
 	if (nid < 0)
 		nid = numa_node_id();

-	return __alloc_pages(gfp_mask, order,
+	return __alloc_pages(gfp_mask, align_order,
 		NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
 }

@@ -133,9 +136,11 @@ alloc_pages(gfp_t gfp_mask, unsigned int
 extern struct page *alloc_page_vma(gfp_t gfp_mask,
 			struct vm_area_struct *vma, unsigned long addr);
 #else
-#define alloc_pages(gfp_mask, order) \
-		alloc_pages_node(numa_node_id(), gfp_mask, order)
-#define alloc_page_vma(gfp_mask, vma, addr) alloc_pages(gfp_mask, 0)
+
+#define alloc_pages(gfp_mask, align_order) \
+		alloc_pages_node(numa_node_id(), gfp_mask, align_order)
+#define alloc_page_vma(gfp_mask, vma, addr) \
+		alloc_pages(gfp_mask|ARCH_ALIGNGFP(vma->vm_flags), ARCH_ALIGNORDER(vma->vm_flags,addr,0))
 #endif
 #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)

--- linux-2.6.18.5/include/linux/highmem.h	2006-12-02 05:43:05.000000000 +0530
+++ linux-2.6.18.5/include/linux/highmem.h	2008-11-07 20:11:44.000000000 +0530
@@ -67,6 +67,12 @@ alloc_zeroed_user_highpage(struct vm_are
 }
 #endif

+#ifndef ARCH_ALIGNORDER
+#define EXTRACT_ORDER(gfp,num) (num)
+#define EXTRACT_ALIGN(gfp,num) (0)
+#define ARCH_ALIGNORDER(flags,addr,order) (order)
+#endif
+
 static inline void clear_highpage(struct page *page)
 {
 	void *kaddr = kmap_atomic(page, KM_USER0);
--- linux-2.6.18.5/include/asm-generic/mman.h	2006-12-02 05:43:05.000000000 +0530
+++ linux-2.6.18.5/include/asm-generic/mman.h	2008-11-07 19:29:09.000000000 +0530
@@ -20,6 +20,9 @@
 #define MAP_FIXED	0x10		/* Interpret addr exactly */
 #define MAP_ANONYMOUS	0x20		/* don't use a file */

+#define MAP_COLORALIGN  0x0            /* can be redefined (undef) to a nonzero value (0x0200 on ARM)
+                                           in the arch-specific mman.h, if the processor has an aliasing VIPT cache */
+
 #define MS_ASYNC	1		/* sync memory asynchronously */
 #define MS_INVALIDATE	2		/* invalidate the caches */
 #define MS_SYNC		4		/* synchronous memory sync */




PATCH 2 -- applies to uclibc ( in buildroot directory toolchain/kernel-headers )
-------------------


--- linux/include/asm-arm/mman.h	2008-10-30 16:16:31.000000000 +0530
+++ linux/include/asm-arm/mman.h	2008-10-30 16:16:46.000000000 +0530
@@ -3,6 +3,8 @@

 #include <asm-generic/mman.h>

+#define MAP_COLORALIGN  0x0200      /* For VIPT caches - the mapping should be color aligned */
+
 #define MAP_GROWSDOWN	0x0100		/* stack-like segment */
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */




PATCH 3  -- applies to uclibc ( in buildroot directory toolchain/uClibc )
---------------


--- uClibc/libc/stdlib/malloc-standard/malloc.c	2008-10-31 13:03:29.000000000 +0530
+++ uClibc/libc/stdlib/malloc-standard/malloc.c	2008-11-03 16:28:16.000000000 +0530
@@ -342,7 +342,15 @@ void __do_check_malloc_state(void)
   space to service request for nb bytes, thus requiring that av->top
   be extended or replaced.
 */
-static void* __malloc_alloc(size_t nb, mstate av)
+
+/* To do :-
+   Check if brk can be used to allocate smaller chunks. Maybe sbrk call checks
+   for vma_flags (such as MAP_COLORALIGN flag) and only allocates/expands
+   areas that use the same time. If such a functionality is not there,
+   perhaps it can be added in future. For now, all MAP_COLORALIGN allocations
+   go through mmap2.
+*/
+static void* __malloc_alloc(size_t nb, mstate av, const int map_flags)
 {
     mchunkptr       old_top;        /* incoming value of av->top */
     size_t old_size;       /* its size */
@@ -374,7 +382,7 @@ static void* __malloc_alloc(size_t nb, m
        than in malloc proper.
        */

-    if (have_fastchunks(av)) {
+    if (have_fastchunks(av) && !(map_flags & MAP_COLORALIGN)) {
 	assert(in_smallbin_range(nb));
 	__malloc_consolidate(av);
 	return malloc(nb - MALLOC_ALIGN_MASK);
@@ -389,7 +397,7 @@ static void* __malloc_alloc(size_t nb, m
        */

     if ((unsigned long)(nb) >= (unsigned long)(av->mmap_threshold) &&
-	    (av->n_mmaps < av->n_mmaps_max)) {
+	    (av->n_mmaps < av->n_mmaps_max) || (map_flags & MAP_COLORALIGN)) {

 	char* mm;             /* return value from mmap call*/

@@ -403,7 +411,7 @@ static void* __malloc_alloc(size_t nb, m
 	/* Don't try if size wraps around 0 */
 	if ((unsigned long)(size) > (unsigned long)(nb)) {

-	    mm = (char*)(MMAP(0, size, PROT_READ|PROT_WRITE));
+	    mm = (char*)(MMAP(0, size, PROT_READ|PROT_WRITE, map_flags));

 	    if (mm != (char*)(MORECORE_FAILURE)) {

@@ -523,7 +531,7 @@ static void* __malloc_alloc(size_t nb, m
 	/* Don't try if size wraps around 0 */
 	if ((unsigned long)(size) > (unsigned long)(nb)) {

-	    fst_brk = (char*)(MMAP(0, size, PROT_READ|PROT_WRITE));
+	    fst_brk = (char*)(MMAP(0, size, PROT_READ|PROT_WRITE, map_flags));

 	    if (fst_brk != (char*)(MORECORE_FAILURE)) {

@@ -802,6 +810,20 @@ static int __malloc_largebin_index(unsig
 /* ------------------------------ malloc ------------------------------ */
 void* malloc(size_t bytes)
 {
+   return __internal_malloc(bytes, 0);
+}
+
+
+/* ---------------------- __internal_malloc ------------------------------ */
+
+/* Why did we add ' map_flags ' to the parameter list ?
+   Using map_flags, we can inform the kernel of the following :-
+    - the vma is page color aligned (for example, this is needed for using O_DIRECT
+      on an aliasing VIPT cache).
+   The first user of the __internal_malloc function would be memalign.
+*/
+void* __internal_malloc(size_t bytes, const int map_flags)
+{
     mstate av;

     size_t nb;               /* normalized request size */
@@ -851,6 +873,9 @@ void* malloc(size_t bytes)
 	goto use_top;
     }

+    /* for VIVT caches, if we want color alignment, only use __malloc_alloc */
+    if (map_flags & MAP_COLORALIGN) goto use_top;
+
     /*
        If the size qualifies as a fastbin, first check corresponding bin.
        */
@@ -927,7 +952,8 @@ void* malloc(size_t bytes)
 	if (in_smallbin_range(nb) &&
 		bck == unsorted_chunks(av) &&
 		victim == av->last_remainder &&
-		(unsigned long)(size) > (unsigned long)(nb + MINSIZE)) {
+		(unsigned long)(size) > (unsigned long)(nb + MINSIZE))
+                {

 	    /* split and reattach remainder */
 	    remainder_size = size - nb;
@@ -1142,7 +1168,7 @@ use_top:
     victim = av->top;
     size = chunksize(victim);

-    if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE)) {
+    if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE) && !map_flags) {
 	remainder_size = size - nb;
 	remainder = chunk_at_offset(victim, nb);
 	av->top = remainder;
@@ -1155,7 +1181,7 @@ use_top:
     }

     /* If no space in top, relay to handle system-dependent cases */
-    sysmem = __malloc_alloc(nb, av);
+    sysmem = __malloc_alloc(nb, av, map_flags);
     retval = sysmem;
 DONE:
     __MALLOC_UNLOCK;
--- uClibc/libc/stdlib/malloc-standard/malloc.h	2008-10-31 13:03:35.000000000 +0530
+++ uClibc/libc/stdlib/malloc-standard/malloc.h	2008-11-03 17:59:57.000000000 +0530
@@ -347,6 +347,7 @@ __UCLIBC_MUTEX_EXTERN(__malloc_lock);
 /* ------------------ MMAP support ------------------  */
 #include <fcntl.h>
 #include <sys/mman.h>
+#include <asm/mman.h>

 #if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
 #define MAP_ANONYMOUS MAP_ANON
@@ -354,13 +355,13 @@ __UCLIBC_MUTEX_EXTERN(__malloc_lock);

 #ifdef __ARCH_USE_MMU__

-#define MMAP(addr, size, prot) \
- (mmap((addr), (size), (prot), MAP_PRIVATE|MAP_ANONYMOUS, 0, 0))
+#define MMAP(addr, size, prot, map) \
+ (mmap((addr), (size), (prot), MAP_PRIVATE|MAP_ANONYMOUS|map, 0, 0))

 #else

-#define MMAP(addr, size, prot) \
- (mmap((addr), (size), (prot), MAP_SHARED|MAP_ANONYMOUS, 0, 0))
+#define MMAP(addr, size, prot, map) \
+ (mmap((addr), (size), (prot), MAP_SHARED|MAP_ANONYMOUS|map, 0, 0))

 #endif

@@ -931,6 +932,7 @@ extern struct malloc_state __malloc_stat

 /* External internal utilities operating on mstates */
 void   __malloc_consolidate(mstate) attribute_hidden;
+void*  __internal_malloc(size_t bytes, const int map_flags);


 /* Debugging support */
--- uClibc/libc/stdlib/malloc-standard/memalign.c	2008-10-31 13:03:47.000000000 +0530
+++ uClibc/libc/stdlib/malloc-standard/memalign.c	2008-11-03 16:27:09.000000000 +0530
@@ -59,9 +59,11 @@ void* memalign(size_t alignment, size_t
      * request, and then possibly free the leading and trailing space.  */


-    /* Call malloc with worst case padding to hit alignment. */
-
-    m  = (char*)(malloc(nb + alignment + MINSIZE));
+    /* Call __internal_malloc with worst case padding to hit alignment.
+       Note: MAP_COLORALIGN is disregarded in the kernel if the architecture
+       does not require it. For example, it is needed on ARM11 processors.
+     */
+    m  = (char*)(__internal_malloc(nb + alignment + MINSIZE, MAP_COLORALIGN));

     if (m == 0) {
 	retval = 0; /* propagate failure */





Regards,
Naval
NXP Semiconductors
