linux-kernel - Re: [PATCH 0/4] big chunk memory allocator v4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 23 Nov 2010 16:46:03 +0100
From:	Michał Nazarewicz <m.nazarewicz@...sung.com>
To:	Andrew Morton <akpm@...ux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	minchan.kim@...il.com, Bob Liu <lliubbo@...il.com>,
	fujita.tomonori@....ntt.co.jp, pawel@...iak.com,
	andi.kleen@...el.com, felipe.contreras@...il.com,
	"kosaki.motohiro@...fujitsu.com" <kosaki.motohiro@...fujitsu.com>,
	Marek Szyprowski <m.szyprowski@...sung.com>
Subject: Re: [PATCH 0/4] big chunk memory allocator v4

On Mon, 22 Nov 2010 01:04:31 +0100, KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> wrote:

> On Fri, 19 Nov 2010 12:56:53 -0800
> Andrew Morton <akpm@...ux-foundation.org> wrote:
>
>> On Fri, 19 Nov 2010 17:10:33 +0900
>> KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> wrote:
>>
>> > Hi, this is an updated version.
>> >
>> > No major changes from the last one except for page allocation function.
>> > removed RFC.
>> >
>> > Order of patches is
>> >
>> > [1/4] move some functions from memory_hotplug.c to page_isolation.c
>> > [2/4] search physically contiguous range suitable for big chunk alloc.
>> > [3/4] allocate big chunk memory based on memory hotplug(migration) technique
>> > [4/4] modify page allocation function.
>> >
>> > For what:
>> >
>> >   I hear there is requirements to allocate a chunk of page which is larger than
>> >   MAX_ORDER. Now, some (embeded) device use a big memory chunk. To use memory,
>> >   they hide some memory range by boot option (mem=) and use hidden memory
>> >   for its own purpose. But this seems a lack of feature in memory management.
>> >
>> >   This patch adds
>> > 	alloc_contig_pages(start, end, nr_pages, gfp_mask)
>> >   to allocate a chunk of page whose length is nr_pages from [start, end)
>> >   phys address. This uses similar logic of memory-unplug, which tries to
>> >   offline [start, end) pages. By this, drivers can allocate 30M or 128M or
>> >   much bigger memory chunk on demand. (I allocated 1G chunk in my test).
>> >
>> >   But yes, because of fragmentation, this cannot guarantee 100% alloc.
>> >   If alloc_contig_pages() is called in system boot up or movable_zone is used,
>> >   this allocation succeeds at high rate.
>>
>> So this is an alternatve implementation for the functionality offered
>> by Michal's "The Contiguous Memory Allocator framework".
>>
>
> Yes, this will be a backends for that kind of works.

As a matter of fact CMA's v6 tries to use code "borrowed" from the alloc_contig_pages()
patches.

The most important difference is that alloc_contig_pages() would look for a chunk
of memory that can be allocated and then perform migration whereas CMA assumes that
regions it controls are always "migratable".

Also, I've tried to remove the requirement for MAX_ORDER alignment.

> I think there are two ways to allocate contiguous pages larger than MAX_ORDER.
>
> 1) hide some memory at boot and add an another memory allocator.
> 2) support a range allocator as [start, end)
>
> This is an trial from 2). I used memory-hotplug technique because I know some.
> This patch itself has no "map" and "management" function, so it should be
> developped in another patch (but maybe it will be not my work.)

Yes, this is also a valid point.  From my use cases, the alloc_contig_pages()
would probably not be enough and require some management code to be added.

>> >   I tested this on x86-64, and it seems to work as expected. But feedback from
>> >   embeded guys are appreciated because I think they are main user of this
>> >   function.
>>
>> From where I sit, feedback from the embedded guys is *vital*, because
>> they are indeed the main users.
>>
>> Michal, I haven't made a note of all the people who are interested in
>> and who are potential users of this code.  Your patch series has a
>> billion cc's and is up to version 6.

Ah, yes...  I was thinking about shrinking the cc list but didn't want to
seem rude or anything removing ppl who have shown interest in the previous
posted version.

>> Could I ask that you review and
>> test this code, and also hunt down other people (probably at other
>> organisations) who can do likewise for us?  Because until we hear from
>> those people that this work satisfies their needs, we can't really
>> proceed much further.

A few things than:

1. As Felipe mentioned, on ARM it is often desired to have the memory
    mapped as non-cacheable, which most often mean that the memory never
    reaches the page allocator.  This means, that alloc_contig_pages()
    would not be suitable for cases where one needs such memory.

    Or could this be overcome by adding the memory back as highmem?  But
    then, it would force to compile in highmem support even if platform
    does not really need it.

2. Device drivers should not by themselves know what ranges of memory to
    allocate memory from.  Moreover, some device drivers could require
    allocation different buffers from different ranges.  As such, this
    would require some management code on top of alloc_contig_pages().

3. When posting hwmem, Johan Mossberg mentioned that he'd like to see
    notion of "pinning" chunks (so that not-pinned chunks can be moved
    around when hardware does not use them to defragment memory).  This
    would again require some management code on top of
    alloc_contig_pages().

4. I might be mistaken here, but the way I understand ZONE_MOVABLE work
    is that it is cut of from the end of memory.  Or am I talking nonsense?
    My concern is that at least one chip I'm working with requires
    allocations from different memory banks which would basically mean that
    there would have to be two movable zones, ie:

    +-------------------+-------------------+
    | Memory Bank #1    | Memory Bank #2    |
    +---------+---------+---------+---------+
    | normal  | movable | normal  | movable |
    +---------+---------+---------+---------+

So even though I'm personally somehow drawn by alloc_contig_pages()'s
simplicity (compared to CMA at least), those quick thoughts make me think
that alloc_contig_pages() would work rather as a backend (as Kamezawa
mentioned) for some, maybe even tiny but still present, management code
which would handle "marking memory fragments as ZONE_MOVABLE" (whatever
that would involve) and deciding which memory ranges drivers can allocate
from.

I'm also wondering whether alloc_contig_pages()'s first-fit is suitable but
that probably cannot be judged without some benchmarks.

-- 
Best regards,                                        _     _
| Humble Liege of Serenely Enlightened Majesty of  o' \,=./ `o
| Computer Science,  Michał "mina86" Nazarewicz       (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/