Date:	Sat, 30 Jan 2016 18:21:38 +0800
From:	Xishi Qiu <qiuxishi@...wei.com>
To:	Vlastimil Babka <vbabka@...e.cz>
CC:	ChengYi He <chengyihetaipei@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mgorman@...hsingularity.net>,
	"Michal Hocko" <mhocko@...e.com>,
	David Rientjes <rientjes@...gle.com>,
	Joonsoo Kim <js1304@...il.com>,
	Yaowei Bai <bywxiaobai@....com>,
	Alexander Duyck <alexander.h.duyck@...hat.com>,
	"'Kirill A . Shutemov'" <kirill.shutemov@...ux.intel.com>,
	Johannes Weiner <hannes@...xchg.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 0/2] avoid external fragmentation related to migration
 fallback

On 2016/1/30 5:03, Vlastimil Babka wrote:

> On 01/29/2016 08:23 PM, ChengYi He wrote:
> 
> [...]
> 
>> Below is the root cause of this external fragmentation which could be
>> observed in devices which have only one memory zone, such as some arm64
>> android devices:
>>
>> 1) On arm64, the first 4GB of the physical address space belongs to the
>> DMA zone. If physical memory is smaller than 4GB and all of it lies in
>> the first 4GB of the address space, then the system has only the DMA zone.
>> 2) By default, all pageblocks are Movable.
>> 3) Allocators such as slab, ion, and graphics preferably allocate pages of
>> the Unmovable migration type. They might fall back to allocating Movable
>> pages and change Movable pageblocks into Unmovable ones.
>> 4) Movable pageblocks become fewer and fewer for the above reason.
>> However, on an Android system, AnonPages demand is always high, so the
>> Movable pages are easily exhausted.
>> 5) Once Movable pages are exhausted, Movable allocations frequently
>> fall back to allocating the largest feasible pages of the other
>> migration types. The order-2 and order-3 Unmovable pages are then
>> easily split into smaller ones.
>>
>> This symptom doesn't appear on arm32 Android, which usually has two
>> memory zones, Highmem and Normal. The slab, ion, and graphics
>> allocators allocate pages with the GFP_KERNEL flag. Only the Movable
>> pageblocks in the Normal zone become fewer, while plenty of Movable pages
>> remain in the Highmem zone. Thus, the Movable pages are not easily
>> exhausted, and there are no frequent fallbacks.
> 
> Hm, this 1 zone vs 2 zones shouldn't make that much difference, unless
> a) you use zone reclaim mode, or b) you have an old kernel without fair
> zone allocation policy?
> 

Hi Vlastimil,

I agree with you.

I think if we have a normal zone and a movable zone, then the effect of
compaction will be better.

e.g.
U: unmovable page
M: movable page
F: free page

one zone (DMA):
paddr:         0                       max
ZONE_DMA:      U M F U M F ... U M F U M F
after compact: U F F U F F ... U M M U M M
 
two zones (DMA and MOVABLE):
paddr:         0                       max
ZONE_DMA: the same as above
ZONE_MOVABLE:  M F M F M F ... M F M F M F
after compact: F F F F F F ... M M M M M M  // we get a larger free block than above
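
The picture above can be reproduced with a toy model. This is only a
sketch, not the kernel's compaction code: it assumes a zone is a string of
U/M/F pages, a migrate scanner walking up from the low end and a free
scanner walking down from the high end, and the names compact() and
longest_free_run() are made up for illustration.

```python
def compact(zone):
    """Toy compaction: move movable pages (M) from the low end of the
    zone into free pages (F) found from the high end. Unmovable pages
    (U) never move. Stops when the two scanners meet."""
    pages = list(zone)
    lo, hi = 0, len(pages) - 1
    while lo < hi:
        if pages[lo] != "M":        # migrate scanner skips U and F
            lo += 1
        elif pages[hi] != "F":      # free scanner skips U and M
            hi -= 1
        else:                       # migrate the page at lo into the slot at hi
            pages[lo], pages[hi] = "F", "M"
            lo += 1
            hi -= 1
    return "".join(pages)


def longest_free_run(zone):
    """Length of the largest run of physically contiguous free pages."""
    best = cur = 0
    for page in zone:
        cur = cur + 1 if page == "F" else 0
        best = max(best, cur)
    return best


one_zone = compact("UMF" * 4)      # ZONE_DMA with unmovable pages mixed in
movable_zone = compact("MF" * 6)   # ZONE_MOVABLE holds no unmovable pages

print(one_zone, longest_free_run(one_zone))
print(movable_zone, longest_free_run(movable_zone))
```

In this model the unmovable pages pin the free runs in the single zone to
two pages, while the movable-only zone ends up with its whole low half free.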


>> Since the root cause is that fallbacks might frequently split order-2
>> and order-3 pages of the other migration types, this patch tweaks the
>> fallback mechanism to avoid splitting order-2 and order-3 pages. When
>> fallbacks happen, if the largest feasible pages are less than or equal to
>> COSTLY_ORDER, i.e. 3, then try to select the smallest feasible pages. The
>> reason why fallbacks prefer the largest feasible pages is to increase
>> fallback efficiency, since fallbacks are likely to happen again. Stealing
>> the largest feasible pages reduces the opportunities for another
>> fallback. Besides, it keeps consecutive allocations closer to each
>> other and makes the system less fragmented. However, if the largest
>> feasible pages are of order 3 or lower, fallbacks might split them and
>> make the upcoming order-3 page allocations fail.
> 
> In theory I don't see immediately why preferring smaller pages for
> fallback should be a clear win. If it's Unmovable allocations stealing
> from Movable pageblocks, the allocations will spread over larger areas
> instead of being grouped together. Maybe, for Movable allocations
> stealing from Unmovable allocations, preferring smallest might make
> sense and be safe, as any extra fragmentation is fixable by compaction.
> Maybe it was already tried (by Joonsoo?) at some point, I'm not sure
> right now.
> 
>> My test is against arm64 Android devices with kernel 3.10.49. I set up
>> the same account and installed the same applications on both devices and
>> used them synchronously.
> 
> 3.10 is wayyyyyy old. There were numerous patches to compaction and
> anti-fragmentation since then. IIRC the fallback decisions were quite
> suboptimal at that point. I'm not even sure how you could apply your
> patches to both recent kernel for posting them, and 3.10 for testing?
> Is it possible to test on 4.4?
> 

I think it's hard to update the drivers on an Android smartphone.

Thanks,
Xishi Qiu

>>
>> Test result:
>> 1) Test without this patch:
>> Most free pages are order-0 Unmovable ones. allocstall and compact_stall
>> in /proc/vmstat are relatively high, and most occurrences of allocstall
>> are due to order-2 and order-3 allocations.
>> 2) Test with this patch:
>> There are more order-2 and order-3 free pages. allocstall and
>> compact_stall in /proc/vmstat are relatively low, and most occurrences of
>> allocstall are due to order-0 allocations.
>>
>> Log:
>> 1) Test without this patch:
>> ------ TIME (date) ------
>> Fri Jul  3 16:52:55 CST 2015
>> ------ UPTIME (uptime) ------
>> up time: 2 days, 12:06:52, idle time: 8 days, 14:48:55, sleep time: 16:43:56
>> ------ MEMORY INFO (/proc/meminfo) ------
>> MemTotal:        2792568 kB
>> MemFree:          194524 kB
>> Buffers:            3788 kB
>> Cached:           380872 kB
>> ------ PAGETYPEINFO (/proc/pagetypeinfo) ------
>> Free pages count per migrate type at order      0     1     2    3    4
>> Node    0, zone      DMA, type    Unmovable 43852   701     0    0    0
>> Node    0, zone      DMA, type  Reclaimable  3357     0     0    0    0
>> Node    0, zone      DMA, type      Movable     0     5     0    0    0
>> Node    0, zone      DMA, type      Reserve     0     1     5    0    0
>> Node    0, zone      DMA, type          CMA     2     0     0    0    0
>> Node    0, zone      DMA, type      Isolate     0     0     0    0    0
>> Number of blocks type Unmovable Reclaimable Movable Reserve CMA Isolate
>> Node 0, zone      DMA       362          80     170       2 113       0
>> ------ VIRTUAL MEMORY STATS (/proc/vmstat) ------
>> pgsteal_kswapd_dma 31755040
>> pgsteal_direct_dma 34597394
>> pgscan_kswapd_dma 36427664
>> pgscan_direct_dma 39490711
>> kswapd_low_wmark_hit_quickly 201929
>> kswapd_high_wmark_hit_quickly 4858
>> allocstall 664269
>> allocstall_order_0 9738
>> allocstall_order_1 1787
>> allocstall_order_2 637608
>> allocstall_order_3 15136
>> pgmigrate_success 2941956
>> pgmigrate_fail 1033
>> compact_migrate_scanned 142985157
>> compact_free_scanned 4734040109
>> compact_isolated 7720362
>> compact_stall 65978
>> compact_fail 46084
>> compact_success 11717
>>
>> 2) Test with this patch:
>> ------ TIME (date) ------
>> Fri Jul  3 16:52:31 CST 2015
>> ------ UPTIME (uptime) ------
>> up time: 2 days, 12:06:30
>> ------ MEMORY INFO (/proc/meminfo) ------
>> MemTotal:        2792568 kB
>> MemFree:           47612 kB
>> Buffers:            3732 kB
>> Cached:           387048 kB
>> ------ PAGETYPEINFO (/proc/pagetypeinfo) ------
>> Free pages count per migrate type at order      0     1     2    3    4
>> Node    0, zone      DMA, type    Unmovable   272   243   126    1    0
>> Node    0, zone      DMA, type  Reclaimable     0   361   168   46    0
>> Node    0, zone      DMA, type      Movable  4103  1782   130    3    0
>> Node    0, zone      DMA, type      Reserve     0     0     0    0    0
>> Node    0, zone      DMA, type          CMA   563     2     0    0    0
>> Node    0, zone      DMA, type      Isolate     0     0     0    0    0
>> Number of blocks type Unmovable Reclaimable Movable Reserve CMA Isolate
>> Node 0, zone      DMA       183          12     417       2 113       0
>> ------ VIRTUAL MEMORY STATS (/proc/vmstat) ------
>> pgsteal_kswapd_dma 50710868
>> pgsteal_direct_dma 1756780
>> pgscan_kswapd_dma 58281837
>> pgscan_direct_dma 2022049
>> kswapd_low_wmark_hit_quickly 37599
>> kswapd_high_wmark_hit_quickly 13564
>> allocstall 27510
>> allocstall_order_0 26101
>> allocstall_order_1 23
>> allocstall_order_2 1224
>> allocstall_order_3 162
>> pgmigrate_success 63751
>> pgmigrate_fail 7
>> compact_migrate_scanned 278170
>> compact_free_scanned 6155410
>> compact_isolated 140762
>> compact_stall 749
>> compact_fail 54
>> compact_success 22
>> unevictable_pgs_culled 794
>>
>> Below is the status of another device with this patch.
>> /proc/pagetypeinfo shows that even though there are no free Movable pages,
>> there are lots of order-2 and order-3 Unmovable pages. In this case, if
>> the patch were not applied, the order-2 and order-3 Unmovable pages would
>> be split easily, and system performance would likely degrade due to
>> severe external fragmentation.
>>
>> ------ UPTIME (uptime) ------
>> up time: 33 days, 08:10:58
>> ------ MEMORY INFO (/proc/meminfo) ------
>> MemTotal:        2792568 kB
>> MemFree:           37340 kB
>> Buffers:           13412 kB
>> Cached:           655456 kB
>> ------ PAGETYPEINFO (/proc/pagetypeinfo) ------
>> Free pages count per migrate type at order      0     1     2     3    4
>> Node    0, zone      DMA, type    Unmovable   718   628  1116   301    0
>> Node    0, zone      DMA, type  Reclaimable   198    93     0     0    0
>> Node    0, zone      DMA, type      Movable     0     0     0     0    0
>> Node    0, zone      DMA, type      Reserve     0     0     0     0    0
>> Node    0, zone      DMA, type          CMA    89    11     3     0    0
>> Node    0, zone      DMA, type      Isolate     0     0     0     0    0
>> Number of blocks type Unmovable Reclaimable Movable Reserve  CMA Isolate
>> Node 0, zone      DMA       377         115     120       2  113       0
>> ------ VIRTUAL MEMORY STATS (/proc/vmstat) ------
>> pgsteal_direct_dma 28575192
>> pgsteal_kswapd_dma 378357910
>> pgscan_kswapd_dma 422765699
>> pgscan_direct_dma 31860747
>> kswapd_low_wmark_hit_quickly 947979
>> kswapd_high_wmark_hit_quickly 139901
>> allocstall 592989
>> compact_migrate_scanned 149884903
>> compact_free_scanned 6629299888
>> compact_isolated 7699012
>> compact_stall 52550
>> compact_fail 45155
>> compact_success 6057
>>
>> ChengYi He (2):
>>   mm/page_alloc: let migration fallback support pages of requested order
>>   mm/page_alloc: avoid splitting pages of order 2 and 3 in migration
>>     fallback
>>
>>  mm/page_alloc.c | 92 ++++++++++++++++++++++++++++++++++++---------------------
>>  1 file changed, 59 insertions(+), 33 deletions(-)
>>
> 
> 
> .
> 
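
As a quick check of the "most occurrences of allocstall" statements in the
quoted test results, the per-order counters can be turned into shares. This
is only a sketch: the counter values are copied from the two quoted
/proc/vmstat logs (the allocstall_order_N counters as they appear there),
and the names without_patch, with_patch and shares() are made up for
illustration.

```python
# allocstall_order_N values copied from the quoted /proc/vmstat logs.
without_patch = {0: 9738, 1: 1787, 2: 637608, 3: 15136}
with_patch = {0: 26101, 1: 23, 2: 1224, 3: 162}


def shares(stalls):
    """Fraction of all allocation stalls attributed to each order."""
    total = sum(stalls.values())
    return {order: count / total for order, count in stalls.items()}


for name, stalls in (("without patch", without_patch),
                     ("with patch", with_patch)):
    total = sum(stalls.values())
    detail = ", ".join(f"order-{order}: {share:.1%}"
                       for order, share in shares(stalls).items())
    print(f"{name}: allocstall={total} ({detail})")
```

The per-order counters add up to the reported allocstall totals (664269 and
27510). Without the patch roughly 96% of the stalls are order-2; with the
patch roughly 95% are order-0.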


