Message-ID: <20150428074540.GA18647@js1304-P5Q-DELUXE>
Date:	Tue, 28 Apr 2015 16:45:40 +0900
From:	Joonsoo Kim <iamjoonsoo.kim@....com>
To:	Mel Gorman <mgorman@...e.de>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Vlastimil Babka <vbabka@...e.cz>,
	Johannes Weiner <hannes@...xchg.org>,
	Rik van Riel <riel@...hat.com>
Subject: Re: [RFC PATCH 3/3] mm: support active anti-fragmentation algorithm

On Mon, Apr 27, 2015 at 09:29:23AM +0100, Mel Gorman wrote:
> On Mon, Apr 27, 2015 at 04:23:41PM +0900, Joonsoo Kim wrote:
> > We already have an anti-fragmentation policy in the page allocator. It
> > works well when system memory is sufficient, but it doesn't work well
> > when memory is short, because memory is already highly fragmented and the
> > fallback/steal mechanism cannot get a whole pageblock. If there is a heavy
> > unmovable allocation requestor like zram, the problem could get worse.
> > 
> > CPU: 8
> > RAM: 512 MB with zram swap
> > WORKLOAD: kernel build with -j12
> > OPTION: page owner is enabled to measure fragmentation
> > After finishing the build, check fragmentation by 'cat /proc/pagetypeinfo'
> > 
> > * Before
> > Number of blocks type (movable)
> > DMA32: 207
> > 
> > Number of mixed blocks (movable)
> > DMA32: 111.2
> > 
> > A mixed block is a movable pageblock that contains one or more pages
> > allocated for unmovable/reclaimable allocations. The results show that
> > more than half of the movable pageblocks are tainted by allocations of
> > another migratetype.
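
The definition of a mixed block above can be written as a small check. This is only an illustrative Python sketch, not kernel code: the `Pageblock` class, its fields, and the sample data are hypothetical, and the real counts in this thread come from page owner and /proc/pagetypeinfo.

```python
from dataclasses import dataclass, field
from typing import List

MOVABLE, UNMOVABLE, RECLAIMABLE = "movable", "unmovable", "reclaimable"


@dataclass
class Pageblock:
    """Hypothetical model of a pageblock (not a kernel structure)."""
    block_type: str
    # Migratetype of each allocated page in the block; free pages are omitted.
    allocated: List[str] = field(default_factory=list)


def is_mixed(block: Pageblock) -> bool:
    """A movable pageblock holding at least one unmovable/reclaimable page."""
    return block.block_type == MOVABLE and any(
        t in (UNMOVABLE, RECLAIMABLE) for t in block.allocated
    )


def count_mixed(blocks: List[Pageblock]) -> int:
    """Number of mixed blocks among the given pageblocks."""
    return sum(1 for b in blocks if is_mixed(b))


# Made-up sample: only the second block counts as mixed.
blocks = [
    Pageblock(MOVABLE, [MOVABLE, MOVABLE]),
    Pageblock(MOVABLE, [MOVABLE, UNMOVABLE]),
    Pageblock(UNMOVABLE, [UNMOVABLE, RECLAIMABLE]),
]
print(count_mixed(blocks))
```

An unmovable pageblock is never counted, whatever it contains, because the metric only looks at blocks whose type is movable.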
> > 
> > To mitigate this fragmentation, this patch implements an active
> > anti-fragmentation algorithm. The idea is really simple. When an
> > unmovable/reclaimable steal happens from a movable pageblock, we try to
> > migrate out the other migratable pages in this pageblock and use the
> > resulting free pages for further allocation requests of the
> > corresponding migratetype.
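
The idea in the paragraph above can be illustrated with a toy model. This is a rough Python sketch under strong assumptions, not the patch itself: pageblocks are plain lists of page states, `steal_and_evacuate` is a hypothetical name, and "migration" is just moving a marker into a free slot of another block.

```python
from typing import List

FREE, MOVABLE, UNMOVABLE = "free", "movable", "unmovable"


def steal_and_evacuate(victim: List[str], target: List[str]) -> int:
    """Toy model: an unmovable allocation steals a free page from `victim`,
    then the movable pages left in `victim` are migrated into free slots of
    `target`. Returns the number of pages migrated.

    Assumes `victim` has at least one free page to steal.
    """
    # The fallback "steal": the unmovable page lands in the movable block.
    victim[victim.index(FREE)] = UNMOVABLE

    migrated = 0
    for i, page in enumerate(victim):
        if page != MOVABLE:
            continue
        if FREE not in target:
            break  # no destination left: migration is not guaranteed to succeed
        target[target.index(FREE)] = MOVABLE
        victim[i] = FREE  # freed page is now available to unmovable requests
        migrated += 1
    return migrated


victim = [MOVABLE, FREE, MOVABLE, FREE]
target = [MOVABLE, FREE, FREE, FREE]
moved = steal_and_evacuate(victim, target)
print(moved, victim, target)
```

After the call, the victim block holds only the unmovable page plus free pages, so later unmovable requests can be served from it instead of tainting another movable block.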
> > 
> > Once an unmovable allocation taints a movable pageblock, the block cannot
> > easily recover. Instead of praying that it gets restored, making it as
> > much of an unmovable pageblock as possible and using it for further
> > unmovable requests would be a more reasonable approach.
> > 
> > Below is the result of this idea.
> > 
> > * After
> > Number of blocks type (movable)
> > DMA32: 208.2
> > 
> > Number of mixed blocks (movable)
> > DMA32: 55.8
> > 
> > The result shows that non-mixed blocks increase by 59% in this case.
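
The 59% figure follows from the numbers quoted above if non-mixed blocks are taken as total movable blocks minus mixed blocks. A short Python check, where the helper name `non_mixed` is only illustrative:

```python
def non_mixed(total_movable: float, mixed: float) -> float:
    """Movable pageblocks that contain no unmovable/reclaimable pages."""
    return total_movable - mixed


before = non_mixed(207, 111.2)    # "Before" numbers for DMA32
after = non_mixed(208.2, 55.8)    # "After" numbers for DMA32

increase_pct = (after - before) / before * 100
print(f"{before:.1f} -> {after:.1f} non-mixed blocks: +{increase_pct:.0f}%")
```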
> > 
> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@....com>
> 
> I haven't read the patch in detail but there were a few reasons why
> active avoidance was not implemented originally.

Thanks for the really good comments. From them I can understand why it is
in its current form and what I should consider.

> 
> 1. If pages in the target block were reclaimed then it potentially
>    increased stall latency in the future when they had to be refaulted
>    again. A prototype that used lumpy reclaim originally suffered extreme
>    stalls and was ultimately abandoned. The alternative at the time was
>    to increase min_free_kbytes by default as it had a similar effect with
>    much less disruption

Reclaim is not used by this patchset.

> 2. If the pages in the target block were migrated then there was
>    compaction overhead with no guarantee of success. Again, there were
>    concerns about stalls. This was not deferred to an external thread
>    because if the fragmenting process did not stall then it could simply
>    cause more fragmentation-related damage while the thread executes. It

Yes, this patch uses migration for active fragmentation avoidance.
But I'm not sure why an external thread approach could simply cause more
fragmentation-related damage. It may not be able to keep up with the
allocation speed of the fragmenting process, but even in that case
fragmentation would be lower than with the do-nothing approach.

>    becomes very unpredictable. While migration is in progress, processes
>    also potentially stall if they reference the targeted pages.

I admit that if processes reference the targeted pages they would
stall, and that there is compaction overhead with no guarantee of
success. But this fragmentation avoidance is really needed on low-memory
systems, because there fragmentation can be so high that even order-2
allocations suffer from it. File pages are excessively reclaimed to make
an order-2 page. I think that this reclaim overhead is worse than the
overhead caused by this fragmentation avoidance algorithm.

> 3. Further on 2, the migration itself potentially triggers more fallback
>    events while pages are isolated for the migration.
> 
> 4. Migrating pages to another node is a bad idea. It requires a NUMA
>    machine at the very least but more importantly it could violate memory
>    policies. If the page was mapped then the VMA could be checked but if the
>    pages were unmapped then the kernel potentially violates memory policies

I agree. Migrating pages to another node is not intended behaviour.
I will fix it.

> At the time it was implemented, fragmentation avoidance was primarily
> concerned about allocating hugetlbfs pages and later THP. Failing either
> was not a functional failure that users would care about but large stalls
> due to active fragmentation avoidance would disrupt workloads badly.

I see. But what this patchset addresses is a kind of functional failure.
If unmovable pages are mixed with movable pages in a movable pageblock,
even small-order allocations aren't served easily on a highly
fragmented low-memory system.

> Just be sure to take the stalling and memory policy problems into
> account.

Okay. This version of the patch doesn't consider stalling, so it isolates
all the migratable pages in a pageblock at once. :)
I will fix it in the next spin.

And, ditto for memory policy.

Thanks.