[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <db9b9752-a106-a3af-12f5-9894adee7ba7@linux.vnet.ibm.com>
Date: Fri, 5 Jan 2018 09:22:22 +0530
From: Anshuman Khandual <khandual@...ux.vnet.ibm.com>
To: Michal Hocko <mhocko@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: Zi Yan <zi.yan@...rutgers.edu>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Vlastimil Babka <vbabka@...e.cz>,
Andrea Reale <ar@...ux.vnet.ibm.com>,
Anshuman Khandual <khandual@...ux.vnet.ibm.com>,
linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
Michal Hocko <mhocko@...e.com>
Subject: Re: [PATCH 1/3] mm, numa: rework do_pages_move
On 01/03/2018 01:55 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@...e.com>
>
> do_pages_move is supposed to move user defined memory (an array of
> addresses) to the user defined numa nodes (an array of nodes one for
> each address). The user provided status array then contains resulting
> numa node for each address or an error. The semantic of this function is
> little bit confusing because only some errors are reported back. Notably
> migrate_pages error is only reported via the return value. This patch
> doesn't try to address these semantic nuances but rather change the
> underlying implementation.
>
> Currently we are processing user input (which can be really large)
> in batches which are stored to a temporarily allocated page. Each
> address is resolved to its struct page and stored to page_to_node
> structure along with the requested target numa node. The array of these
> structures is then conveyed down the page migration path via private
> argument. new_page_node then finds the corresponding structure and
> allocates the proper target page.
>
> What is the problem with the current implementation and why to change
> it? Apart from being quite ugly it also doesn't cope with unexpected
> pages showing up on the migration list inside migrate_pages path.
> That doesn't happen currently but the follow up patch would like to
> make the thp migration code more clear and that would need to split a
> THP into the list for some cases.
>
> How does the new implementation work? Well, instead of batching into a
> fixed size array we simply batch all pages that should be migrated to
> the same node and isolate all of them into a linked list which doesn't
> require any additional storage. This should work reasonably well because
> page migration usually migrates larger ranges of memory to a specific
> node. So the common case should work equally well as the current
> implementation. Even if somebody constructs an input where the target
> numa nodes would be interleaved we shouldn't see a large performance
> impact because page migration alone doesn't really benefit from
> batching. mmap_sem batching for the lookup is quite questionable and
> isolate_lru_page which would benefit from batching is not using it even
> in the current implementation.
Hi Michal,
After slightly modifying your test case (like fixing the page size for
powerpc and just doing simple migration from node 0 to 8 instead of the
interleaving), I tried to measure the migration speed with and without
the patches on mainline. Its interesting....
10000 pages | 100000 pages
--------------------------
Mainline 165 ms 1674 ms
Mainline + first patch (move_pages) 191 ms 1952 ms
Mainline + all three patches 146 ms 1469 ms
Though overall it gives performance improvement, some how it slows
down migration after the first patch. Will look into this further.
Powered by blists - more mailing lists