linux-kernel - Re: [RFC PATCH 1/3] mm, numa: rework do_pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20171213120733.umeb7rylswl7chi5@node.shutemov.name>
Date:   Wed, 13 Dec 2017 15:07:33 +0300
From:   "Kirill A. Shutemov" <kirill@...temov.name>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     linux-mm@...ck.org, Zi Yan <zi.yan@...rutgers.edu>,
        Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andrea Reale <ar@...ux.vnet.ibm.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Michal Hocko <mhocko@...e.com>
Subject: Re: [RFC PATCH 1/3] mm, numa: rework do_pages_move

On Fri, Dec 08, 2017 at 05:15:57PM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@...e.com>
> 
> do_pages_move is supposed to move user defined memory (an array of
> addresses) to the user defined numa nodes (an array of nodes one for
> each address). The user provided status array then contains resulting
> numa node for each address or an error. The semantic of this function is
> little bit confusing because only some errors are reported back. Notably
> migrate_pages error is only reported via the return value. This patch
> doesn't try to address these semantic nuances but rather change the
> underlying implementation.
> 
> Currently we are processing user input (which can be really large)
> in batches which are stored to a temporarily allocated page. Each
> address is resolved to its struct page and stored to page_to_node
> structure along with the requested target numa node. The array of these
> structures is then conveyed down the page migration path via private
> argument. new_page_node then finds the corresponding structure and
> allocates the proper target page.
> 
> What is the problem with the current implementation and why to change
> it? Apart from being quite ugly it also doesn't cope with unexpected
> pages showing up on the migration list inside migrate_pages path.
> That doesn't happen currently but the follow up patch would like to
> make the thp migration code more clear and that would need to split a
> THP into the list for some cases.
> 
> How does the new implementation work? Well, instead of batching into a
> fixed size array we simply batch all pages that should be migrated to
> the same node and isolate all of them into a linked list which doesn't
> require any additional storage. This should work reasonably well because
> page migration usually migrates larger ranges of memory to a specific
> node. So the common case should work equally well as the current
> implementation. Even if somebody constructs an input where the target
> numa nodes would be interleaved we shouldn't see a large performance
> impact because page migration alone doesn't really benefit from
> batching. mmap_sem batching for the lookup is quite questionable and
> isolate_lru_page which would benefit from batching is not using it even
> in the current implementation.
> 
> Signed-off-by: Michal Hocko <mhocko@...e.com>
> ---
>  mm/internal.h  |   1 +
>  mm/mempolicy.c |   5 +-
>  mm/migrate.c   | 306 +++++++++++++++++++++++++--------------------------------
>  3 files changed, 139 insertions(+), 173 deletions(-)

The approach looks fine to me.

But patch is rather large and hard to review. And how git mixed add/remove
lines doesn't help too. Any chance to split it up further?

One nitpick: I don't think 'chunk' terminology should go away with the
patch.

-- 
 Kirill A. Shutemov