Message-ID: <20200123085526.GH29276@dhcp22.suse.cz>
Date:   Thu, 23 Jan 2020 09:55:26 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Wei Yang <richardw.yang@...ux.intel.com>
Cc:     Yang Shi <yang.shi@...ux.alibaba.com>, akpm@...ux-foundation.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        stable@...r.kernel.org
Subject: Re: [v2 PATCH] mm: move_pages: report the number of non-attempted
 pages

On Thu 23-01-20 11:27:36, Wei Yang wrote:
> On Thu, Jan 23, 2020 at 07:38:51AM +0800, Yang Shi wrote:
> >Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
> >the semantic of move_pages() was changed to return the number of
> >non-migrated pages (failed to migration) and the call would be aborted
> >immediately if migrate_pages() returns positive value.  But it didn't
> >report the number of pages that we even haven't attempted to migrate.
> >So, fix it by including non-attempted pages in the return value.
> >
> 
> First, we want to change the semantic of move_pages(2). The return value
> indicates the number of pages we didn't managed to migrate?
> 
> Second, the return value from migrate_pages() doesn't mean the number of pages
> we failed to migrate. For example, one -ENOMEM is returned on the first page,
> migrate_pages() would return 1. But actually, no page successfully migrated.

ENOMEM is considered a permanent failure and as such it is returned by
migrate_pages() directly (see the goto out path).

> Third, even the migrate_pages() return the exact non-migrate page, we are not
> sure those non-migrated pages are at the tail of the list. Because in the last
> case in migrate_pages(), it just remove the page from list. It could be a page
> in the middle of the list. Then, in userspace, how the return value be
> leveraged to determine the valid status? Any page in the list could be the
> victim.

Yes, I was wrong when stating that the caller would know better which
status to check. I misremembered the original patch as it was quite some
time ago. While storing the error code would be possible after some
massaging of migrate_pages(), is this really something we deeply care
about? The caller can achieve the same by initializing the status array
to a non-node number - e.g. -1 - and checking based on that.
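
To illustrate the caller-side workaround, here is a minimal userspace
sketch. The names STATUS_UNTOUCHED, init_status(), summarize_status() and
struct status_summary are made up for this example and are not part of
any kernel or libnuma API. It assumes the kernel writes either a node id
(>= 0) or a negative errno into every entry it processes, and that -1 is
never written by the kernel for a processed page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel: a value that is not a valid node id. */
#define STATUS_UNTOUCHED (-1)

/* Hypothetical summary of a status array after move_pages(2) returned. */
struct status_summary {
	size_t placed;    /* entries holding a node id (>= 0) */
	size_t failed;    /* entries holding a negative errno */
	size_t untouched; /* entries still holding the sentinel */
};

/* Fill the status array with the sentinel before the system call. */
static void init_status(int *status, size_t count)
{
	assert(status != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		status[i] = STATUS_UNTOUCHED;
}

/* Classify every entry once the system call has returned. */
static struct status_summary summarize_status(const int *status, size_t count)
{
	struct status_summary s = { 0, 0, 0 };

	assert(status != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (status[i] == STATUS_UNTOUCHED)
			s.untouched++;
		else if (status[i] < 0)
			s.failed++;
		else
			s.placed++;
	}
	return s;
}
```

A caller would run init_status() on the array, pass it as the status
argument of move_pages(2), and then use summarize_status() to tell which
pages were never attempted, independent of what the return value says.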

This system call has quite complex semantics and I am not 100% sure
what the right thing to do here is. Maybe we do want to continue and try
to migrate as much as possible on non-fatal migration failures and
accumulate the number of failed pages while doing so.
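
A rough model of that alternative, written as plain userspace C rather
than kernel code. Everything here is hypothetical: batch_fn,
replay_results() and migrate_all_batches() only stand in for the
per-node batching in do_pages_move(), under the assumption that a batch
reports either a count of failed pages (>= 0) or a negative errno for a
fatal failure.

```c
#include <stddef.h>

/*
 * Hypothetical per-batch hook: returns the number of pages that failed
 * to migrate (>= 0), or a negative errno for a fatal failure.
 */
typedef int (*batch_fn)(size_t batch_index, void *ctx);

/* Example hook that replays precomputed per-batch results from ctx. */
static int replay_results(size_t batch_index, void *ctx)
{
	const int *results = ctx;

	return results[batch_index];
}

/*
 * Keep going after non-fatal failures and accumulate the failed count;
 * stop and return the errno only when a batch fails fatally.
 */
static long migrate_all_batches(size_t nr_batches, batch_fn migrate_batch,
				void *ctx)
{
	long failed = 0;

	for (size_t i = 0; i < nr_batches; i++) {
		int ret = migrate_batch(i, ctx);

		if (ret < 0)
			return ret; /* fatal: give up immediately */
		failed += ret;      /* non-fatal: remember and continue */
	}
	return failed;
}
```

With this shape every batch is attempted unless a fatal error occurs, so
a positive return value would count only pages that were actually tried
and failed.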

The main problem is that we can have an academic discussion but
the primary question is what actual users want. A lack of real
bug reports suggests that nobody has actually noticed this. So I
would rather keep returning the correct number of non-migrated
pages. Why? Because new users could have started depending on it. It
is not all that unlikely that the current implementation would just
work for them because they are migrating a set of pages onto the same
node so the batch would be a single list throughout the whole given
page set.
-- 
Michal Hocko
SUSE Labs
