[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9945a66f-4434-b2a6-63ac-3240ef1d52c9@suse.cz>
Date: Fri, 21 Jun 2019 13:33:47 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Yang Shi <yang.shi@...ux.alibaba.com>,
Michal Hocko <mhocko@...nel.org>
Cc: akpm@...ux-foundation.org, mgorman@...hsingularity.net,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Eric Dumazet <edumazet@...gle.com>,
"David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org
Subject: Re: [PATCH] mm: mempolicy: handle vma with unmovable pages mapped
correctly in mbind
On 6/20/19 6:08 PM, Yang Shi wrote:
>
>
> On 6/20/19 12:18 AM, Vlastimil Babka wrote:
>> On 6/19/19 8:19 PM, Yang Shi wrote:
>>>>>> This is getting even more muddy TBH. Is there any reason that we
>>>>>> have to
>>>>>> handle this problem during the isolation phase rather the migration?
>>>>> I think it was already said that if pages can't be isolated, then
>>>>> migration phase won't process them, so they're just ignored.
>>>> Yes,exactly.
>>>>
>>>>> However I think the patch is wrong to abort immediately when
>>>>> encountering such page that cannot be isolated (AFAICS). IMHO it should
>>>>> still try to migrate everything it can, and only then return -EIO.
>>>> It is fine too. I don't see mbind semantics define how to handle such
>>>> case other than returning -EIO.
>> I think it does. There's:
>> If MPOL_MF_MOVE is specified in flags, then the kernel *will attempt to
>> move all the existing pages* ... If MPOL_MF_STRICT is also specified,
>> then the call fails with the error *EIO if some pages could not be moved*
>>
>> Aborting immediately would be against the attempt to move all.
>>
>>> By looking into the code, it looks not that easy as what I thought.
>>> do_mbind() would check the return value of queue_pages_range(), it just
>>> applies the policy and manipulates vmas as long as the return value is 0
>>> (success), then migrate pages on the list. We could put the movable
>>> pages on the list by not breaking immediately, but they will be ignored.
>>> If we migrate the pages regardless of the return value, it may break the
>>> policy since the policy will *not* be applied at all.
>> I think we just need to remember if there was at least one page that
>> failed isolation or migration, but keep working, and in the end return
>> EIO if there was such page(s). I don't think it breaks the policy. Once
>> pages are allocated in a mapping, changing the policy is a best effort
>> thing anyway.
>
> The current behavior is:
> If queue_pages_range() return -EIO (vma is not migratable, ignore other
> conditions since we just focus on page migration), the policy won't be
> set and no page will be migrated.
Ah, I see. IIUC the current behavior is due to your recent commit
a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when
MPOL_MF_STRICT is specified") in order to fix commit 6f4576e3687b
("mempolicy: apply page table walker on queue_pages_range()"), which
caused -EIO to be not returned enough. But I think you went too far and
instead return -EIO too much. If I look at the code in parent commit of
6f4576e3687b, I can see in queue_pages_range():
if ((flags & MPOL_MF_STRICT) ||
((flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) &&
vma_migratable(vma))) {
err = queue_pages_pgd_range(vma, start, endvma, nodes,
flags, private);
if (err)
break;
}
and in queue_pages_pte_range():
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
migrate_page_add(page, private, flags);
else
break;
So originally, there was no returning of -EIO due to !vma_migratable() -
as long as MPOL_MF_STRICT and MPOL_MF_MOVE* was specified, the code
tried to queue for migration everything it could and didn't ever abort,
AFAICS. And I still think that's the best possible behavior.
> However, the problem here is the vma might look migratable, but some or
> all the underlying pages are unmovable. So, my patch assumes the vma is
> *not* migratable if at least one page is unmovable. I'm not sure if it
> is possible to have both movable and unmovable pages for the same
> mapping or not, I'm supposed the vma would be split much earlier.
>
> If we don't abort immediately, then we record if there is unmovable
> page, then we could do:
> #1. Still follows the current behavior (then why not abort immediately?)
See above how the current behavior differs from the original one.
> #2. Set mempolicy then migrate all the migratable pages. But, we may end
> up with the pages on node A, but the policy says node B. Doesn't it
> break the policy?
The policy can already be "broken" (violated is probably better word) by
migrate_pages() failing. If that happens, we don't rollback the migrated
pages and reset the policy back, right? I think the manpage is clear
that MPOL_MF_MOVE is a best-effort. Userspace will know that not
everything was successfully migrated (via -EIO), and can take whatever
steps it deems necessary - attempt rollback, determine which exact
page(s) are violating the policy, etc.
>>
>>>>
>
Powered by blists - more mailing lists