Message-ID: <87zggk5njh.fsf@nvdebian.thelocal>
Date: Thu, 04 Aug 2022 19:57:21 +1000
From: Alistair Popple <apopple@...dia.com>
To: David Hildenbrand <david@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
jgg@...dia.com, minchan@...nel.org, linux-kernel@...r.kernel.org,
jhubbard@...dia.com, pasha.tatashin@...een.com
Subject: Re: [PATCH v2] mm/gup.c: Simplify and fix
check_and_migrate_movable_pages() return codes

David Hildenbrand <david@...hat.com> writes:

> On 04.08.22 02:12, Alistair Popple wrote:
>>
>> Andrew Morton <akpm@...ux-foundation.org> writes:
>>
>>> On Tue, 2 Aug 2022 10:30:12 +1000 Alistair Popple <apopple@...dia.com> wrote:
>>>
>>>> When pinning pages with FOLL_LONGTERM check_and_migrate_movable_pages()
>>>> is called to migrate pages out of zones which should not contain any
>>>> longterm pinned pages.
>>>>
>>>> When migration succeeds all pages will have been unpinned so pinning
>>>> needs to be retried. This is indicated by returning zero. When all pages
>>>> are in the correct zone the number of pinned pages is returned.
>>>>
>>>> However migration can also fail, in which case pages are unpinned and
>>>> -ENOMEM is returned. However if the failure was due to being unable
>>>> to isolate a page, zero is returned. This leads to indefinite looping in
>>>> __gup_longterm_locked().
>>>>
>>>> Fix this by simplifying the return codes such that zero indicates all
>>>> pages were successfully pinned in the correct zone while errors indicate
>>>> either pages were migrated and pinning should be retried or that
>>>> migration has failed and therefore the pinning operation should fail.
>>>>
>>>> This fixes the indefinite looping on page isolation failure by failing
>>>> the pin operation instead of retrying indefinitely.
>>>>
>>>
>>> Are we able to identify a Fixes: for this? Presumably something in the
>>> series "Add MEMORY_DEVICE_COHERENT for coherent device memory mapping"?
>>
>> It seems the infinite loop was desired behaviour so I will re-spin this
>> as a pure clean-up.
>>
>
> How can the infinite loop trigger when we allow longterm-pinning the
> shared zeropage? (note: disallowing that for now was a bug)

Right, I don't know of any other triggers, so based on the discussion
Pasha pointed me at, I think the infinite loop is probably fine unless
there are other bugs.

Apologies, I should have copied you on the new version, which is now just
a clean-up:
https://lore.kernel.org/linux-mm/20220804032241.859891-1-apopple@nvidia.com/
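
For readers following the return-code discussion, here is a minimal
user-space sketch of the convention described in the quoted v2 commit
message. It is not the kernel code: struct sim_state, sim_check_and_migrate()
and sim_pin_longterm() are hypothetical names, and the choice of -EAGAIN for
"migrated, retry the pin" and -ENOMEM for every failure case is an
assumption made for illustration. The clean-up version linked above may
behave differently on isolation failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the pinning state; not a kernel structure. */
struct sim_state {
	int pages_needing_migration; /* pages in a zone that forbids long-term pins */
	int isolation_fails;         /* non-zero: page isolation fails */
	int attempts;                /* number of pin attempts made so far */
};

/*
 * Models the simplified convention from the v2 commit message:
 *   0        all pages are pinned in an allowed zone
 *   -EAGAIN  pages were migrated and unpinned, pin again (assumed errno)
 *   -ENOMEM  migration could not be done, fail the pin (assumed errno)
 */
static int sim_check_and_migrate(struct sim_state *s)
{
	assert(s != NULL);

	if (s->pages_needing_migration == 0)
		return 0;
	if (s->isolation_fails)
		return -ENOMEM;

	/* Migration succeeded; the pins were dropped, so the caller retries. */
	s->pages_needing_migration = 0;
	return -EAGAIN;
}

/* Retry loop in the style of __gup_longterm_locked(): loop only on -EAGAIN. */
static long sim_pin_longterm(struct sim_state *s, long nr_pages)
{
	int rc;

	do {
		s->attempts++; /* stands in for the actual pin attempt */
		rc = sim_check_and_migrate(s);
	} while (rc == -EAGAIN);

	return rc ? rc : nr_pages;
}
```

With this shape the caller loops only while a retry is explicitly requested,
so an isolation failure ends the loop with an error rather than spinning.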