Message-ID: <70610ea1-5932-a19f-5eba-c4fba06335da@linux.alibaba.com>
Date: Thu, 20 Oct 2022 15:15:26 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: David Hildenbrand <david@...hat.com>, akpm@...ux-foundation.org
Cc: arnd@...db.de, jingshan@...ux.alibaba.com, linux-mm@...ck.org,
linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] mm: Introduce new MADV_NOMOVABLE behavior
On 10/19/2022 11:17 PM, David Hildenbrand wrote:
>> One migration failure case I observed (which is not easy to reproduce)
>> is that the 'thp_migration_fail' count is 1 and the
>> 'thp_split_page_failed' count is also 1.
>>
>> That means we were migrating a THP located in the CMA area, but could
>> not allocate a new THP due to memory fragmentation, so migration fell
>> back to splitting the THP. However, the THP split also failed, probably
>> because of a temporary reference count on the THP. That temporary
>> reference can be taken while dropping page caches (I observed a drop
>> caches operation on the system), but the shmem page caches could not
>> actually be dropped because they were already dirty at that time.
>>
>> So could we retry in migrate_pages() when the THP split fails, to
>> mitigate the migration failure, especially when the failure is caused
>> by a temporary reference count? Does this sound reasonable to you?
>
> It sounds reasonable, and I understand that debugging these issues is
> tricky. But we really have to figure out the root cause, so that pages
> which are indeed movable (but only temporarily unmovable for reason
> XYZ) actually get moved.
>
> We'd need some indication to retry migration longer / again.
OK. Let me try this and see whether there are other possible failure
cases in our products.
>>
>> However, I am still worried that there are other possible causes of
>> migration failure, so avoiding CMA allocation entirely seems more
>> stable for our case, IMO.
>
> Yes, I can understand that. But as one example, your approach doesn't
> handle the case where a page that was allocated on !CMA/!ZONE_MOVABLE
> gets migrated to CMA/ZONE_MOVABLE just before you try pinning the page
> (forcing it to be migrated off CMA/ZONE_MOVABLE again).
Indeed, as you said before, that would only help minimize page migration
for now. Maybe I can take MADV_PINNABLE into consideration when
allocating new pages, e.g. in alloc_migration_target().
Anyway, let me try to fix the root cause first and see whether that
solves our problem.
> We really have to fix the root cause.
OK. Thanks for your input.