Message-ID: <036614fd-e588-402c-8eb3-770ee9187bbd@kernel.org>
Date: Wed, 3 Dec 2025 12:28:51 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Gregory Price <gourry@...rry.net>
Cc: Michal Hocko <mhocko@...e.com>, Andrew Morton
<akpm@...ux-foundation.org>, Aboorva Devarajan <aboorvad@...ux.ibm.com>,
vbabka@...e.cz, surenb@...gle.com, jackmanb@...gle.com, hannes@...xchg.org,
ziy@...dia.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Oscar Salvador <OSalvador@...e.com>, Juan Yescas <jyescas@...gle.com>
Subject: Re: [PATCH] mm/page_alloc: make percpu_pagelist_high_fraction reads
lock-free
On 12/3/25 10:23, Gregory Price wrote:
> On Wed, Dec 03, 2025 at 10:08:55AM +0100, David Hildenbrand (Red Hat) wrote:
>> On 12/3/25 10:02, Gregory Price wrote:
>>>
>>> My transient failure (although i'm not sure it was actually transient, i
>>> killed it and retried after a few minutes and it succeeded immediately)
>>> was on a ZONE_MOVABLE block.
>>
>> Okay, so that one should not bail out. Longterm pinnings must never end up on
>> such memory, and if it happens, we have to identify why and fix it.
>>
>> We have this known problem of "stream of short-term pinnings" that can
>> temporarily turn memory effectively unmovable. Juan will talk about that at
>> LPC [1].
>
> Nice, fun, good topic. Looking forward to Japan n_n
>
>>
>> We have another set of problematic cases (vmsplice(), fuse) but I would
>> assume that these are not the cases you are hitting.
>>
>
> We do use fuse, but this system was relatively quiet when i tried this.
>
> We do have some proactive reclaim / demotion going on, but i don't think
> it was that (see below).
>
>>>
>>> Kind of suggested to me there was some bad condition that resolved once I
>>> took a second to release the lock and try again.
>>
>> Hard to tell I'm afraid. Do you still have the dump_folio() calls we print
>> when migration fails?
>>
>
> What luck, I do! :D
:)
> And i just noticed it's the same page over and over
>
> [ 3404.119270] migrating pfn c06f176 failed ret:1
> [ 3404.129152] page: refcount:4 mapcount:0 mapping:0000000061ca20ba index:0xad28e5b pfn:0xc06f176
> [ 3404.148284] memcg:ffff88842e855000
> [ 3404.155834] aops:btree_aops ino:1
Small folio. Not GUP-pinned (FOLL_PIN), otherwise our refcount would
be >= 1024.
It could be ordinary GUP (FOLL_GET) e.g., from vmsplice or some older
O_DIRECT user that was not converted to FOLL_PIN yet. But maybe it's
just btrfs / something else that temporarily holds a folio reference.
Given that this is from 6.13 ... hard to tell :)
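To spell out the refcount reasoning, here is a minimal userspace sketch, not kernel code. The helper name small_folio_maybe_dma_pinned() is hypothetical, and the bias constant is assumed to be the 1024 threshold mentioned above (I believe the kernel calls it GUP_PIN_COUNTING_BIAS). It only models the small-folio case, where each FOLL_PIN is assumed to add the full bias to the refcount while a FOLL_GET reference adds just 1.

```c
#include <stdbool.h>

/*
 * Assumption: each FOLL_PIN on a small folio raises the refcount by this
 * bias, so a refcount below it cannot include a FOLL_PIN pin.
 */
#define ASSUMED_PIN_COUNTING_BIAS (1U << 10) /* 1024 */

/*
 * Hypothetical helper: returns true if a small folio with this refcount
 * might be FOLL_PIN-pinned. A true result can be a false positive (many
 * ordinary references), but a false result rules FOLL_PIN out.
 */
static bool small_folio_maybe_dma_pinned(unsigned int refcount)
{
	return refcount >= ASSUMED_PIN_COUNTING_BIAS;
}
```

With refcount:4 from the dump, the check returns false, which is why the remaining candidates are FOLL_GET users or other temporary reference holders.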
> [ 3404.163193] flags: 0x17ffff066c00420c(referenced|uptodate|workingset|private|node=1|zone=3|lastcpupid=0x1ffff)
Neither dirty nor under writeback.
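For anyone who wants to double-check against the raw value, a small userspace sketch follows. The bit positions are an assumption: they are chosen to be consistent with the names printed in the flags line above (referenced, uptodate, workingset, private), and the positions for dirty and writeback are assumed from the same layout. Page flag numbering differs between kernel versions, so treat the enum as illustrative only. The helper flag_is_set() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; these vary between kernel versions. */
enum assumed_pageflag {
	ASSUMED_PG_WRITEBACK  = 1,
	ASSUMED_PG_REFERENCED = 2,
	ASSUMED_PG_UPTODATE   = 3,
	ASSUMED_PG_DIRTY      = 4,
	ASSUMED_PG_WORKINGSET = 9,
	ASSUMED_PG_PRIVATE    = 14,
};

/* Hypothetical helper: test a single bit in the raw flags word. */
static bool flag_is_set(uint64_t flags, unsigned int bit)
{
	return (flags >> bit) & 1;
}
```

Under these assumptions, the low bits of 0x17ffff066c00420c have bits 2, 3, 9 and 14 set, matching the printed names, while bits 1 and 4 are clear.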
--
Cheers
David