[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7fd0262d-ff36-d621-191e-4f623a2038c0@google.com>
Date: Tue, 6 May 2025 14:44:17 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: Muchun Song <songmuchun@...edance.com>
cc: Johannes Weiner <hannes@...xchg.org>, mhocko@...nel.org,
roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, muchun.song@...ux.dev,
akpm@...ux-foundation.org, david@...morbit.com, zhengqi.arch@...edance.com,
yosry.ahmed@...ux.dev, nphamcs@...il.com, chengming.zhou@...ux.dev,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org, linux-mm@...ck.org,
hamzamahfooz@...ux.microsoft.com, apais@...ux.microsoft.com,
Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH RFC 07/28] mm: thp: use folio_batch to handle THP splitting
in deferred_split_scan()
On Mon, 5 May 2025, Hugh Dickins wrote:
...
>
> However... I was intending to run it for 12 hours on the workstation,
> but after 11 hours and 35 minutes, that crashed with list_del corruption,
> kernel BUG at lib/list_debug.c:65! from deferred_split_scan()'s
> list_del_init().
>
> I've not yet put together the explanation: I am deeply suspicious of
> the change to when list_empty() becomes true (the block Hannes shows
> above is not the only such: (__)folio_unqueue_deferred_split() and
> migrate_pages_batch() consult it too), but each time I think I have
> the explanation, it's ruled out by folio_try_get()'s reference.
>
> And aside from the crash (I don't suppose 6.15-rc5 is responsible,
> or that patches 08-28/28 would fix it), I'm not so sure that this
> patch is really an improvement (folio reference held for longer, and
> list lock taken more often when split fails: maybe not important, but
> I'm also not so keen on adding in fbatch myself). I didn't spend very
> long looking through the patches, but maybe this 07/28 is not essential?
The BUG would be explained by deferred_split_folio(): that is still using
list_empty(&folio->_deferred_list) to decide whether the folio needs to be
added to the _deferred_list (else is already there). With the 07/28 mods,
it's liable to add THP to the _deferred_list while deferred_split_scan()
holds that THP in its local fbatch. I haven't tried to go through all the
ways in which that may go horribly wrong (or be harmless), but one of them
is deferred_split_scan() after failed split doing a second list_add_tail()
on that THP: no! I won't think about fixes, I'll move on to other tasks.
Or does that get changed in 08-28/28? I've not looked.
Hugh
Powered by blists - more mailing lists