[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <ED148952-0144-48CB-ADAD-012D58025035@linux.dev>
Date: Wed, 7 May 2025 11:30:01 +0800
From: Muchun Song <muchun.song@...ux.dev>
To: Hugh Dickins <hughd@...gle.com>
Cc: Muchun Song <songmuchun@...edance.com>,
Johannes Weiner <hannes@...xchg.org>,
mhocko@...nel.org,
roman.gushchin@...ux.dev,
shakeel.butt@...ux.dev,
akpm@...ux-foundation.org,
david@...morbit.com,
zhengqi.arch@...edance.com,
yosry.ahmed@...ux.dev,
nphamcs@...il.com,
chengming.zhou@...ux.dev,
linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org,
linux-mm@...ck.org,
hamzamahfooz@...ux.microsoft.com,
apais@...ux.microsoft.com
Subject: Re: [PATCH RFC 07/28] mm: thp: use folio_batch to handle THP
splitting in deferred_split_scan()
> On May 7, 2025, at 05:44, Hugh Dickins <hughd@...gle.com> wrote:
>
> On Mon, 5 May 2025, Hugh Dickins wrote:
> ...
>>
>> However... I was intending to run it for 12 hours on the workstation,
>> but after 11 hours and 35 minutes, that crashed with list_del corruption,
>> kernel BUG at lib/list_debug.c:65! from deferred_split_scan()'s
>> list_del_init().
>>
>> I've not yet put together the explanation: I am deeply suspicious of
>> the change to when list_empty() becomes true (the block Hannes shows
>> above is not the only such: (__)folio_unqueue_deferred_split() and
>> migrate_pages_batch() consult it too), but each time I think I have
>> the explanation, it's ruled out by folio_try_get()'s reference.
>>
>> And aside from the crash (I don't suppose 6.15-rc5 is responsible,
>> or that patches 08-28/28 would fix it), I'm not so sure that this
>> patch is really an improvement (folio reference held for longer, and
>> list lock taken more often when split fails: maybe not important, but
>> I'm also not so keen on adding in fbatch myself). I didn't spend very
>> long looking through the patches, but maybe this 07/28 is not essential?
Hi Hugh,
Really thanks for your time to look at this patch. 07/28 is actually a
necessary change in this series.
>
> The BUG would be explained by deferred_split_folio(): that is still using
> list_empty(&folio->_deferred_list) to decide whether the folio needs to be
> added to the _deferred_list (else is already there). With the 07/28 mods,
> it's liable to add THP to the _deferred_list while deferred_split_scan()
> holds that THP in its local fbatch. I haven't tried to go through all the
> ways in which that may go horribly wrong (or be harmless), but one of them
> is deferred_split_scan() after failed split doing a second list_add_tail()
> on that THP: no! I won't think about fixes, I'll move on to other tasks.
Thanks for your analysis. I'll look at it deeply.
>
> Or does that get changed in 08-28/28? I've not looked.
No. 08-28/28 did not change anything related to THP _deferred_list.
Muchun,
Thanks.
>
> Hugh
Powered by blists - more mailing lists