Message-Id: <ED148952-0144-48CB-ADAD-012D58025035@linux.dev>
Date: Wed, 7 May 2025 11:30:01 +0800
From: Muchun Song <muchun.song@...ux.dev>
To: Hugh Dickins <hughd@...gle.com>
Cc: Muchun Song <songmuchun@...edance.com>,
 Johannes Weiner <hannes@...xchg.org>,
 mhocko@...nel.org,
 roman.gushchin@...ux.dev,
 shakeel.butt@...ux.dev,
 akpm@...ux-foundation.org,
 david@...morbit.com,
 zhengqi.arch@...edance.com,
 yosry.ahmed@...ux.dev,
 nphamcs@...il.com,
 chengming.zhou@...ux.dev,
 linux-kernel@...r.kernel.org,
 cgroups@...r.kernel.org,
 linux-mm@...ck.org,
 hamzamahfooz@...ux.microsoft.com,
 apais@...ux.microsoft.com
Subject: Re: [PATCH RFC 07/28] mm: thp: use folio_batch to handle THP
 splitting in deferred_split_scan()



> On May 7, 2025, at 05:44, Hugh Dickins <hughd@...gle.com> wrote:
> 
> On Mon, 5 May 2025, Hugh Dickins wrote:
> ...
>> 
>> However... I was intending to run it for 12 hours on the workstation,
>> but after 11 hours and 35 minutes, that crashed with list_del corruption,
>> kernel BUG at lib/list_debug.c:65! from deferred_split_scan()'s
>> list_del_init().
>> 
>> I've not yet put together the explanation: I am deeply suspicious of
>> the change to when list_empty() becomes true (the block Hannes shows
>> above is not the only such: (__)folio_unqueue_deferred_split() and
>> migrate_pages_batch() consult it too), but each time I think I have
>> the explanation, it's ruled out by folio_try_get()'s reference.
>> 
>> And aside from the crash (I don't suppose 6.15-rc5 is responsible,
>> or that patches 08-28/28 would fix it), I'm not so sure that this
>> patch is really an improvement (folio reference held for longer, and
>> list lock taken more often when split fails: maybe not important, but
>> I'm also not so keen on adding in fbatch myself).  I didn't spend very
>> long looking through the patches, but maybe this 07/28 is not essential?

Hi Hugh,

Thank you very much for taking the time to look at this patch. 07/28 is
actually a necessary change in this series.

> 
> The BUG would be explained by deferred_split_folio(): that is still using
> list_empty(&folio->_deferred_list) to decide whether the folio needs to be
> added to the _deferred_list (else is already there).  With the 07/28 mods,
> it's liable to add THP to the _deferred_list while deferred_split_scan()
> holds that THP in its local fbatch.  I haven't tried to go through all the
> ways in which that may go horribly wrong (or be harmless), but one of them
> is deferred_split_scan() after failed split doing a second list_add_tail()
> on that THP: no!  I won't think about fixes, I'll move on to other tasks.

Thanks for your analysis. I'll look into it in depth.

> 
> Or does that get changed in 08-28/28? I've not looked.

No. Patches 08-28/28 do not change anything related to the THP _deferred_list.

Thanks,
Muchun

> 
> Hugh

