lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7fd0262d-ff36-d621-191e-4f623a2038c0@google.com>
Date: Tue, 6 May 2025 14:44:17 -0700 (PDT)
From: Hugh Dickins <hughd@...gle.com>
To: Muchun Song <songmuchun@...edance.com>
cc: Johannes Weiner <hannes@...xchg.org>, mhocko@...nel.org, 
    roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, muchun.song@...ux.dev, 
    akpm@...ux-foundation.org, david@...morbit.com, zhengqi.arch@...edance.com, 
    yosry.ahmed@...ux.dev, nphamcs@...il.com, chengming.zhou@...ux.dev, 
    linux-kernel@...r.kernel.org, cgroups@...r.kernel.org, linux-mm@...ck.org, 
    hamzamahfooz@...ux.microsoft.com, apais@...ux.microsoft.com, 
    Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH RFC 07/28] mm: thp: use folio_batch to handle THP splitting
 in deferred_split_scan()

On Mon, 5 May 2025, Hugh Dickins wrote:
...
> 
> However... I was intending to run it for 12 hours on the workstation,
> but after 11 hours and 35 minutes, that crashed with list_del corruption,
> kernel BUG at lib/list_debug.c:65! from deferred_split_scan()'s
> list_del_init().
> 
> I've not yet put together the explanation: I am deeply suspicious of
> the change to when list_empty() becomes true (the block Hannes shows
> above is not the only such: (__)folio_unqueue_deferred_split() and
> migrate_pages_batch() consult it too), but each time I think I have
> the explanation, it's ruled out by folio_try_get()'s reference.
> 
> And aside from the crash (I don't suppose 6.15-rc5 is responsible,
> or that patches 08-28/28 would fix it), I'm not so sure that this
> patch is really an improvement (folio reference held for longer, and
> list lock taken more often when split fails: maybe not important, but
> I'm also not so keen on adding in fbatch myself).  I didn't spend very
> long looking through the patches, but maybe this 07/28 is not essential?

The BUG would be explained by deferred_split_folio(): that is still using
list_empty(&folio->_deferred_list) to decide whether the folio needs to be
added to the _deferred_list (else is already there).  With the 07/28 mods,
it's liable to add THP to the _deferred_list while deferred_split_scan()
holds that THP in its local fbatch.  I haven't tried to go through all the
ways in which that may go horribly wrong (or be harmless), but one of them
is deferred_split_scan() after failed split doing a second list_add_tail()
on that THP: no!  I won't think about fixes, I'll  move on to other tasks.

Or does that get changed in 08-28/28? I've not looked.

Hugh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ