linux-kernel - Re: [PATCH 1/2] mm/khugepaged: do synchronous writeback for MADV

Message-ID: <77925a0b-fa06-4200-a967-a66bd93201db@amd.com>
Date: Tue, 11 Nov 2025 01:10:03 +0530
From: "Garg, Shivank" <shivankg@....com>
To: Zi Yan <ziy@...dia.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
 David Hildenbrand <david@...hat.com>,
 Baolin Wang <baolin.wang@...ux.alibaba.com>,
 "Liam R . Howlett" <Liam.Howlett@...cle.com>, Nico Pache
 <npache@...hat.com>, Ryan Roberts <ryan.roberts@....com>,
 Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
 Lance Yang <lance.yang@...ux.dev>, Steven Rostedt <rostedt@...dmis.org>,
 Masami Hiramatsu <mhiramat@...nel.org>,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
 Zach O'Keefe <zokeefe@...gle.com>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
 Branden Moore <Branden.Moore@....com>
Subject: Re: [PATCH 1/2] mm/khugepaged: do synchronous writeback for
 MADV_COLLAPSE



On 11/10/2025 10:25 PM, Zi Yan wrote:
> On 10 Nov 2025, at 11:06, Lorenzo Stoakes wrote:
> 
>> On Mon, Nov 10, 2025 at 01:22:16PM +0000, Lorenzo Stoakes wrote:
>>> On Mon, Nov 10, 2025 at 06:37:58PM +0530, Garg, Shivank wrote:
>>>>
>>>>
>>>> On 11/10/2025 5:31 PM, Lorenzo Stoakes wrote:
>>>>> On Mon, Nov 10, 2025 at 11:32:53AM +0000, Shivank Garg wrote:
>>>>>> When MADV_COLLAPSE is called on file-backed mappings (e.g., executable
>>>>
>>>>>> ---
>>>>>> Applies cleanly on:
>>>>>> 6.18-rc5
>>>>>> mm-stable:e9a6fb0bc
>>>>>
>>>>> Please base on mm-unstable. mm-stable is usually out of date until very close to
>>>>> merge window.
>>>>
>>>> I'm observing issues when testing with kselftest on mm-unstable and mm-new branches that prevent
>>>> proper testing for my patches:
>>>>
>>>> On mm-unstable (without my patches):
>>>>
>>>> # # running ./transhuge-stress -d 20
>>>> # # --------------------------------
>>>> # # TAP version 13
>>>> # # 1..1
>>>> # # transhuge-stress: allocate 220271 transhuge pages, using 440543 MiB virtual memory and 1720 MiB of ram
>>>>
>>>>
>>>> [  367.225667] RIP: 0010:swap_cache_get_folio+0x2d/0xc0
>>>> [  367.230635] Code: 00 00 48 89 f9 49 89 f9 48 89 fe 48 c1 e1 06 49 c1 e9 3a 48 c1 e9 0f 48 c1 e1 05 4a 8b 04 cd c0 2e 5b 99 48 8b 78 60 48 01 cf <48> 8b 47 08 48 85 c0 74 20 48 89 f2 81 e2 ff 01 00 00 48 8d 04 d0
>>>> [  367.249378] RSP: 0000:ffffcde32943fba8 EFLAGS: 00010282
>>>> [  367.254605] RAX: ffff8bd1668fdc00 RBX: 00007ffc15df5000 RCX: 00003fffffffffe0
>>>> [  367.261736] RDX: ffffffff995cb530 RSI: 0003ffffffffffff RDI: ffffcbd1560dffe0
>>>> [  367.268862] RBP: 0003ffffffffffff R08: ffffcde32943fc47 R09: 0000000000000000
>>>> [  367.275994] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>>> [  367.283129] R13: 0000000000000000 R14: ffff8bd1668fdc00 R15: 0000000000100cca
>>>> [  367.290260] FS:  00007ff600af5b80(0000) GS:ffff8c4e9ec7e000(0000) knlGS:0000000000000000
>>>> [  367.298344] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [  367.304083] CR2: ffffcbd1560dffe8 CR3: 00000001280e9001 CR4: 0000000000770ef0
>>>> [  367.311216] PKRU: 55555554
>>>> [  367.313929] Call Trace:
>>>> [  367.316375]  <TASK>
>>>> [  367.318479]  __read_swap_cache_async+0x8e/0x1b0
>>>> [  367.323014]  swap_vma_readahead+0x23d/0x430
>>>> [  367.327198]  swapin_readahead+0xb0/0xc0
>>>> [  367.331039]  do_swap_page+0x5bc/0x1260
>>>> [  367.334789]  ? rseq_ip_fixup+0x6f/0x190
>>>> [  367.338631]  ? __pfx_default_wake_function+0x10/0x10
>>>> [  367.343596]  __handle_mm_fault+0x49a/0x760
>>>> [  367.347696]  handle_mm_fault+0x188/0x300
>>>> [  367.351620]  do_user_addr_fault+0x15b/0x6c0
>>>> [  367.355807]  exc_page_fault+0x60/0x100
>>>> [  367.359562]  asm_exc_page_fault+0x22/0x30
>>>> [  367.363574] RIP: 0033:0x7ff60091ba99
>>>> [  367.367153] Code: f7 d8 64 89 02 b8 ff ff ff ff eb bd e8 40 c4 01 00 f3 0f 1e fa 80 3d b5 f5 0e 00 00 74 13 31 c0 0f 05 48 3d 00 f0 ff ff 77 4f <c3> 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 48 89 55 e8 48 89 75
>>>> [  367.385897] RSP: 002b:00007ffc15df1118 EFLAGS: 00010203
>>>> [  367.391124] RAX: 0000000000000001 RBX: 000055941fb672a0 RCX: 00007ff60091ba91
>>>> [  367.398256] RDX: 0000000000000001 RSI: 000055941fb813e0 RDI: 0000000000000000
>>>> [  367.405387] RBP: 00007ffc15df21e0 R08: 0000000000000000 R09: 0000000000000007
>>>> [  367.412513] R10: 000055941fb97cb0 R11: 0000000000000246 R12: 000055941fb813e0
>>>> [  367.419646] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>>> [  367.426781]  </TASK>
>>>> [  367.428970] Modules linked in: xfrm_user xfrm_algo xt_addrtype xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables overlay bridge stp llc cfg80211 rfkill binfmt_misc ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common wmi_bmof amd64_edac edac_mce_amd mgag200 rapl drm_client_lib i2c_algo_bit drm_shmem_helper drm_kms_helper acpi_cpufreq i2c_piix4 ptdma k10temp i2c_smbus wmi acpi_power_meter ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler sg dm_multipath drm fuse dm_mod nfnetlink ext4 crc16 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 kvm_amd sd_mod ahci nvme libahci kvm libata nvme_core tg3 ccp megaraid_sas irqbypass
>>>> [  367.497528] CR2: ffffcbd1560dffe8
>>>> [  367.500846] ---[ end trace 0000000000000000 ]---
>>>
>>> Yikes, oopsies!
>>>
>>> I'll try running tests locally on threadripper, but ran tests against yours
>>> previously and seemed fine, strange. Maybe fixed since but let me try, maybe
>>> because swap is not enabled locally for me?
>>>
>>> Likely this actually...
>>
>> I have tried on swap-enabled setup and no issue with mm-unstable.
>>
>> So this is odd, I know you have limited time (_totally sympathise_) but is it at
>> all possible if you get a moment to bisect against tip mm-unstable/mm-new?
>>
>> Obviously we want to make sure buggy swap code doesn't get merged to mainline!
> 
> I could not reproduce locally either.
> 
> Shivank, can you share your config file and machine config?

I bisected the crash on mm-unstable:

b14d61d8fe442b1cc2d7591cf040a6dcd7fe2dd8 is the first bad commit
commit b14d61d8fe442b1cc2d7591cf040a6dcd7fe2dd8
Author: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Date:   Sat Nov 8 17:08:18 2025 +0000

    mm: eliminate is_swap_pte() when softleaf_from_pte() suffices

    In cases where we can simply utilise the fact that softleaf_from_pte()
    treats present entries as if they were none entries and thus eliminate
    spurious uses of is_swap_pte(), do so.

    No functional change intended.

This got delayed because I made a wrong step during git bisect and wanted to double-check the bisection result.

No worries on time. Happy to test whenever you have a patch.
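
As a side note on the oops quoted above, here is a minimal sketch (not part of any patch) that cross-checks the register dump against the bytes in the "Code:" line. The register values and instruction bytes are copied from the log; the decoding of those bytes is a reading of the x86-64 encoding, and the helper name decode_mov_disp8 is hypothetical. It only shows that the faulting access is a load from RDI + 8 and that RCX follows from the shifts applied to the first argument (still visible in RSI). It does not establish what that argument value represents.

```python
# Register values copied from the oops quoted above.
RDI = 0xFFFFCBD1560DFFE0
RCX = 0x00003FFFFFFFFFE0
RSI = 0x0003FFFFFFFFFFFF
CR2 = 0xFFFFCBD1560DFFE8

# Bytes at the faulting RIP (starting at the <48> marker in "Code:").
FAULT_INSN = bytes.fromhex("488b4708")

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_disp8(insn):
    """Decode only the form 'REX.W mov r64, [r64 + disp8]' (hypothetical helper)."""
    rex, opcode, modrm, disp = insn
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W 'mov r64, r/m64' instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [reg + disp8] without SIB is handled")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REG64[reg], REG64[rm], disp


dst, base, disp = decode_mov_disp8(FAULT_INSN)
print(f"faulting instruction: mov {dst}, [{base} + {disp:#x}]")

# The load address should match the faulting address reported in CR2.
assert (dst, base, disp) == ("rax", "rdi", 8)
assert RDI + disp == CR2

# Earlier bytes in "Code:" read as: mov rcx, rdi; mov rsi, rdi; shl rcx, 6;
# shr rcx, 15; shl rcx, 5. Replaying them on the copy kept in RSI gives RCX.
MASK64 = (1 << 64) - 1
rcx = ((((RSI << 6) & MASK64) >> 15) << 5) & MASK64
assert rcx == RCX
print(f"offset added to the table base: {rcx:#x}")
```

Both assertions hold for the values in the log, so the fault address is explained by the first argument to swap_cache_get_folio() being 0x0003ffffffffffff at entry, assuming the decode above is right.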

Machine: AMD Zen 3 EPYC server (7713), 2 sockets, 32 cores, SMT enabled, 1 NUMA node per socket

               total        used        free      shared  buff/cache   available
Mem:           430Gi       183Gi       248Gi       5.5Mi       489Mi       246Gi
Swap:          8.0Gi        84Ki       8.0Gi

/proc/cmdline options:
init_on_alloc=0 console=ttyS0,115200n8 earlyprintk mitigations=off nohz_full= nohz=off amd_pstate=disable preempt=none

config attached.

Let me know if you need any other information.

Thanks,
Shivank
View attachment ".config" of type "text/plain" (177252 bytes)