Message-ID: <CAMgjq7AQWswq1Hv_vbPnpNgVuzaRFyGttwm+euE3S=C6+iFHaA@mail.gmail.com>
Date: Thu, 15 Jan 2026 00:22:45 +0800
From: Kairui Song <ryncsn@...il.com>
To: "Lai, Yi" <yi1.lai@...ux.intel.com>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Baoquan He <bhe@...hat.com>, Barry Song <baohua@...nel.org>, Chris Li <chrisl@...nel.org>,
Nhat Pham <nphamcs@...il.com>, Yosry Ahmed <yosry.ahmed@...ux.dev>,
David Hildenbrand <david@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
Youngjun Park <youngjun.park@....com>, Hugh Dickins <hughd@...gle.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, Ying Huang <ying.huang@...ux.alibaba.com>,
Kemeng Shi <shikemeng@...weicloud.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>, linux-kernel@...r.kernel.org,
linux-pm@...r.kernel.org, "Rafael J. Wysocki (Intel)" <rafael@...nel.org>
Subject: Re: [PATCH v5 14/19] mm, swap: cleanup swap entry management workflow
On Wed, Jan 14, 2026 at 9:28 PM Lai, Yi <yi1.lai@...ux.intel.com> wrote:
>
> Hi Kairui Song,
>
> Greetings!
>
> I used Syzkaller and found that there is possible deadlock in swap_free_hibernation_slot in linux-next next-20260113.
>
> After bisection and the first bad commit is:
> "
> 33be6f68989d mm, swap: cleanup swap entry management workflow
> "
>
> All detailed info can be found at:
> https://github.com/laifryiee/syzkaller_logs/tree/main/260114_102849_swap_free_hibernation_slot
> Syzkaller repro code:
> https://github.com/laifryiee/syzkaller_logs/tree/main/260114_102849_swap_free_hibernation_slot/repro.c
> Syzkaller repro syscall steps:
> https://github.com/laifryiee/syzkaller_logs/tree/main/260114_102849_swap_free_hibernation_slot/repro.prog
> Syzkaller report:
> https://github.com/laifryiee/syzkaller_logs/tree/main/260114_102849_swap_free_hibernation_slot/repro.report
> Kconfig(make olddefconfig):
> https://github.com/laifryiee/syzkaller_logs/tree/main/260114_102849_swap_free_hibernation_slot/kconfig_origin
> Bisect info:
> https://github.com/laifryiee/syzkaller_logs/tree/main/260114_102849_swap_free_hibernation_slot/bisect_info.log
> bzImage:
> https://github.com/laifryiee/syzkaller_logs/raw/refs/heads/main/260114_102849_swap_free_hibernation_slot/bzImage_0f853ca2a798ead9d24d39cad99b0966815c582a
> Issue dmesg:
> https://github.com/laifryiee/syzkaller_logs/blob/main/260114_102849_swap_free_hibernation_slot/0f853ca2a798ead9d24d39cad99b0966815c582a_dmesg.log
>
> "
> [ 62.477554] ============================================
> [ 62.477802] WARNING: possible recursive locking detected
> [ 62.478059] 6.19.0-rc5-next-20260113-0f853ca2a798 #1 Not tainted
> [ 62.478324] --------------------------------------------
> [ 62.478549] repro/668 is trying to acquire lock:
> [ 62.478759] ffff888011664018 (&cluster_info[i].lock){+.+.}-{3:3}, at: swap_free_hibernation_slot+0x13e/0x2a0
> [ 62.479271]
> [ 62.479271] but task is already holding lock:
> [ 62.479519] ffff888011664018 (&cluster_info[i].lock){+.+.}-{3:3}, at: swap_free_hibernation_slot+0xfa/0x2a0
> [ 62.479984]
> [ 62.479984] other info that might help us debug this:
> [ 62.480293] Possible unsafe locking scenario:
> [ 62.480293]
> [ 62.480565] CPU0
> [ 62.480686] ----
> [ 62.480809] lock(&cluster_info[i].lock);
> [ 62.481010] lock(&cluster_info[i].lock);
> [ 62.481205]
> [ 62.481205] *** DEADLOCK ***
> [ 62.481205]
> [ 62.481481] May be due to missing lock nesting notation
> [ 62.481481]
> [ 62.481802] 2 locks held by repro/668:
> [ 62.481981] #0: ffffffff87542e28 (system_transition_mutex){+.+.}-{4:4}, at: lock_system_sleep+0x92/0xb0
> [ 62.482439] #1: ffff888011664018 (&cluster_info[i].lock){+.+.}-{3:3}, at: swap_free_hibernation_slot+0xfa/0x0
> [ 62.482936]
> [ 62.482936] stack backtrace:
> [ 62.483131] CPU: 0 UID: 0 PID: 668 Comm: repro Not tainted 6.19.0-rc5-next-20260113-0f853ca2a798 #1 PREEMPT(l
> [ 62.483143] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.q4
> [ 62.483151] Call Trace:
> [ 62.483156] <TASK>
> [ 62.483160] dump_stack_lvl+0xea/0x150
> [ 62.483195] dump_stack+0x19/0x20
> [ 62.483206] print_deadlock_bug+0x22e/0x300
> [ 62.483215] __lock_acquire+0x1325/0x2210
> [ 62.483226] lock_acquire+0x170/0x2f0
> [ 62.483234] ? swap_free_hibernation_slot+0x13e/0x2a0
> [ 62.483249] _raw_spin_lock+0x38/0x50
> [ 62.483267] ? swap_free_hibernation_slot+0x13e/0x2a0
> [ 62.483279] swap_free_hibernation_slot+0x13e/0x2a0
> [ 62.483291] ? __pfx_swap_free_hibernation_slot+0x10/0x10
> [ 62.483303] ? locks_remove_file+0xe2/0x7f0
> [ 62.483322] ? __pfx_snapshot_release+0x10/0x10
> [ 62.483331] free_all_swap_pages+0xdd/0x160
> [ 62.483339] ? __pfx_snapshot_release+0x10/0x10
> [ 62.483346] snapshot_release+0xac/0x200
> [ 62.483353] __fput+0x41f/0xb70
> [ 62.483369] ____fput+0x22/0x30
> [ 62.483376] task_work_run+0x19e/0x2b0
> [ 62.483391] ? __pfx_task_work_run+0x10/0x10
> [ 62.483398] ? nsproxy_free+0x2da/0x5b0
> [ 62.483410] ? switch_task_namespaces+0x118/0x130
> [ 62.483421] do_exit+0x869/0x2810
> [ 62.483435] ? do_group_exit+0x1d8/0x2c0
> [ 62.483445] ? __pfx_do_exit+0x10/0x10
> [ 62.483451] ? __this_cpu_preempt_check+0x21/0x30
> [ 62.483463] ? _raw_spin_unlock_irq+0x2c/0x60
> [ 62.483474] ? lockdep_hardirqs_on+0x85/0x110
> [ 62.483486] ? _raw_spin_unlock_irq+0x2c/0x60
> [ 62.483498] ? trace_hardirqs_on+0x26/0x130
> [ 62.483516] do_group_exit+0xe4/0x2c0
> [ 62.483524] __x64_sys_exit_group+0x4d/0x60
> [ 62.483531] x64_sys_call+0x21a2/0x21b0
> [ 62.483544] do_syscall_64+0x6d/0x1180
> [ 62.483560] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 62.483584] RIP: 0033:0x7fe84fb18a4d
> [ 62.483595] Code: Unable to access opcode bytes at 0x7fe84fb18a23.
> [ 62.483602] RSP: 002b:00007fff3e35c928 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> [ 62.483610] RAX: ffffffffffffffda RBX: 00007fe84fbf69e0 RCX: 00007fe84fb18a4d
> [ 62.483615] RDX: 00000000000000e7 RSI: ffffffffffffff80 RDI: 0000000000000001
> [ 62.483620] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000020
> [ 62.483624] R10: 00007fff3e35c7d0 R11: 0000000000000246 R12: 00007fe84fbf69e0
> [ 62.483629] R13: 00007fe84fbfbf00 R14: 0000000000000001 R15: 00007fe84fbfbee8
> [ 62.483640] </TASK>
> "
>
> Hope this could be insightful to you.
>
> Regards,
> Yi Lai
>
> ---
>
> If you don't need the following environment to reproduce the problem, or if you
> already have a reproduction environment, please ignore the following information.
>
> How to reproduce:
> git clone https://gitlab.com/xupengfe/repro_vm_env.git
> cd repro_vm_env
> tar -xvf repro_vm_env.tar.gz
> cd repro_vm_env; ./start3.sh // it needs qemu-system-x86_64 and I used v7.1.0
> // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
> // You could change the bzImage_xxx as you want
> // Maybe you need to remove line "-drive if=pflash,format=raw,readonly=on,file=./OVMF_CODE.fd \" for different qemu version
> You could use below command to log in, there is no password for root.
> ssh -p 10023 root@...alhost
>
> After logging in to the VM (virtual machine) successfully, you can transfer the
> reproduced binary to the VM as below, and reproduce the problem in the VM:
> gcc -pthread -o repro repro.c
> scp -P 10023 repro root@...alhost:/root/
>
> Get the bzImage for target kernel:
> Please use target kconfig and copy it to kernel_src/.config
> make olddefconfig
> make -jx bzImage //x should equal or less than cpu num your pc has
>
> Fill the bzImage file into above start3.sh to load the target kernel in vm.
>
>
> Tips:
> If you already have qemu-system-x86_64, please ignore below info.
> If you want to install qemu v7.1.0 version:
> git clone https://github.com/qemu/qemu.git
> cd qemu
> git checkout -f v7.1.0
> mkdir build
> cd build
> yum install -y ninja-build.x86_64
> yum -y install libslirp-devel.x86_64
> ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl --enable-usb-redir --enable-slirp
> make
> make install
>
Thanks Lai!

The issue is with the WARN_ON I added: I didn't notice that
swap_entry_swapped() takes the ci lock, which is already held at that
point, so the check recursively acquires the cluster lock. We'd better
just remove it. The following change should fix the issue you reported:
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 85bf4f7d9ae7..8c0f31363c1f 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2096,7 +2096,6 @@ void swap_free_hibernation_slot(swp_entry_t entry)
 	ci = swap_cluster_lock(si, offset);
 	swap_entry_put_locked(si, ci, entry, 1);
-	WARN_ON(swap_entry_swapped(si, entry));
 	swap_cluster_unlock(ci);
 	/* In theory readahead might add it to the swap cache by accident */
---
swap_entry_swapped() requires the ci lock, which this path already
holds, hence the recursive locking report. There wasn't any WARN_ON
here before this series; it was only added as a sanity check to ensure
things worked as expected, and it really isn't needed.