Message-ID: <e7b276eb-960a-4e05-9f84-6152de9ac2ea@linux.alibaba.com>
Date: Fri, 7 Feb 2025 15:23:54 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: Lance Yang <ioworker0@...il.com>
Cc: "Alex Xu (Hello71)" <alex_y_xu@...oo.ca>, linux-mm@...ck.org,
Daniel Gomez <da.gomez@...sung.com>, Barry Song <baohua@...nel.org>,
David Hildenbrand <david@...hat.com>, Hugh Dickins <hughd@...gle.com>,
Kefeng Wang <wangkefeng.wang@...wei.com>,
Matthew Wilcox <willy@...radead.org>, Ryan Roberts <ryan.roberts@....com>,
linux-kernel@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: Hang when swapping huge=within_size tmpfs from zram

On 2025/2/5 22:39, Lance Yang wrote:
> On Wed, Feb 5, 2025 at 2:38 PM Baolin Wang
> <baolin.wang@...ux.alibaba.com> wrote:
>>
>>
>>
>> On 2025/2/5 09:55, Baolin Wang wrote:
>>> Hi Alex,
>>>
>>> On 2025/2/5 09:23, Alex Xu (Hello71) wrote:
>>>> Hi all,
>>>>
>>>> On 6.14-rc1, I found that creating a lot of files in tmpfs and then
>>>> deleting them reliably hangs when tmpfs is mounted with huge=within_size
>>>> and it is swapped out to zram (zstd/zsmalloc/no backing dev). I bisected
>>>> this to acd7ccb284b "mm: shmem: add large folio support for tmpfs".
>>>>
>>>> When the issue occurs, rm uses 100% CPU, cannot be killed, and has no
>>>> output in /proc/pid/stack or wchan. Eventually, an RCU stall is
>>>> detected:
>>>
>>> Thanks for your report. Let me try to reproduce the issue locally and
>>> see what happens.
>>>
>>>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>> rcu: Tasks blocked on level-0 rcu_node (CPUs 0-11): P25160
>>>> rcu: (detected by 10, t=2102 jiffies, g=532677, q=4997 ncpus=12)
>>>> task:rm state:R running task stack:0 pid:25160
>>>> tgid:25160 ppid:24309 task_flags:0x400000 flags:0x00004004
>>>> Call Trace:
>>>> <TASK>
>>>> ? __schedule+0x388/0x1000
>>>> ? kmem_cache_free.part.0+0x23d/0x280
>>>> ? sysvec_apic_timer_interrupt+0xa/0x80
>>>> ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>>>> ? xas_load+0x12/0xc0
>>>> ? xas_load+0x8/0xc0
>>>> ? xas_find+0x144/0x190
>>>> ? find_lock_entries+0x75/0x260
>>>> ? shmem_undo_range+0xe6/0x5f0
>>>> ? shmem_evict_inode+0xe4/0x230
>>>> ? mtree_erase+0x7e/0xe0
>>>> ? inode_set_ctime_current+0x2e/0x1f0
>>>> ? evict+0xe9/0x260
>>>> ? _atomic_dec_and_lock+0x31/0x50
>>>> ? do_unlinkat+0x270/0x2b0
>>>> ? __x64_sys_unlinkat+0x30/0x50
>>>> ? do_syscall_64+0x37/0xe0
>>>> ? entry_SYSCALL_64_after_hwframe+0x50/0x58
>>>> </TASK>
>>>>
>>>> Let me know what information is needed to further troubleshoot this
>>>> issue.
>>
>> Sorry, I can't reproduce this issue, and my testing process is as follows:
>> 1. Mount tmpfs with huge=within_size
>> 2. Create and write a tmpfs file
>> 3. Swap out the large folios of the tmpfs file to zram
>> 4. Execute 'rm' command to remove the tmpfs file
>
> I’m unable to reproduce the issue either, following steps similar to
> Baolin's:
>
> 1) Mount tmpfs with the huge=within_size option and enable swap (using
> zstd/zsmalloc without a backing device).
> 2) Create and write over 10,000 files in the tmpfs.
> 3) Swap out the large folios of these tmpfs files to zram.
> 4) Use the rm command to delete all the files from the tmpfs.
>
> I tested with both 2MiB and 64KiB large folio sizes, and with
> shmem_enabled=within_size; everything works as expected.

Thanks, Lance, for confirming again.

Alex, could you give more hints on how to reproduce this issue?
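
For reference, the rough shape of my local test is below. This is only a
sketch: the zram size, mount point, file count/sizes and the cgroup name
are illustrative placeholders rather than my exact values, and the
swap-out step assumes the writer runs inside a cgroup v2 group so that
memory.reclaim can push the shmem folios out to swap.

  # zram swap: zstd + zsmalloc, no backing device
  modprobe zram
  echo zstd > /sys/block/zram0/comp_algorithm
  echo 8G > /sys/block/zram0/disksize
  mkswap /dev/zram0 && swapon /dev/zram0

  # tmpfs mounted with huge=within_size
  mount -t tmpfs -o huge=within_size tmpfs /mnt/test

  # create and write many files, large enough to get large folios
  for i in $(seq 1 10000); do
      dd if=/dev/urandom of=/mnt/test/f$i bs=64k count=32 status=none
  done

  # force the tmpfs large folios out to zram via cgroup v2 memory.reclaim
  echo 4G > /sys/fs/cgroup/test/memory.reclaim

  # remove the files; this is where you see rm spin at 100% CPU
  rm -rf /mnt/test/*

If your setup differs from the above (for example different file sizes,
or relying on general memory pressure rather than explicit reclaim for
the swap-out), that difference may be what is needed to trigger the hang.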