Message-ID: <e21ea030-b05f-42e6-b479-b3e0789b9d97@linux.dev>
Date: Fri, 7 Nov 2025 18:09:29 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: "David Hildenbrand (Red Hat)" <davidhildenbrandkernel@...il.com>,
 "Garg, Shivank" <shivankg@....com>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 "Liam R. Howlett" <Liam.Howlett@...cle.com>,
 Ryan Roberts <ryan.roberts@....com>,
 Andrew Morton <akpm@...ux-foundation.org>, Zi Yan <ziy@...dia.com>,
 Baolin Wang <baolin.wang@...ux.alibaba.com>, Nico Pache <npache@...hat.com>,
 Dev Jain <dev.jain@....com>, Barry Song <baohua@...nel.org>,
 Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
 zokeefe@...gle.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: madvise(MADV_COLLAPSE) fails with EINVAL on dirty file-backed
 text pages



On 2025/11/7 17:12, David Hildenbrand (Red Hat) wrote:
> 
>>
>> 5. Yes, I'm calling madvise(MADV_COLLAPSE) on the text portion of the
>>    executable, using the address range obtained from /proc/self/maps.
>>    IIUC, this should benefit applications by reducing ITLB pressure.
>>
>> I agree with the suggestions to either return EAGAIN instead of EINVAL
>> or, at minimum, document the EINVAL return for dirty pages. I'm happy
>> to work on a patch.
> 
> Of course, we could detect that we are in MADV_COLLAPSE and simply
> write back ourselves. After all, user space asked for a collapse, and
> it's not khugepaged that will simply revisit it later.
> 
> I did something similar in
> 
> commit ab73b29efd36f8916c6cc9954e912c4723c9a1b0
> Author: David Hildenbrand <david@...hat.com>
> Date:   Fri May 16 14:39:46 2025 +0200
> 
>      s390/uv: Improve splitting of large folios that cannot be split while dirty
>
>      Currently, starting a PV VM on an iomap-based filesystem with large
>      folio support, such as XFS, will not work. We'll be stuck in
>      unpack_one()->gmap_make_secure(), because we can't seem to make progress
>      splitting the large folio.
> 
> Where I effectively use filemap_write_and_wait_range().
> 
> It could possibly be used early to write back the whole range to
> collapse once.

Exactly!

Since MADV_COLLAPSE is a best-effort thing, having the kernel use
something like filemap_write_and_wait_range() to write back the pages
before collapsing is likely what users would expect.

Anyway, they just want to get a THP, whether the pages are dirty or
clean :)
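
Until the kernel writes back on its own, a user-space workaround might
look roughly like the sketch below. It is an illustration only and has
not been run against the failing case in this thread.
pmd_align_range() and collapse_with_retry() are hypothetical helpers,
the 2 MiB PMD size is an assumption (e.g. x86-64 with 4K base pages),
and it assumes that fsync() on the backing file is enough to clean the
dirty page-cache pages before a second attempt.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* asm-generic uapi value; older libc headers may lack it */
#endif

/* Assumption: 2 MiB PMD-sized huge pages. */
#define PMD_SIZE_ASSUMED (2UL * 1024 * 1024)

/*
 * Hypothetical helper: shrink [start, end) inward to PMD-aligned
 * boundaries. Returns the aligned length, or 0 if no full PMD-sized
 * region fits; *aligned_start is only written on success.
 */
size_t pmd_align_range(uintptr_t start, uintptr_t end, uintptr_t *aligned_start)
{
    uintptr_t s = (start + PMD_SIZE_ASSUMED - 1) & ~(PMD_SIZE_ASSUMED - 1);
    uintptr_t e = end & ~(PMD_SIZE_ASSUMED - 1);

    if (e <= s)
        return 0;
    *aligned_start = s;
    return e - s;
}

/*
 * Hypothetical helper: try MADV_COLLAPSE once; on EINVAL or EAGAIN,
 * flush the backing file (fd, or -1 to skip the flush) and retry once.
 * Returns 0 on success, -1 on failure with errno set.
 */
int collapse_with_retry(void *addr, size_t len, int fd)
{
    if (madvise(addr, len, MADV_COLLAPSE) == 0)
        return 0;
    if (errno != EINVAL && errno != EAGAIN)
        return -1;

    /* Assumption: the failure came from dirty page-cache pages. */
    if (fd >= 0 && fsync(fd) != 0)
        return -1;

    return madvise(addr, len, MADV_COLLAPSE) == 0 ? 0 : -1;
}
```

A caller would take the text mapping's start and end from
/proc/self/maps, pass them through pmd_align_range(), and then call
collapse_with_retry() with an fd opened on the executable. EINVAL can
have other causes than dirty pages, so a second failure should simply
be treated as "no THP this time".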
