[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20260118122229.dcdda884bbb19a9c30ec6f1e@linux-foundation.org>
Date: Sun, 18 Jan 2026 12:22:29 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Shivank Garg <shivankg@....com>
Cc: David Hildenbrand <david@...nel.org>, Lorenzo Stoakes
<lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>, Baolin Wang
<baolin.wang@...ux.alibaba.com>, "Liam R . Howlett"
<Liam.Howlett@...cle.com>, Nico Pache <npache@...hat.com>, Ryan Roberts
<ryan.roberts@....com>, Dev Jain <dev.jain@....com>, Barry Song
<baohua@...nel.org>, Lance Yang <lance.yang@...ux.dev>, Masami Hiramatsu
<mhiramat@...nel.org>, Steven Rostedt <rostedt@...dmis.org>,
<linux-trace-kernel@...r.kernel.org>, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com>, Zach O'Keefe <zokeefe@...gle.com>,
<linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>, Stephen Rothwell
<sfr@...b.auug.org.au>
Subject: Re: [PATCH V5 0/2] mm/khugepaged: fix dirty page handling for
MADV_COLLAPSE
On Sun, 18 Jan 2026 19:09:38 +0000 Shivank Garg <shivankg@....com> wrote:
> MADV_COLLAPSE on file-backed mappings fails with -EINVAL when TEXT pages
> are dirty. This affects scenarios like package/container updates or
> executing binaries immediately after writing them, etc.
>
> The issue is that collapse_file() triggers async writeback and returns
> SCAN_FAIL (maps to -EINVAL), expecting khugepaged to revisit later. But
> MADV_COLLAPSE is synchronous and userspace expects immediate success or
> a clear retry signal.
>
> Reproduction:
> - Compile or copy 2MB-aligned executable to XFS/ext4 FS
> - Call MADV_COLLAPSE on .text section
> - First call fails with -EINVAL (text pages dirty from copy)
> - Second call succeeds (async writeback completed)
>
> Issue Report:
> https://lore.kernel.org/all/4e26fe5e-7374-467c-a333-9dd48f85d7cc@amd.com
Updated, thanks.
Please tolerate a little whining about the timeliess here. We're at
-rc6, v4 was added to mm.git over a month ago, had quite a lot of
review, this is very close to being moved into the mm-stable branch and now
we get v5. Argh.
> V5:
> - In patch 2/2, Simplify dirty writeback retry logic (David)
Are you sure this is the only change? It looks like a lot for a
simplification and I'm wondering if we should retain the v4 series and
defer a simplification for separate consideration during the next
cycle.
Below is how this updated altered mm.git. Could reviewers please check
this fairly soon?
--- a/mm/khugepaged.c~b
+++ a/mm/khugepaged.c
@@ -2788,11 +2788,11 @@ int madvise_collapse(struct vm_area_stru
hend = end & HPAGE_PMD_MASK;
for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
- bool retried = false;
int result = SCAN_FAIL;
+ bool triggered_wb = false;
- if (!mmap_locked) {
retry:
+ if (!mmap_locked) {
cond_resched();
mmap_read_lock(mm);
mmap_locked = true;
@@ -2812,52 +2812,27 @@ retry:
mmap_read_unlock(mm);
mmap_locked = false;
+ *lock_dropped = true;
result = hpage_collapse_scan_file(mm, addr, file, pgoff,
cc);
- fput(file);
- } else {
- result = hpage_collapse_scan_pmd(mm, vma, addr,
- &mmap_locked, cc);
- }
- if (!mmap_locked)
- *lock_dropped = true;
-
- /*
- * If the file-backed VMA has dirty pages, the scan triggers
- * async writeback and returns SCAN_PAGE_DIRTY_OR_WRITEBACK.
- * Since MADV_COLLAPSE is sync, we force sync writeback and
- * retry once.
- */
- if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !retried) {
- /*
- * File scan drops the lock. We must re-acquire it to
- * safely inspect the VMA and hold the file reference.
- */
- if (!mmap_locked) {
- cond_resched();
- mmap_read_lock(mm);
- mmap_locked = true;
- result = hugepage_vma_revalidate(mm, addr, false, &vma, cc);
- if (result != SCAN_SUCCEED)
- goto handle_result;
- }
- if (!vma_is_anonymous(vma) && vma->vm_file &&
- mapping_can_writeback(vma->vm_file->f_mapping)) {
- struct file *file = get_file(vma->vm_file);
- pgoff_t pgoff = linear_page_index(vma, addr);
+ if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
+ mapping_can_writeback(file->f_mapping)) {
loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
- mmap_read_unlock(mm);
- mmap_locked = false;
- *lock_dropped = true;
filemap_write_and_wait_range(file->f_mapping, lstart, lend);
+ triggered_wb = true;
fput(file);
- retried = true;
goto retry;
}
+ fput(file);
+ } else {
+ result = hpage_collapse_scan_pmd(mm, vma, addr,
+ &mmap_locked, cc);
}
+ if (!mmap_locked)
+ *lock_dropped = true;
handle_result:
switch (result) {
_
Powered by blists - more mailing lists