Message-ID: <CAMgjq7CrKv7QJNFO_vg1_nXA+RSW83mu8X4cMSVd2tfMA==Wew@mail.gmail.com>
Date: Thu, 15 Jan 2026 16:04:58 +0800
From: Kairui Song <ryncsn@...il.com>
To: robin.kuo@...iatek.com
Cc: Andrew Morton <akpm@...ux-foundation.org>, Chris Li <chrisl@...nel.org>,
Kemeng Shi <shikemeng@...weicloud.com>, Nhat Pham <nphamcs@...il.com>,
Baoquan He <bhe@...hat.com>, Barry Song <baohua@...nel.org>,
Matthias Brugger <matthias.bgg@...il.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>, wsd_upstream@...iatek.com,
casper.li@...iatek.com, chinwen.chang@...iatek.com, Andrew.Yang@...iatek.com,
Qun-wei.Lin@...iatek.com, oliver.sang@...el.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH 1/1] Restore swap_space attr aviod krn panic
On Thu, Jan 15, 2026 at 8:14 AM <robin.kuo@...iatek.com> wrote:
>
> From: "robin.kuo" <robin.kuo@...iatek.com>
>
> Restore swap_space attr avoid krn panic
>
> Commit 8b47299a411a ('mm, swap: mark swap address space ro and add
> context debug check') made the swap address space read-only.
> It may lead to kernel panic if arch_prepare_to_swap returns a failure
> under heavy memory pressure as follows,
>
> el1_abort+0x40/0x64
> el1h_64_sync_handler+0x48/0xcc
> el1h_64_sync+0x84/0x88
> errseq_set+0x4c/0xb8 (P)
> __filemap_set_wb_err+0x20/0xd0
> shrink_folio_list+0xc20/0x11cc
> evict_folios+0x1520/0x1be4
> try_to_shrink_lruvec+0x27c/0x3dc
> shrink_one+0x9c/0x228
> shrink_node+0xb3c/0xeac
> do_try_to_free_pages+0x170/0x4f0
> try_to_free_pages+0x334/0x534
> __alloc_pages_direct_reclaim+0x90/0x158
> __alloc_pages_slowpath+0x334/0x588
> __alloc_frozen_pages_noprof+0x224/0x2fc
> __folio_alloc_noprof+0x14/0x64
> vma_alloc_zeroed_movable_folio+0x34/0x44
> do_pte_missing+0xad4/0x1040
> handle_mm_fault+0x4a4/0x790
> do_page_fault+0x288/0x5f8
> do_translation_fault+0x38/0x54
> do_mem_abort+0x54/0xa8
>
> Restore swap address space as not ro to avoid the panic.
Hi Robin,

Thanks very much for the patch! I'm fine with it, with some suggestions:

Marking it as RO was intentional, to catch silent unexpected usages of
the swap space: the swap space is hollow now, as all its content lives
in the swap table.

Take the case you posted, for example: calling filemap_set_wb_err for
the swap space is meaningless, as the error info will be discarded
without any notice.
From the comment for mapping_set_error (a wrapper for filemap_set_wb_err):
* When a writeback error occurs, most filesystems will want to call
* mapping_set_error to record the error in the mapping so that it can be
* reported when the application calls fsync(2).
So far there is no fsync() for swap, nor any other way to report mapping
errors to userspace. For example, __end_swap_bio_write will just print
an error log if writeback failed. I went through the code: currently
only arch_prepare_to_swap can cause a call to filemap_set_wb_err.

Swap is not persistent storage like a normal filesystem, so simply
bouncing a folio that failed writeback back to the active list is good
enough for almost all cases, as is already done in end_swap_bio_write.
Catching potential issues with RO/panic is indeed overkill, so I'm fine
with dropping the RO by default. But perhaps we can keep the RO
attribute at least for CONFIG_DEBUG_VM builds? Like this:
diff --git a/mm/swap.h b/mm/swap.h
index eba4c04ad039..307cc71d3635 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -220,7 +220,12 @@ int swap_writeout(struct folio *folio, struct swap_iocb **swap_plug);
 void __swap_writepage(struct folio *folio, struct swap_iocb **swap_plug);

 /* linux/mm/swap_state.c */
-extern struct address_space swap_space __ro_after_init;
+#ifdef CONFIG_DEBUG_VM
+#define __swap_space_attr __ro_after_init
+#else
+#define __swap_space_attr __read_mostly
+#endif
+extern struct address_space swap_space __swap_space_attr;
 static inline struct address_space *swap_address_space(swp_entry_t entry)
 {
 	return &swap_space;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3916d144eddc..11273e0ec811 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -38,7 +38,7 @@ static const struct address_space_operations swap_aops = {
 };

 /* Set swap_space as read only as swap cache is handled by swap table */
-struct address_space swap_space __ro_after_init = {
+struct address_space swap_space __swap_space_attr = {
 	.a_ops = &swap_aops,
 };
---
As for the particular error-marking issue you posted, maybe we should
just drop handle_write_error, I think. That helper was useful while
filesystems were still using the .writepage interface, but now all
filesystems have switched to .writepages and only swap may call it, so
it is no longer helpful: the error info is completely ignored with no
report back to userspace. Maybe we need something like this:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 39f0336d9c29..62fe433aac0b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -477,27 +477,6 @@ static int reclaimer_offset(struct scan_control *sc)
 	return PGSTEAL_DIRECT - PGSTEAL_KSWAPD;
 }
-/*
- * We detected a synchronous write error writing a folio out. Probably
- * -ENOSPC. We need to propagate that into the address_space for a subsequent
- * fsync(), msync() or close().
- *
- * The tricky part is that after writepage we cannot touch the mapping: nothing
- * prevents it from being freed up. But we have a ref on the folio and once
- * that folio is locked, the mapping is pinned.
- *
- * We're allowed to run sleeping folio_lock() here because we know the caller has
- * __GFP_FS.
- */
-static void handle_write_error(struct address_space *mapping,
-		struct folio *folio, int error)
-{
-	folio_lock(folio);
-	if (folio_mapping(folio) == mapping)
-		mapping_set_error(mapping, error);
-	folio_unlock(folio);
-}
-
static bool skip_throttle_noprogress(pg_data_t *pgdat)
{
int reclaimable = 0, write_pending = 0;
@@ -650,9 +629,10 @@ static pageout_t writeout(struct folio *folio, struct address_space *mapping,
 	else
 		res = swap_writeout(folio, plug);
-	if (res < 0)
-		handle_write_error(mapping, folio, res);
-	if (res == AOP_WRITEPAGE_ACTIVATE) {
+	if (res < 0) {
+		pr_alert_ratelimited("Swap failure of folio (order: %d, error: %d)\n",
+				     folio_order(folio), res);
+	} else if (res == AOP_WRITEPAGE_ACTIVATE) {
 		folio_clear_reclaim(folio);
 		return PAGE_ACTIVATE;
 	}