Message-ID: <CAMgjq7CADKhU8r0xg+=xKJ20JybFbXc8mkBNYxaHsp3ZvYfV7g@mail.gmail.com>
Date: Wed, 3 Sep 2025 00:57:37 +0800
From: Kairui Song <ryncsn@...il.com>
To: Chris Li <chrisl@...nel.org>
Cc: Barry Song <21cnbao@...il.com>, linux-mm@...ck.org, 
	Andrew Morton <akpm@...ux-foundation.org>, Matthew Wilcox <willy@...radead.org>, 
	Hugh Dickins <hughd@...gle.com>, Baoquan He <bhe@...hat.com>, Nhat Pham <nphamcs@...il.com>, 
	Kemeng Shi <shikemeng@...weicloud.com>, Baolin Wang <baolin.wang@...ux.alibaba.com>, 
	Ying Huang <ying.huang@...ux.alibaba.com>, Johannes Weiner <hannes@...xchg.org>, 
	David Hildenbrand <david@...hat.com>, Yosry Ahmed <yosryahmed@...gle.com>, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Zi Yan <ziy@...dia.com>, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 8/9] mm, swap: implement dynamic allocation of swap table

On Tue, Sep 2, 2025 at 9:20 PM Chris Li <chrisl@...nel.org> wrote:
>
> On Tue, Sep 2, 2025 at 4:15 AM Barry Song <21cnbao@...il.com> wrote:
> >
> > On Sat, Aug 23, 2025 at 3:21 AM Kairui Song <ryncsn@...il.com> wrote:
> > >
> > > From: Kairui Song <kasong@...cent.com>
> > >
> > > Now the swap table is cluster based, which means a free cluster can
> > > free its table since no one should modify it.
> > >
> > > There could be speculative readers, like swap cache lookup; protect
> > > against them by making the tables RCU safe. All swap tables should be
> > > filled with null entries before being freed, so such readers will
> > > either see a NULL pointer or a null-filled table that is being lazily
> > > freed.
> > >
> > > On allocation, allocate the table when a cluster is used by any order.
> > >
> >
> > Might be a silly question.
> >
> > Just curious—what happens if the allocation fails? Does the swap-out
> > operation also fail? We sometimes encounter strange issues when memory is
> > very limited, especially if the reclamation path itself needs to allocate
> > memory.
> >
> > Assume a case where we want to swap out a folio using clusterN. We then
> > attempt to swap out the following folios with the same clusterN. But if
> > the allocation of the swap_table keeps failing, what will happen?
>
> I think this is the same behavior as XArray node allocation with no memory.
> The swap allocator will fail to isolate this cluster; it gets a NULL
> ci pointer as the return value. The swap allocator will then try the
> other cluster lists, e.g. non_full, fragment, etc.
> If all of them fail, folio_alloc_swap() will return -ENOMEM, which
> will propagate back to the swap-out attempt, then to the shrink folio
> list path. It will put this page back on the LRU.
>
> The shrink folio list either frees enough memory (happy path) or is not
> able to free enough memory, in which case it will cause an OOM kill.
>
> I believe previously XArray would also return -ENOMEM when inserting a
> pointer while unable to allocate a node to hold that pointer. It has
> the same error propagation path. We did not change that.

Yes, exactly. The overall behaviour is the same.

The allocation is only needed when a CPU's local swap cluster is
drained and the swap allocator needs a new cluster. But after the
previous patch [1], many swap devices will prefer the nonfull list, so
the chance that we need a swap table allocation is lower.
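
Just to illustrate the idea (this is only a simplified userspace-style
sketch, not the real code; all the sketch_* names below are made up):
isolating a cluster only succeeds if its swap table is already present
or can be allocated, otherwise the allocator sees NULL and moves on.

/*
 * Simplified sketch only -- struct and function names are illustrative,
 * not the real kernel symbols.  It models "isolating a cluster may need
 * to allocate its swap table; if that fails, the caller gets NULL".
 */
#include <errno.h>
#include <stdlib.h>

struct sketch_cluster {
	void **table;		/* swap table: one slot per entry, null-filled */
	unsigned int nr_slots;
};

/* Allocate the cluster's table lazily; may fail under memory pressure. */
static int sketch_alloc_table(struct sketch_cluster *ci)
{
	if (ci->table)
		return 0;
	ci->table = calloc(ci->nr_slots, sizeof(*ci->table));
	return ci->table ? 0 : -ENOMEM;
}

/*
 * Take a cluster off a list.  Returns NULL when the list is empty or the
 * table allocation fails, so the allocator falls through to the next list.
 */
static struct sketch_cluster *sketch_isolate(struct sketch_cluster *head)
{
	if (!head || sketch_alloc_table(head))
		return NULL;
	return head;
}

int main(void)
{
	struct sketch_cluster free_cluster = { .nr_slots = 512 };
	struct sketch_cluster *ci = sketch_isolate(&free_cluster);

	/* ci == NULL would mean: try the next cluster list instead. */
	free(ci ? ci->table : NULL);
	return ci ? 0 : 1;
}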

If it fails to allocate a swap table for a new cluster, it will try to
fall back to frag / reclaim full. Only if all lists are drained may
folio_alloc_swap fail with -ENOMEM, and the caller (LRU shrink) will
either try to reclaim some other page or fail with OOM.
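
The failure path then maps to roughly this shape (again only a hedged
sketch; the try_* helpers and sketch_folio_alloc_swap are invented
stand-ins, only the fallback order follows the discussion above):

/*
 * Hedged sketch of the fallback and error propagation described above.
 * Every function here is an illustrative stand-in; in this sketch all
 * lists are "drained" so the -ENOMEM path is taken.
 */
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

struct sketch_cluster;

typedef struct sketch_cluster *(*sketch_isolate_fn)(void);

/* Stand-ins for trying each cluster list. */
static struct sketch_cluster *try_nonfull(void)      { return NULL; }
static struct sketch_cluster *try_free(void)         { return NULL; }
static struct sketch_cluster *try_frag(void)         { return NULL; }
static struct sketch_cluster *try_full_reclaim(void) { return NULL; }

/* Walk the lists in order; only when all fail does swap allocation fail. */
static int sketch_folio_alloc_swap(void)
{
	static const sketch_isolate_fn lists[] = {
		try_nonfull, try_free, try_frag, try_full_reclaim,
	};

	for (size_t i = 0; i < sizeof(lists) / sizeof(lists[0]); i++) {
		if (lists[i]())
			return 0;	/* got a cluster, swap-out proceeds */
	}
	return -ENOMEM;			/* all lists drained */
}

int main(void)
{
	/* The reclaim caller: on -ENOMEM it keeps the folio on the LRU. */
	if (sketch_folio_alloc_swap() == -ENOMEM)
		puts("no swap slot: put folio back on the LRU, try another page");
	return 0;
}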

I think the fallback order of nonfull / free / frag / reclaim-full
might even help avoid swapout failures under heavy pressure. I don't
have data for that, but I did run many tests with heavy pressure and
didn't see any issues.

[1] https://lore.kernel.org/linux-mm/20250812-swap-scan-list-v3-0-6d73504d267b@kernel.org/
>
> Chris
>
