Message-ID: <CACePvbWiGZEuR1xHorjS2mXE-=Z4ZfpR8U_1jSMGMBe8PFnU_g@mail.gmail.com>
Date: Mon, 15 Sep 2025 11:03:01 -0700
From: Chris Li <chrisl@...nel.org>
To: Chris Mason <clm@...a.com>
Cc: Kairui Song <ryncsn@...il.com>, linux-mm@...ck.org,
Andrew Morton <akpm@...ux-foundation.org>, Matthew Wilcox <willy@...radead.org>,
Hugh Dickins <hughd@...gle.com>, Barry Song <baohua@...nel.org>, Baoquan He <bhe@...hat.com>,
Nhat Pham <nphamcs@...il.com>, Kemeng Shi <shikemeng@...weicloud.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, Ying Huang <ying.huang@...ux.alibaba.com>,
Johannes Weiner <hannes@...xchg.org>, David Hildenbrand <david@...hat.com>,
Yosry Ahmed <yosryahmed@...gle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Zi Yan <ziy@...dia.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 14/15] mm, swap: implement dynamic allocation of swap table
On Mon, Sep 15, 2025 at 10:14 AM Chris Li <chrisl@...nel.org> wrote:
>
> On Mon, Sep 15, 2025 at 10:00 AM Chris Mason <clm@...a.com> wrote:
> >
> >
> >
> > On 9/15/25 12:24 PM, Kairui Song wrote:
> > > On Mon, Sep 15, 2025 at 11:55 PM Chris Mason <clm@...a.com> wrote:
> > >>
> > >> On Thu, 11 Sep 2025 00:08:32 +0800 Kairui Song <ryncsn@...il.com> wrote:
> >
> > [ ... ]
> > >>> spin_lock(&si->global_cluster_lock);
> > >>> + /*
> > >>> + * Back to atomic context. First, check if we migrated to a new
> > >>> + * CPU with a usable percpu cluster. If so, try using that instead.
> > >>> + * No need to check it for the spinning device, as swap is
> > >>> + * serialized by the global lock on them.
> > >>> + *
> > >>> + * The is_usable check is a bit rough, but ensures order 0 success.
> > >>> + */
> > >>> + offset = this_cpu_read(percpu_swap_cluster.offset[order]);
> > >>> + if ((si->flags & SWP_SOLIDSTATE) && offset) {
> > >>> + pcp_ci = swap_cluster_lock(si, offset);
> > >>> + if (cluster_is_usable(pcp_ci, order) &&
> > >>> + pcp_ci->count < SWAPFILE_CLUSTER) {
> > >>> + ci = pcp_ci;
> > >> ^^^^^^^^^^^^^
> > >> ci came from the caller, and in the case of isolate_lock_cluster() they
> > >> had just removed it from a list. We overwrite ci and return something
> > >> different.
> > >
> > > Yes, that's expected. See the comment above. We have just dropped the
> > > local lock, so it's possible that we migrated to another CPU which has
> > > its own percpu cached ci (percpu_swap_cluster.offset).
> > >
> > > To avoid fragmentation, drop the isolated ci and use the percpu ci
> > > instead. But you are right that I need to add the ci back to the list,
> > > or it will be leaked. Thanks!
> >
> > Yeah, the comment helped a lot (thank you). It was just the leak I was
> > worried about ;)
>
> As Kairui said, that is not a leak; it is the intended behavior. It
> rotates the list head when fetching the ci from the list, to avoid
> repeatedly retrying some fragmented cluster that has a very low
> success rate. Otherwise it can stall on the same fragmented list. It
> does look odd at first glance, but that is the best we can do so far
> without pushing a lot of repeated rotation logic into the caller. If
> you find another way to improve readability without a performance
> penalty while making the code simpler, feel free to send suggestions
> or even patches.
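>
> For illustration, the fetch side is roughly this (a simplified,
> untested sketch, not the exact kernel code; the list and lock names
> here are assumptions):
>
>	struct swap_cluster_info *ci;
>
>	spin_lock(&si->lock);
>	/*
>	 * Take the first cluster off the fragment list so a failed
>	 * scan does not keep retrying the same cluster; the caller
>	 * puts it back at the tail when done, rotating the list.
>	 */
>	ci = list_first_entry_or_null(&si->frag_clusters[order],
>				      struct swap_cluster_info, list);
>	if (ci)
>		list_del_init(&ci->list);
>	spin_unlock(&si->lock);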
Sorry, I take back what I just said. There might be a real leak, as you
point out. My bad.
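
Something along these lines is probably needed to plug it (an untested
sketch on top of the quoted hunk; relist_cluster() is a hypothetical
stand-in for whatever helper puts an isolated cluster back on the
right list):

	if (cluster_is_usable(pcp_ci, order) &&
	    pcp_ci->count < SWAPFILE_CLUSTER) {
		/*
		 * ci was isolated from its list by the caller. Put it
		 * back before overwriting it, or the cluster is lost.
		 */
		relist_cluster(si, ci);	/* hypothetical helper */
		swap_cluster_unlock(ci);
		ci = pcp_ci;
	} else {
		swap_cluster_unlock(pcp_ci);
	}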
Chris