linux-kernel - Re: [PATCH v3 14/15] mm, swap: implement dynamic allocation of swap table

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CACePvbUWGNq=TW2aP2zvcp9=Xt86hBPi3Ga1SLqjeZcyaKApoQ@mail.gmail.com>
Date: Mon, 15 Sep 2025 10:14:26 -0700
From: Chris Li <chrisl@...nel.org>
To: Chris Mason <clm@...a.com>
Cc: Kairui Song <ryncsn@...il.com>, linux-mm@...ck.org, 
	Andrew Morton <akpm@...ux-foundation.org>, Matthew Wilcox <willy@...radead.org>, 
	Hugh Dickins <hughd@...gle.com>, Barry Song <baohua@...nel.org>, Baoquan He <bhe@...hat.com>, 
	Nhat Pham <nphamcs@...il.com>, Kemeng Shi <shikemeng@...weicloud.com>, 
	Baolin Wang <baolin.wang@...ux.alibaba.com>, Ying Huang <ying.huang@...ux.alibaba.com>, 
	Johannes Weiner <hannes@...xchg.org>, David Hildenbrand <david@...hat.com>, 
	Yosry Ahmed <yosryahmed@...gle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, 
	Zi Yan <ziy@...dia.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 14/15] mm, swap: implement dynamic allocation of swap table

On Mon, Sep 15, 2025 at 10:00 AM Chris Mason <clm@...a.com> wrote:
>
>
>
> On 9/15/25 12:24 PM, Kairui Song wrote:
> > On Mon, Sep 15, 2025 at 11:55 PM Chris Mason <clm@...a.com> wrote:
> >>
> >> On Thu, 11 Sep 2025 00:08:32 +0800 Kairui Song <ryncsn@...il.com> wrote:
>
> [ ... ]
>              spin_lock(&si->global_cluster_lock);
> >>> +     /*
> >>> +      * Back to atomic context. First, check if we migrated to a new
> >>> +      * CPU with a usable percpu cluster. If so, try using that instead.
> >>> +      * No need to check it for the spinning device, as swap is
> >>> +      * serialized by the global lock on them.
> >>> +      *
> >>> +      * The is_usable check is a bit rough, but ensures order 0 success.
> >>> +      */
> >>> +     offset = this_cpu_read(percpu_swap_cluster.offset[order]);
> >>> +     if ((si->flags & SWP_SOLIDSTATE) && offset) {
> >>> +             pcp_ci = swap_cluster_lock(si, offset);
> >>> +             if (cluster_is_usable(pcp_ci, order) &&
> >>> +                 pcp_ci->count < SWAPFILE_CLUSTER) {
> >>> +                     ci = pcp_ci;
> >>                        ^^^^^^^^^^^^^
> >> ci came from the caller, and in the case of isolate_lock_cluster() they
> >> had just removed it from a list.  We overwrite ci and return something
> >> different.
> >
> > Yes, that's expected. See the comment above. We have just dropped
> > local lock so it's possible that we migrated to another CPU which has
> > its own percpu cache ci (percpu_swap_cluster.offset).
> >
> > To avoid fragmentation, drop the isolated ci and use the percpu ci
> > instead. But you are right that I need to add the ci back to the list,
> > or it will be leaked. Thanks!
>
> Yeah, the comment helped a lot (thank you).  It was just the leak I was
> worried about ;)

As Kairui said, that is not a leak, it is the intended behavior. It
rotates the listhead when fetching the ci from the list to avoid
repeatedly trying some fragment cluster which has a very low success
rate. Otherwise it can stall on the same fragmented list. It does look
odd at the first glance. That is the best we can do so far without
introducing a lot of repeating rotation logic to the caller. If you
find other ways to improve the reading without performance penalty and
make code simpler, feel free to make suggestions or even patches.

Chris