linux-kernel - Re: [PATCH v1 1/3] mm: zswap: fix global shrinker memcg iteration

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKEwX=MiMMCrQCq2j1DDOR_U6==6pwbqqCnsaoigQ4aEqhgaaw@mail.gmail.com>
Date: Thu, 13 Jun 2024 09:08:05 -0700
From: Nhat Pham <nphamcs@...il.com>
To: Takero Funaki <flintglass@...il.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Yosry Ahmed <yosryahmed@...gle.com>, 
	Chengming Zhou <chengming.zhou@...ux.dev>, Jonathan Corbet <corbet@....net>, 
	Andrew Morton <akpm@...ux-foundation.org>, 
	Domenico Cerasuolo <cerasuolodomenico@...il.com>, linux-mm@...ck.org, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 1/3] mm: zswap: fix global shrinker memcg iteration

On Sat, Jun 8, 2024 at 8:53 AM Takero Funaki <flintglass@...il.com> wrote:
>
> This patch fixes an issue where the zswap global shrinker stopped
> iterating through the memcg tree.
>
> The problem was that `shrink_worker()` would stop iterating when a memcg
> was being offlined and restart from the tree root.  Now, it properly
> handles the offlining memcg and continues shrinking with the next memcg.
>
> This patch also modified handing of the lock for offlined memcg cleaner
> to adapt the change in the iteration, and avoid negligibly rare skipping
> of a memcg from shrink iteration.
>

Honestly, I think this version is even more complicated than v0 :)

Hmmm how about this:

do {
    spin_lock(&zswap_pools_lock);
    do {
        /* no offline caller has been called to memcg yet */
        if (memcg == zswap_next_shrink) {
            zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);

        memcg = zswap_next_shrink;
        if (!memcg && ++failure > MAX_FAILURE) {
            spin_unlock(&zswap_pools_lock);
            return;
        }

    } while (!zswap_next_shrink || !mem_cgroup_tryget_online(zswap_next_shrink))
    spin_unlock(&zswap_pools_lock);

    /* proceed with reclaim */
} while (...)

This should achieve all the goals right?

1. No restarting from the top, unless we have traversed the entire hierarchy.

2. No skipping over zswap_next_shrink updated by the memcg offline cleaner.

3. No leaving offlined zswap_next_shrink hanging (and blocking memcg
killing indefinitely).

4. No long running loop unnecessarily. If you want to be extra safe,
we can put a max number of retries on the offline memcg case too (and
restart from the top).

5. At the end of the inner loop, you are guaranteed to hold at least
one reference to memcg (and perhaps 2, if zswap_next_shrink is not
updated by the cleaner).

5. At the end of the inner loop, the selected memcg is known to not
have its cleaner started yet (since it is online with zswap_pools_lock
held). Our selection will undo the cleaner and hold the memcg hostage
forever.

Is there anything that I'm missing? We are not dropping the lock for
the entirety of the inner loop, but honestly I'm fine with this trade
off, for the sake of simplicity :)

If you want it to be even more readable, feel free to refactor the
inner loop into a helper function (but it's probably redundant since
it's only used once).