linux-kernel - Re: [PATCH v3 2/2] mm: zswap: fix global shrinker error handling logic


Message-ID: <CAPpodddd2SAVj3JmDHOz+xdaAc4nPT49_yHqnPCtcFSWqJk1=A@mail.gmail.com>
Date: Wed, 24 Jul 2024 01:44:44 +0900
From: Takero Funaki <flintglass@...il.com>
To: Nhat Pham <nphamcs@...il.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Yosry Ahmed <yosryahmed@...gle.com>, 
	Chengming Zhou <chengming.zhou@...ux.dev>, Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 2/2] mm: zswap: fix global shrinker error handling logic

On Tue, Jul 23, 2024 at 6:51 Nhat Pham <nphamcs@...il.com> wrote:
>
> On Fri, Jul 19, 2024 at 9:41 PM Takero Funaki <flintglass@...il.com> wrote:
> >
> > This patch fixes zswap global shrinker that did not shrink zpool as
> > expected.
> >
> > The issue it addresses is that `shrink_worker()` did not distinguish
> > between unexpected errors and expected error codes that should be
> > skipped, such as when there is no stored page in a memcg. This led to
> > the shrinking process being aborted on the expected error codes.
>
> The code itself seems reasonable to me, but may I ask you to document
> (as a comment) all the expected v.s unexpected cases? i.e when do we
> increment (or not increment) the failure counter?
>

In addition to the commit log changes suggested by Yosry, I am adding
comments that specify which memcgs are (not) candidates for writeback,
and what counts as a failure.

-       /* global reclaim will select cgroup in a round-robin fashion.
+       /*
+        * Global reclaim will select cgroup in a round-robin fashion from all
+        * online memcgs, but memcgs that have no pages in zswap and
+        * writeback-disabled memcgs (memory.zswap.writeback=0) are not
+        * candidates for shrinking.
+        *
+        * Shrinking will be aborted if we encounter the following
+        * MAX_RECLAIM_RETRIES times:
+        * - No writeback-candidate memcgs found in a memcg tree walk.
+        * - Shrinking a writeback-candidate memcg failed.
         *
         * We save iteration cursor memcg into zswap_next_shrink,
         * which can be modified by the offline memcg cleaner

and the reasons for (not) incrementing the progress counter:

@@ -1387,10 +1407,20 @@ static void shrink_worker(struct work_struct *w)
                /* drop the extra reference */
                mem_cgroup_put(memcg);

-               if (ret == -EINVAL)
-                       break;
+               /*
+                * There are no writeback-candidate pages in the memcg.
+                * This is not an issue as long as we can find another memcg
+                * with pages in zswap. Skip this without incrementing progress
+                * and failures.
+                */
+               if (ret == -ENOENT)
+                       continue;
+
                if (ret && ++failures == MAX_RECLAIM_RETRIES)
                        break;
+
+               /* completed writeback or incremented failures */
+               ++progress;
 resched:


> My understanding is, we only increment the failure counter if we fail
> to reclaim from a selected memcg that is non-empty and
> writeback-enabled, or if we go a full tree walk without making any
> progress. Is this correct?
>

Yes, that's the expected behavior.
Please let me know if there is a more appropriate wording.
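
To make the counting rules concrete, here is a minimal user-space sketch
of the loop, not the kernel code itself. Everything prefixed with model_
or MODEL_ is hypothetical: the struct, the helpers, and the retry limit
of 16 are stand-ins chosen for illustration, and the array cursor only
approximates the memcg tree walk. It assumes, as in the comments above,
that a memcg with no zswap pages or with writeback disabled reports
-ENOENT, and that any other nonzero return is a real failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed stand-in for MAX_RECLAIM_RETRIES; the value is illustrative. */
#define MODEL_MAX_RETRIES 16

/* Hypothetical stand-in for a memcg, not the kernel's struct mem_cgroup. */
struct model_memcg {
	int zswap_pages;       /* pages currently stored in zswap */
	int writeback_enabled; /* models memory.zswap.writeback */
	int writeback_fails;   /* if nonzero, every writeback attempt fails */
};

/* Try to write back one page from a memcg. */
static int model_shrink_memcg(struct model_memcg *m)
{
	if (!m->writeback_enabled || m->zswap_pages == 0)
		return -ENOENT; /* not a writeback candidate */
	if (m->writeback_fails)
		return -EAGAIN; /* candidate, but writeback failed */
	m->zswap_pages--;
	return 0;
}

/* Sum of zswap pages over all modeled memcgs. */
static int model_total_pages(const struct model_memcg *memcgs, size_t n)
{
	int total = 0;

	for (size_t i = 0; i < n; i++)
		total += memcgs[i].zswap_pages;
	return total;
}

/*
 * Round-robin over the memcgs until the total drops to the threshold
 * or MODEL_MAX_RETRIES failures accumulate. Returns the failure count.
 */
static int model_shrink_worker(struct model_memcg *memcgs, size_t n,
			       int threshold)
{
	size_t cursor = 0;
	int failures = 0, progress = 0;

	assert(threshold >= 0);

	while (model_total_pages(memcgs, n) > threshold) {
		int ret;

		if (cursor == n) {
			/* One full walk finished: no progress is a failure. */
			cursor = 0;
			if (!progress && ++failures == MODEL_MAX_RETRIES)
				break;
			progress = 0;
			continue;
		}

		ret = model_shrink_memcg(&memcgs[cursor++]);

		/* Non-candidate: skip, touching neither counter. */
		if (ret == -ENOENT)
			continue;

		/* A candidate failed: count it and maybe give up. */
		if (ret && ++failures == MODEL_MAX_RETRIES)
			break;

		/* Completed writeback or incremented failures. */
		++progress;
	}
	return failures;
}
```

In this model, a set of memcgs that contains at least one working
candidate reaches the threshold with zero failures even if the others
are empty or writeback-disabled, while a set with no candidates at all
stops only after MODEL_MAX_RETRIES fruitless walks.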

Thanks.