lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9ac88791-54e2-4f7c-9171-5a1d44e01657@linux.dev>
Date: Fri, 26 Jul 2024 11:21:23 +0800
From: Chengming Zhou <chengming.zhou@...ux.dev>
To: Takero Funaki <flintglass@...il.com>, Nhat Pham <nphamcs@...il.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Yosry Ahmed
 <yosryahmed@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 2/2] mm: zswap: fix global shrinker error handling
 logic

On 2024/7/24 00:44, Takero Funaki wrote:
> 2024年7月23日(火) 6:51 Nhat Pham <nphamcs@...il.com>:
>>
>> On Fri, Jul 19, 2024 at 9:41 PM Takero Funaki <flintglass@...il.com> wrote:
>>>
>>> This patch fixes zswap global shrinker that did not shrink zpool as
>>> expected.
>>>
>>> The issue it addresses is that `shrink_worker()` did not distinguish
>>> between unexpected errors and expected error codes that should be
>>> skipped, such as when there is no stored page in a memcg. This led to
>>> the shrinking process being aborted on the expected error codes.
>>
>> The code itself seems reasonable to me, but may I ask you to document
>> (as a comment) all the expected v.s unexpected cases? i.e when do we
>> increment (or not increment) the failure counter?
>>
> 
> In addition to changes in the commit log suggested by Yosry,
> adding some comments specifying what memcg is (not) candidates for
> writeback, and what should be a failure.
> 
> -       /* global reclaim will select cgroup in a round-robin fashion.
> +       /*
> +        * Global reclaim will select cgroup in a round-robin fashion from all
> +        * online memcgs, but memcgs that have no pages in zswap and
> +        * writeback-disabled memcgs (memory.zswap.writeback=0) are not
> +        * candidates for shrinking.
> +        *
> +        * Shrinking will be aborted if we encounter the following
> +        * MAX_RECLAIM_RETRIES times:
> +        * - No writeback-candidate memcgs found in a memcg tree walk.
> +        * - Shrinking a writeback-candidate memcg failed.
>           *
>           * We save iteration cursor memcg into zswap_next_shrink,
>           * which can be modified by the offline memcg cleaner
> 
> and, the reasons to (not) increment the progress:
> 
> @@ -1387,10 +1407,20 @@ static void shrink_worker(struct work_struct *w)
>                  /* drop the extra reference */
>                  mem_cgroup_put(memcg);
> 
> -               if (ret == -EINVAL)
> -                       break;
> +               /*
> +                * There are no writeback-candidate pages in the memcg.
> +                * This is not an issue as long as we can find another memcg
> +                * with pages in zswap. Skip this without incrementing progress
> +                * and failures.
> +                */
> +               if (ret == -ENOENT)
> +                       continue;
> +
>                  if (ret && ++failures == MAX_RECLAIM_RETRIES)
>                          break;
> +
> +               /* completed writeback or incremented failures */
> +               ++progress;

Maybe the name "progress" is a little confusing here? "progress" sounds 
to me that we have some writeback completed.

But actually it just means we have encountered some candidates, right?

Thanks.


>   resched:
> 
> 
>> My understanding is, we only increment the failure counter if we fail
>> to reclaim from a selected memcg that is non-empty and
>> writeback-enabled, or if we go a full tree walk without making any
>> progress. Is this correct?
>>
> 
> Yes, that's the expected behavior.
> Please let me know if there is more appropriate wording.
> 
> Thanks.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ