Date: Fri, 14 Jun 2024 13:39:57 +0900
From: Takero Funaki <flintglass@...il.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Nhat Pham <nphamcs@...il.com>, Yosry Ahmed <yosryahmed@...gle.com>, 
	Johannes Weiner <hannes@...xchg.org>, Chengming Zhou <chengming.zhou@...ux.dev>, 
	Jonathan Corbet <corbet@....net>, Andrew Morton <akpm@...ux-foundation.org>, 
	Domenico Cerasuolo <cerasuolodomenico@...il.com>, linux-mm@...ck.org, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 1/3] mm: zswap: fix global shrinker memcg iteration

On Fri, Jun 14, 2024 at 1:49 Shakeel Butt <shakeel.butt@...ux.dev> wrote:
>
> On Thu, Jun 13, 2024 at 08:04:39AM GMT, Nhat Pham wrote:
> [...]
> > > > >
> > > > > Is the idea here to avoid moving the iterator to another offline memcg
> > > > > that zswap_memcg_offline_cleanup() was already called for, to avoid
> > > > > holding a ref on that memcg until the next run of zswap shrinking?
> > > > >
> > > > > If yes, I think it's probably worth doing. But why do we need to
> > > > > release and reacquire the lock in the loop above?
> > > >
> > > > Yes, the existing cleaner might leave zswap_next_shrink pointing at an
> > > > offline, already-cleaned memcg.
> > > >
> > > > Reacquiring the lock is to avoid looping inside the critical section.
> > > > In shrink_worker() of the v0 patch, the loop was restarted on an offline
> > > > memcg without releasing the lock. Nhat pointed out that we should drop
> > > > the lock after every mem_cgroup_iter() call. v1 was changed to reacquire
> > > > the lock once per iteration, like the cleaner code above.
> > >
> > > I am not sure how often we'll run into a situation where we'll be
> > > holding the lock for too long tbh. It should be unlikely to keep
> > > encountering offline memcgs for a long time.
> > >
> > > Nhat, do you think this could cause a problem in practice?
> >
> > I don't remember prescribing anything to be honest :) I think I was
> > just asking why can't we just drop the lock, then "continue;". This is
> > mostly for simplicity's sake.
> >
> > https://lore.kernel.org/linux-mm/CAKEwX=MwrRc43iM2050v5u-TPUK4Yn+a4G7+h6ieKhpQ7WtQ=A@mail.gmail.com/

I apologize for misinterpreting your comments. I will remove the
release/reacquire.
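
For reference, my understanding of that suggestion is roughly the pattern
below (just a sketch; the outer retry condition and the actual shrinking
are elided, so this is not the exact v1 code):
```c
    for (;;) {
        spin_lock(&zswap_shrink_lock);
        zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
        memcg = zswap_next_shrink;

        if (memcg && !mem_cgroup_tryget_online(memcg)) {
            /* offline memcg: release the lock and try the next one */
            spin_unlock(&zswap_shrink_lock);
            continue;
        }
        /* exit with the lock held and either NULL or an online memcg */
        break;
    }
```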

> >
> > But I think as Takero pointed out, it would still skip over the memcg
> > that was (concurrently) updated to zswap_next_shrink by the memcg
> > offline callback.
>
> What's the issue with continuing to traverse until an online memcg is found?
> Something like the following:
>
>
>         spin_lock(&zswap_shrink_lock);
>         do {
>                 zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
>         } while (zswap_next_shrink && !mem_cgroup_online(zswap_next_shrink));
>
>         if (!zswap_next_shrink)
>                 zswap_next_shrink = mem_cgroup_iter(NULL, NULL, NULL);
>         ....
>
> Is the concern that there can be a lot of offlined memcgs which may cause
> need_resched warnings?

To avoid using the goto-based loop, here's the new version, incorporating
Shakeel's suggestion:
```c
    do {
        spin_lock(&zswap_shrink_lock);
        /*
         * Advance the cursor to start shrinking from the next memcg
         * after zswap_next_shrink.  One memcg might be skipped from
         * shrinking if the cleaner also advanced the cursor, but it
         * will happen at most once per offlining memcg.
         */
        do {
            zswap_next_shrink = mem_cgroup_iter(NULL,
                    zswap_next_shrink, NULL);
            memcg = zswap_next_shrink;
        } while (memcg && !mem_cgroup_tryget_online(memcg));

        if (!memcg) {
            spin_unlock(&zswap_shrink_lock);
            /* ... (rest of shrink_worker() omitted) ... */
```
We can add or remove `spin_unlock(); spin_lock();` just after
mem_cgroup_iter(), if needed.
I believe the behavior is identical to v1 except for the starting
point of the iteration.
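
To show where that optional `spin_unlock(); spin_lock();` pair would go,
here is a placement-only sketch of the inner loop (not a tested patch):
```c
        do {
            zswap_next_shrink = mem_cgroup_iter(NULL,
                    zswap_next_shrink, NULL);
            memcg = zswap_next_shrink;
            /*
             * Briefly drop and reacquire the lock after each step so
             * zswap_shrink_lock is not held continuously while many
             * offline memcgs are skipped.
             */
            spin_unlock(&zswap_shrink_lock);
            spin_lock(&zswap_shrink_lock);
        } while (memcg && !mem_cgroup_tryget_online(memcg));
```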

Regarding Nhat's comment 2, "No skipping over zswap_next_shrink updated
by the memcg offline cleaner": while this was true for v1, I have come
to accept this skipping since it is negligibly rare. As Yosry commented,
v1 retried the last memcg from the last shrink_worker() run.

There are several options for where shrink_worker() should start:
1. Starting from the next memcg after zswap_next_shrink: it might skip
one memcg, but this is quite rare. This is the current behavior before
the patch.

2. Starting from zswap_next_shrink: it might shrink one more page from
that memcg in addition to the one shrunk by the last shrink_worker()
run. This should also be rare, but probably more frequent than option 1.
This is the v0 patch behavior (see the sketch after option 3's code below).

3. Addressing both: save the last memcg as well. The worker checks
whether the cursor has been modified by the cleaner and advances only
if it hasn't, like this:
```c
        do {
            /* Advance only if the cleaner has not already moved the cursor. */
            if (zswap_last_shrink == zswap_next_shrink) {
                zswap_next_shrink = mem_cgroup_iter(NULL,
                        zswap_next_shrink, NULL);
            }
            memcg = zswap_next_shrink;
            /*
             * Record the cursor inside the loop so that skipping an
             * offline memcg still advances on the next pass instead of
             * spinning on the same offline memcg forever.
             */
            zswap_last_shrink = memcg;
        } while (memcg && !mem_cgroup_tryget_online(memcg));
```
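
For comparison, option 2 would look roughly like this (sketch only;
refcounting of the previous cursor and the restart-from-root path when
the cursor is NULL are elided):
```c
        /*
         * Try the memcg the cursor currently points to first; advance
         * only while it is gone or offline.
         */
        memcg = zswap_next_shrink;
        while (memcg && !mem_cgroup_tryget_online(memcg)) {
            zswap_next_shrink = mem_cgroup_iter(NULL,
                    zswap_next_shrink, NULL);
            memcg = zswap_next_shrink;
        }
```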

Which one would be better? Or are there any other ideas?
