linux-kernel - Re: global_reclaim() and code documentation (was: Re: [PATCH v3 3/3] mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230406200213.GA50192@cmpxchg.org>
Date:   Thu, 6 Apr 2023 16:02:13 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Yu Zhao <yuzhao@...gle.com>
Cc:     Yosry Ahmed <yosryahmed@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Shakeel Butt <shakeelb@...gle.com>,
        Michal Hocko <mhocko@...e.com>
Subject: Re: global_reclaim() and code documentation (was: Re: [PATCH v3 3/3]
 mm: vmscan: ignore non-LRU-based reclaim in memcg reclaim

On Wed, Apr 05, 2023 at 03:09:27PM -0600, Yu Zhao wrote:
> On Wed, Apr 5, 2023 at 2:01 PM Johannes Weiner <hannes@...xchg.org> wrote:
> > static bool cgroup_reclaim(struct scan_control *sc)
> > {
> >         return sc->target_mem_cgroup;
> > }
> >
> > static bool global_reclaim(struct scan_control *sc)
> > {
> >         return !sc->target_mem_cgroup || mem_cgroup_is_root(sc->target_mem_cgroup);
> > }
> >
> > The name suggests it's the same thing twice, with opposite
> > polarity. But of course they're subtly different, and not documented.
> >
> > When do you use which?
> 
> The problem I saw is that target_mem_cgroup is set when writing to the
> root memory.reclaim. And for this case, a few places might prefer
> global_reclaim(), e.g., in shrink_lruvec(), in addition to where it's
> being used.
>
> If this makes sense, we could 1) document it (or rename it) and apply
> it to those places, or 2) just unset target_mem_cgroup for root and
> delete global_reclaim(). Option 2 might break ABI but still be
> acceptable.

Ah, cgroup_reclaim() tests whether it's limit/proactive reclaim or
allocator reclaim. global_reclaim() tests whether it's root reclaim
(which could be from either after memory.reclaim).

I suppose we didn't clarify when introducing memory.reclaim what the
semantics should be on the root cgroup:

- We currently exclude PGSCAN and PGSTEAL stats from /proc/vmstat for
  limit reclaim to tell cgroup constraints from physical pressure. We
  currently exclude root memory.reclaim activity as well. Seems okay.

- The file_is_tiny heuristic is for allocator reclaim near OOM. It's
  currently excluded for root memory.reclaim, which seems okay too.

- Like limit reclaim, root memory.reclaim currently NEVER swaps when
  global swappiness is 0. The whole cgroup-specific swappiness=0
  semantic is kind of quirky. But I suppose we can keep it as-is.

- Proportional reclaim is disabled for everybody but direct reclaim
  from the allocator at initial priority. Effectively disabled for all
  situations where it might matter, including root memory.reclaim. We
  should also keep this as-is.

- Writeback throttling is interesting. Historically, kswapd set the
  congestion state when running into lots of PG_reclaim pages, and
  clear it when the node is balanced. This throttles direct reclaim.

  Cgroup limit reclaim would set and clear congestion on non-root only
  to do local limit-reclaim throttling. But now root memory.reclaim
  might clear a bit set by kswapd before the node is balanced, and
  release direct reclaimers from throttling prematurely. This seems
  wrong. However, the alternative is throttling memory.reclaim on
  subgroup congestion but not root, which seems also wrong.

- Root memory.reclaim is exempted from the compaction_ready() bail
  condition as well as soft limit reclaim. But they'd both be no-ops
  anyway if we changed cgroup_reclaim() semantics.

IMO we should either figure out what we want to do in those cases
above, at least for writeback throttling.

Are you guys using root-level proactive reclaim?

> If we don't want to decide right now, I can rename global_reclaim() to
> root_reclaim() and move it within #ifdef CONFIG_LRU_GEN and probably
> come back and revisit later.

So conventional vmscan treats root-level memory.reclaim the same as
any other cgroup reclaim. And the cgroup_reclaim() checks are mostly
reserved for (questionable) allocator reclaim-specific heuristics.

Is there a chance lrugen could do the same, and you'd be fine with
using cgroup_reclaim()? Or is it a data structure problem?

If so, I agree it could be better to move it to CONFIG_LRU_GEN and
rename it for the time being. root_reclaim() SGTM.