Message-ID: <gqarnsvvanhk3yet472w2ihv2hwriviv3jpu4fpb24nfkd2f2e@cfh4ugd7xqk5>
Date: Wed, 14 Aug 2024 17:29:39 -0700
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Yosry Ahmed <yosryahmed@...gle.com>
Cc: Nhat Pham <nphamcs@...il.com>,
Jesper Dangaard Brouer <hawk@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>, Muchun Song <muchun.song@...ux.dev>, Yu Zhao <yuzhao@...gle.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Meta kernel team <kernel-team@...a.com>, cgroups@...r.kernel.org
Subject: Re: [PATCH v2] memcg: use ratelimited stats flush in the reclaim
On Wed, Aug 14, 2024 at 04:48:42PM GMT, Yosry Ahmed wrote:
> On Wed, Aug 14, 2024 at 4:42 PM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
> >
> > On Wed, Aug 14, 2024 at 04:03:13PM GMT, Nhat Pham wrote:
> > > On Wed, Aug 14, 2024 at 9:32 AM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
> > > >
> > > >
> > > > Ccing Nhat
> > > >
> > > > On Wed, Aug 14, 2024 at 02:57:38PM GMT, Jesper Dangaard Brouer wrote:
> > > > > I suspect the next whac-a-mole will be the rstat flush for the slab code
> > > > > that kswapd also activates via shrink_slab, which via
> > > > > shrinker->count_objects() invokes count_shadow_nodes().
> > > > >
> > > >
> > > > Actually count_shadow_nodes() is already using the ratelimited version.
> > > > However, zswap_shrinker_count() is still using the sync version. Nhat is
> > > > modifying this code at the moment, and we can ask whether we really need
> > > > the most accurate values for MEMCG_ZSWAP_B and MEMCG_ZSWAPPED for the
> > > > zswap writeback heuristic.
> > >
> > > You are referring to this, correct:
> > >
> > > mem_cgroup_flush_stats(memcg);
> > > nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT;
> > > nr_stored = memcg_page_state(memcg, MEMCG_ZSWAPPED);
> > >
> > > It's already a bit less-than-accurate - as you pointed out in another
> > > discussion, it takes into account the objects and sizes of the entire
> > > subtree, rather than just the ones charged to the current (memcg,
> > > node) combo. Feel free to optimize this away!
> > >
> > > In fact, I should probably replace this with another (atomic?) counter
> > > in the zswap_lruvec_state struct, which tracks the post-compression size.
> > > That way, we'll have a better estimate of the compression factor -
> > > total post-compression size / (length of LRU * page size) - and
> > > perhaps avoid the whole stat flushing path altogether...
> > >
> >
> > That sounds like a much better solution than relying on rstat for
> > accurate stats.
>
> We can also use such atomic counters in obj_cgroup_may_zswap() and
> eliminate the rstat flush there as well. Same for zswap_current_read()
> probably.
>
> Most in-kernel flushers really only need a few stats, so I am
> wondering if it's better to incrementally move these ones outside of
> the rstat framework and completely eliminate in-kernel flushers. For
> instance, MGLRU does not require the flush that reclaim does as
> Shakeel pointed out.
>
> This will solve so many scalability problems that all of us have
> observed at some point or another and tried to optimize. I believe
> using rstat for userspace reads was the original intention anyway.
I like this direction and I think zswap would be a good first target.
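
For reference, a rough sketch of the counter idea (the field and helper
names below are made up and untested, just to illustrate the direction):

	/* hypothetical per-(memcg, node) counter in zswap_lruvec_state */
	struct zswap_lruvec_state {
		/* ... existing fields ... */
		atomic64_t nr_compressed_bytes;	/* total size after compression */
	};

	/* maintained when a zswap entry is stored or freed */
	static void zswap_account_store(struct zswap_lruvec_state *state,
					size_t compressed_len)
	{
		atomic64_add(compressed_len, &state->nr_compressed_bytes);
	}

	static void zswap_account_free(struct zswap_lruvec_state *state,
				       size_t compressed_len)
	{
		atomic64_sub(compressed_len, &state->nr_compressed_bytes);
	}

	/*
	 * zswap_shrinker_count() could then estimate things locally,
	 * with no rstat flush at all:
	 *
	 *   nr_backing = atomic64_read(&state->nr_compressed_bytes)
	 *			>> PAGE_SHIFT;
	 *   compression factor ~= nr_compressed_bytes /
	 *			  (LRU length * PAGE_SIZE);
	 */

obj_cgroup_may_zswap() and zswap_current_read() could then be converted
to read such counters as well instead of flushing, as Yosry suggests.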