[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200203203450.GA6380@cmpxchg.org>
Date: Mon, 3 Feb 2020 15:34:50 -0500
From: Johannes Weiner <hannes@...xchg.org>
To: Roman Gushchin <guro@...com>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...nel.org>,
Shakeel Butt <shakeelb@...gle.com>,
Vladimir Davydov <vdavydov.dev@...il.com>,
linux-kernel@...r.kernel.org, kernel-team@...com,
Bharata B Rao <bharata@...ux.ibm.com>,
Yafang Shao <laoar.shao@...il.com>
Subject: Re: [PATCH v2 12/28] mm: vmstat: use s32 for vm_node_stat_diff in
struct per_cpu_nodestat
On Mon, Feb 03, 2020 at 10:25:06AM -0800, Roman Gushchin wrote:
> On Mon, Feb 03, 2020 at 12:58:18PM -0500, Johannes Weiner wrote:
> > On Mon, Jan 27, 2020 at 09:34:37AM -0800, Roman Gushchin wrote:
> > > Currently s8 type is used for per-cpu caching of per-node statistics.
> > > It works fine because the overfill threshold can't exceed 125.
> > >
> > > But if some counters are in bytes (and the next commit in the series
> > > will convert slab counters to bytes), it's not gonna work:
> > > value in bytes can easily exceed s8 without exceeding the threshold
> > > converted to bytes. So to avoid overfilling per-cpu caches and breaking
> > > vmstats correctness, let's use s32 instead.
> > >
> > > This doesn't affect per-zone statistics. There are no plans to use
> > > zone-level byte-sized counters, so no reasons to change anything.
> >
> > Wait, is this still necessary? AFAIU, the node counters will account
> > full slab pages, including free space, and only the memcg counters
> > that track actual objects will be in bytes.
> >
> > Can you please elaborate?
>
> It's weird to have a counter with the same name (e.g. NR_SLAB_RECLAIMABLE_B)
> being in different units depending on the accounting scope.
> So I do convert all slab counters: global, per-lruvec,
> and per-memcg to bytes.
Since the node counters tracks allocated slab pages and the memcg
counter tracks allocated objects, arguably they shouldn't use the same
name anyway.
> Alternatively I can fork them, e.g. introduce per-memcg or per-lruvec
> NR_SLAB_RECLAIMABLE_OBJ
> NR_SLAB_UNRECLAIMABLE_OBJ
Can we alias them and reuse their slots?
/* Reuse the node slab page counters item for charged objects */
MEMCG_SLAB_RECLAIMABLE = NR_SLAB_RECLAIMABLE,
MEMCG_SLAB_UNRECLAIMABLE = NR_SLAB_UNRECLAIMABLE,
> and keep global counters untouched. If going this way, I'd prefer to make
> them per-memcg, because it will simplify things on charging paths:
> now we do get task->mem_cgroup->obj_cgroup in the pre_alloc_hook(),
> and then obj_cgroup->mem_cgroup in the post_alloc_hook() just to
> bump per-lruvec counters.
I don't quite follow. Don't you still have to update the global
counters?
> Btw, I wonder if we really need per-lruvec counters at all (at least
> being enabled by default). For the significant amount of users who
> have a single-node machine it doesn't bring anything except performance
> overhead.
Yeah, for single-node systems we should be able to redirect everything
to the memcg counters, without allocating and tracking lruvec copies.
> For those who have multiple nodes (and most likely many many
> memory cgroups) it provides way too many data except for debugging
> some weird mm issues.
> I guess in the absolute majority of cases having global per-node + per-memcg
> counters will be enough.
Hm? Reclaim uses the lruvec counters.
Powered by blists - more mailing lists