linux-kernel - Re: [PATCH v2 12/28] mm: vmstat: use s32 for vm_node_stat_diff in struct per_cpu

Message-ID: <20200203203450.GA6380@cmpxchg.org>
Date:   Mon, 3 Feb 2020 15:34:50 -0500
From:   Johannes Weiner <hannes@...xchg.org>
To:     Roman Gushchin <guro@...com>
Cc:     linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        linux-kernel@...r.kernel.org, kernel-team@...com,
        Bharata B Rao <bharata@...ux.ibm.com>,
        Yafang Shao <laoar.shao@...il.com>
Subject: Re: [PATCH v2 12/28] mm: vmstat: use s32 for vm_node_stat_diff in
 struct per_cpu_nodestat

On Mon, Feb 03, 2020 at 10:25:06AM -0800, Roman Gushchin wrote:
> On Mon, Feb 03, 2020 at 12:58:18PM -0500, Johannes Weiner wrote:
> > On Mon, Jan 27, 2020 at 09:34:37AM -0800, Roman Gushchin wrote:
> > > Currently s8 type is used for per-cpu caching of per-node statistics.
> > > It works fine because the overfill threshold can't exceed 125.
> > > 
> > > But if some counters are in bytes (and the next commit in the series
> > > will convert slab counters to bytes), it's not gonna work:
> > > value in bytes can easily exceed s8 without exceeding the threshold
> > > converted to bytes. So to avoid overfilling per-cpu caches and breaking
> > > vmstats correctness, let's use s32 instead.
> > > 
> > > This doesn't affect per-zone statistics. There are no plans to use
> > > zone-level byte-sized counters, so no reasons to change anything.
> > 
> > Wait, is this still necessary? AFAIU, the node counters will account
> > full slab pages, including free space, and only the memcg counters
> > that track actual objects will be in bytes.
> > 
> > Can you please elaborate?
> 
> It's weird to have a counter with the same name (e.g. NR_SLAB_RECLAIMABLE_B)
> being in different units depending on the accounting scope.
> So I do convert all slab counters: global, per-lruvec,
> and per-memcg to bytes.

Since the node counters track allocated slab pages and the memcg
counter tracks allocated objects, arguably they shouldn't use the same
name anyway.

> Alternatively I can fork them, e.g. introduce per-memcg or per-lruvec
> NR_SLAB_RECLAIMABLE_OBJ
> NR_SLAB_UNRECLAIMABLE_OBJ

Can we alias them and reuse their slots?

	/* Reuse the node slab page counter items for charged objects */
	MEMCG_SLAB_RECLAIMABLE = NR_SLAB_RECLAIMABLE,
	MEMCG_SLAB_UNRECLAIMABLE = NR_SLAB_UNRECLAIMABLE,
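
A rough, self-contained sketch of what such aliasing could look like. The
enum below is a cut-down stand-in for the kernel's node_stat_item, and the
struct and helper names (node_stats, memcg_stats, node_account_pages,
memcg_account_bytes) are hypothetical and exist only for illustration. The
point is that both names index the same slot, while each scope keeps its
own unit:

```c
#include <assert.h>

/* Cut-down stand-in for the kernel enum; the real one has many more items. */
enum node_stat_item {
	NR_SLAB_RECLAIMABLE,
	NR_SLAB_UNRECLAIMABLE,
	NR_VM_NODE_STAT_ITEMS,
};

/* Aliases as proposed above: same slot numbers, different names. */
enum memcg_slab_item {
	MEMCG_SLAB_RECLAIMABLE = NR_SLAB_RECLAIMABLE,
	MEMCG_SLAB_UNRECLAIMABLE = NR_SLAB_UNRECLAIMABLE,
};

/* Hypothetical per-scope arrays: node scope in pages, memcg scope in bytes. */
struct node_stats {
	long pages[NR_VM_NODE_STAT_ITEMS];
};

struct memcg_stats {
	long bytes[NR_VM_NODE_STAT_ITEMS];
};

/* The aliases must resolve to the same index as the node items. */
static_assert((int)MEMCG_SLAB_RECLAIMABLE == (int)NR_SLAB_RECLAIMABLE,
	      "memcg alias must share the node item's slot");
static_assert((int)MEMCG_SLAB_UNRECLAIMABLE == (int)NR_SLAB_UNRECLAIMABLE,
	      "memcg alias must share the node item's slot");

/* Node scope: account whole slab pages. */
static inline void node_account_pages(struct node_stats *n,
				      enum node_stat_item idx, long nr_pages)
{
	n->pages[idx] += nr_pages;
}

/* Memcg scope: account charged objects in bytes, using the aliased index. */
static inline void memcg_account_bytes(struct memcg_stats *m,
				       enum memcg_slab_item idx, long nr_bytes)
{
	m->bytes[idx] += nr_bytes;
}
```

With this layout no extra slots are allocated for the memcg names; only the
interpretation of the value differs by scope.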

> and keep global counters untouched. If going this way, I'd prefer to make
> them per-memcg, because it will simplify things on charging paths:
> now we do get task->mem_cgroup->obj_cgroup in the pre_alloc_hook(),
> and then obj_cgroup->mem_cgroup in the post_alloc_hook() just to
> bump per-lruvec counters.

I don't quite follow. Don't you still have to update the global
counters?

> Btw, I wonder if we really need per-lruvec counters at all (at least
> being enabled by default). For the significant amount of users who
> have a single-node machine it doesn't bring anything except performance
> overhead.

Yeah, for single-node systems we should be able to redirect everything
to the memcg counters, without allocating and tracking lruvec copies.

> For those who have multiple nodes (and most likely many many
> memory cgroups) it provides way too many data except for debugging
> some weird mm issues.
> I guess in the absolute majority of cases having global per-node + per-memcg
> counters will be enough.

Hm? Reclaim uses the lruvec counters.
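
For reference, the s8 concern from the patch description at the top of the
thread can be checked with a few lines of standalone C. This is only a
sketch: the 125 limit is the one quoted above, while the 4096-byte page
size and the "threshold times page size" conversion are assumptions made
for illustration, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STAT_THRESHOLD		125	/* max overfill threshold quoted above */
#define ASSUMED_PAGE_SIZE	4096L	/* assumption: 4 KiB pages */

/* A page-based per-cpu delta bounded by the threshold fits in s8. */
static_assert(STAT_THRESHOLD <= INT8_MAX, "page-based threshold fits in s8");

/* Does a per-cpu delta fit in the given cache type? */
static inline bool fits_in_s8(long v)
{
	return v >= INT8_MIN && v <= INT8_MAX;
}

static inline bool fits_in_s32(long v)
{
	return v >= INT32_MIN && v <= INT32_MAX;
}

/* Assumed conversion of the page-based threshold to bytes. */
static inline long threshold_in_bytes(void)
{
	return STAT_THRESHOLD * ASSUMED_PAGE_SIZE;
}
```

Under these assumptions a single 192-byte object already exceeds what s8
can hold, long before the byte-scaled threshold is reached, whereas s32
holds the scaled threshold comfortably.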