linux-kernel - Re: [PATCH v2 1/2] mm: memcg: refactor page state unit helpers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJD7tkbaTRu838U=e_A+89PY1t4K+t_G1qkYq84BSDO7wAEtEg@mail.gmail.com>
Date:   Thu, 5 Oct 2023 02:31:03 -0700
From:   Yosry Ahmed <yosryahmed@...gle.com>
To:     Michal Koutný <mkoutny@...e.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Muchun Song <muchun.song@...ux.dev>, linux-mm@...ck.org,
        cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] mm: memcg: refactor page state unit helpers

On Thu, Oct 5, 2023 at 2:06 AM Michal Koutný <mkoutny@...e.com> wrote:
>
> On Wed, Oct 04, 2023 at 02:36:19PM -0400, Johannes Weiner <hannes@...xchg.org> wrote:
> > Yes, it's because of node resolution which event counters generally
> > don't have. Some of the refault events influence node-local reclaim
> > decisions, see mm/vmscan.c::snapshot_refaults().
> >
> > There are a few other event counters in the stat array that people
> > thought would be useful to have split out in
> > /sys/devices/system/node/nodeN/vmstat to understand numa behavior
> > better.
> >
> > It's a bit messy.
> >
> > Some events would be useful to move to 'stats' for the numa awareness,
> > such as the allocator stats and reclaim activity.
> >
> > Some events would be useful to move to 'stats' for the numa awareness,
> > but don't have the zone resolution required by them, such as
> > kswapd/kcompactd wakeups.
>
> Thanks for the enlightenment.
>
> > Some events aren't numa specific, such as oom kills, drop_pagecache.
>
> These are oddballs indeed. As with the normalization patchset these are
> counted as PAGE_SIZE^W 1 error but they should rather be an infinite
> error (to warrant a flush).
>
> So my feedback to this series is:
> - patch 1/2 -- creating two classes of units is consequence of unclarity
>   between state and events (as in event=Δstate/Δt) and resolution
>   (global vs per-node), so the better approach would be to tidy this up,

I am not really sure what you mean here. I understand that this series
fixes the unit normalization for state but leaves events out of it.
Looking at the event items tracked by memcg in memcg_vm_event_stat it
looks to me that most of them correspond roughly to a page's worth of
updates (all but the THP_* events). We don't track things like
OOM_KILL and DROP_PAGECACHE per memcg as far as I can tell.

Do you mean that we should add something similar to
memcg_page_state_unit() for events as well to get all of them right?
If yes, I think that should be easy to add, it would only special case
THP_* events.

Alternatively, I can add a comment above the call to
memcg_rstat_updated() in __count_memcg_events() explaining why we
don't normalize the event count for now.

> - patch 2/2 -- it could use the single unit class that exists,
>   it'll bound the error of printed numbers afterall (and can be changed
>   later depending on how it affects internal consumers).

This will mean that WORKINGSET_* state will become more stale. We will
need 4096 as many updates as today to get a flush. These are used by
internal flushers (reclaim), and are exposed to userspace. I am not
sure we want to do that.

>
> My 0.02€,
> Michal