Date:   Thu, 5 Oct 2023 10:30:36 -0700
From:   Yosry Ahmed <yosryahmed@...gle.com>
To:     Michal Koutný <mkoutny@...e.com>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Muchun Song <muchun.song@...ux.dev>, linux-mm@...ck.org,
        cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] mm: memcg: refactor page state unit helpers

On Thu, Oct 5, 2023 at 9:30 AM Michal Koutný <mkoutny@...e.com> wrote:
>
> On Thu, Oct 05, 2023 at 02:31:03AM -0700, Yosry Ahmed <yosryahmed@...gle.com> wrote:
> > I am not really sure what you mean here.
>
> My "vision" is to treat WORKINGSET_ entries as events.
> That would mean implementing per-node tracking for vm_event_item
> (costlier?).
> That would mean node_stat_item and vm_event_item being effectively
> equal, so they could be merged in one.
> That would be situation to come up with new classification based on use
> cases (e.g. precision/timeliness requirements, state vs change
> semantics).
>
> (Do not take this as blocker of the patch 1/2, I rather used the
> opportunity to discuss a greater possible cleanup.)

Yeah, ideally we can clean this up separately. I would be careful about
userspace exposure though. It seems like CONFIG_VM_EVENT_COUNTERS
controls both tracking events and displaying them in vmstat, so moving
items between node_stat_item and vm_event_item (or merging them) won't
be easy.
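
For context, here is a rough sketch of the current split as I understand
it (simplified and abbreviated, not the actual definitions from
include/linux/mmzone.h, include/linux/vm_event_item.h, and
include/linux/vmstat.h):

/* Simplified sketch; not the actual kernel definitions. */

/* node_stat_item entries are per-node *state*, always compiled in and
 * readable at any time (e.g. via lruvec_page_state()): */
enum node_stat_item {
	WORKINGSET_REFAULT_ANON,
	WORKINGSET_REFAULT_FILE,
	/* ... */
	NR_VM_NODE_STAT_ITEMS,
};

/* vm_event_item entries are *events*; counting them is compiled out
 * entirely when CONFIG_VM_EVENT_COUNTERS=n: */
enum vm_event_item {
	OOM_KILL,
	DROP_PAGECACHE,
	/* ... */
	NR_VM_EVENT_ITEMS,
};

#ifdef CONFIG_VM_EVENT_COUNTERS
static inline void count_vm_event(enum vm_event_item item)
{
	this_cpu_inc(vm_event_states.event[item]);
}
#else
static inline void count_vm_event(enum vm_event_item item)
{
}
#endif

So merging the two enums would either make the WORKINGSET_ entries
disappear from vmstat on !CONFIG_VM_EVENT_COUNTERS kernels, or quietly
change what that config option means to userspace.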

>
> > We don't track things like OOM_KILL and DROP_PAGECACHE per memcg as
> > far as I can tell.
>
> Ah, good. (I forgot that only a subset of entries is relevant for memcgs.)
>
> > This will mean that WORKINGSET_* state will become more stale. We
> > will need 4096 times as many updates as today to get a flush. These
> > stats are used by internal flushers (reclaim), and are exposed to
> > userspace. I am not sure we want to do that.
>
> snapshot_refaults() doesn't seem to follow a flush,
> and
> workingset_refault()'s flush doesn't seem to precede readers.
>
> Is the flush misplaced or have I overlooked something?
> (If the former, it seems to work well enough even with the current
> flushing heuristics :-))

We flush in prepare_scan_count() before reading WORKINGSET_ACTIVATE_*
state. That flush also happens to precede every call to
snapshot_refaults(), so we are effectively flushing before
snapshot_refaults() too, but the ordering is only implicit, which is not
very robust.
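
Concretely, the ordering in mm/vmscan.c is roughly as follows (condensed
and paraphrased, not verbatim):

/* Condensed from mm/vmscan.c; paraphrased, not verbatim. */

static void prepare_scan_count(pg_data_t *pgdat, struct scan_control *sc)
{
	/* ... */
	/* Flush memcg stats before reading WORKINGSET_ACTIVATE_*: */
	mem_cgroup_flush_stats();
	refaults = lruvec_page_state(target_lruvec,
				     WORKINGSET_ACTIVATE_ANON);
	/* ... use refaults to decide whether to deactivate LRUs ... */
}

static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
{
	/* ... */
	prepare_scan_count(pgdat, sc);	/* <-- the flush happens here */
	shrink_node_memcgs(pgdat, sc);
	/* ... */
	snapshot_refaults(sc->target_mem_cgroup, pgdat);
	/* so snapshot_refaults() always runs after that flush, just
	 * implicitly rather than by design */
}

An explicit flush next to snapshot_refaults() would make that dependency
clearer.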


>
>
> Michal
