Message-ID: <CALvZod5n6Yqgz4x=sHTz+Sx7oS2B0id3KsWDVkb8xwbVbrZUWg@mail.gmail.com>
Date:   Fri, 12 Mar 2021 15:18:38 -0800
From:   Shakeel Butt <shakeelb@...gle.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Muchun Song <songmuchun@...edance.com>,
        Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        Xiongchun duan <duanxiongchun@...edance.com>
Subject: Re: [External] Re: [PATCH v3 2/4] mm: memcontrol: make
 page_memcg{_rcu} only applicable for non-kmem page

On Fri, Mar 12, 2021 at 3:07 PM Johannes Weiner <hannes@...xchg.org> wrote:
>
> On Fri, Mar 12, 2021 at 02:42:45PM -0800, Shakeel Butt wrote:
> > Hi Johannes,
> >
> > On Fri, Mar 12, 2021 at 11:23 AM Johannes Weiner <hannes@...xchg.org> wrote:
> > >
> > [...]
> > >
> > > Longer term we most likely need it there anyway. The issue you are
> > > describing in the cover letter - allocations pinning memcgs for a long
> > > time - it exists at a larger scale and is causing recurring problems
> > > in the real world: page cache doesn't get reclaimed for a long time,
> > > or is used by the second, third, fourth, ... instance of the same job
> > > that was restarted into a new cgroup every time. Unreclaimable dying
> > > cgroups pile up, waste memory, and make page reclaim very inefficient.
> > >
> >
> > For the scenario described above, do we really want to reparent the
> > page cache pages? Shouldn't we recharge the pages to the second,
> > third, fourth and so on, memcgs? My concern is that we will see a big
> > chunk of page cache pages charged to root and will only get reclaimed
> > on global pressure.
>
> Sorry, I'm proposing to reparent to the ancestor, not root. It's an
> optimization, not a change in user-visible behavior.
>
> As far as the user can tell, the pages already belong to the parent
> after deletion: they'll show up in the parent's stats, naturally, and
> they will get reclaimed as part of the parent being reclaimed.
>
> The dead cgroup doesn't even have its own limit anymore after
> .css_reset() has run. And we already physically reparent slab objects
> in memcg_reparent_objcgs() and memcg_drain_all_list_lrus().
>
> I'm just saying we should do the same thing for LRU pages.

I understand the proposal, and I agree it makes total sense when a job
is recycling its sub-jobs/sub-containers.

I was talking about the (recycling of the) top-level cgroups. Though
for that to be an issue, I suppose the file system has to be shared
between the jobs on the system. I was wondering: if a page cache page
reaches the root memcg after multiple reparentings, should the next
access cause that page to be charged to the accessor?
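
To make the question concrete, here is a toy userspace model in Python.
It is only a sketch and not kernel code: the names Memcg, Page,
reparent() and access(), and the recharge_from_root flag, are all
hypothetical and exist only for illustration. It shows a page being
reparented one level at a time until it lands in the root memcg, and
then the open question above as an optional policy.

```python
class Memcg:
    """Toy stand-in for a memory cgroup (hypothetical, not struct mem_cgroup)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # None means this is the root memcg
        self.pages = set()        # pages currently charged to this memcg
        self.alive = True

    def is_root(self):
        return self.parent is None


class Page:
    """Toy page cache page that remembers which memcg it is charged to."""

    def __init__(self, ident, memcg):
        self.ident = ident
        self.memcg = memcg
        memcg.pages.add(self)


def reparent(dead):
    """On deletion, move every page charged to `dead` to its parent."""
    if dead.is_root():
        raise ValueError("the root memcg cannot be deleted")
    dead.alive = False
    for page in dead.pages:
        page.memcg = dead.parent
    dead.parent.pages |= dead.pages
    dead.pages = set()


def access(page, accessor, recharge_from_root=False):
    """Touch a page from `accessor`.

    With recharge_from_root=True (the hypothetical policy asked about),
    a page that ended up charged to root is moved to the accessor.
    """
    if recharge_from_root and page.memcg.is_root() and accessor is not page.memcg:
        page.memcg.pages.discard(page)
        accessor.pages.add(page)
        page.memcg = accessor


root = Memcg("root")
job1 = Memcg("job1", parent=root)     # first instance of a top-level job
sub = Memcg("job1/sub", parent=job1)
page = Page(1, sub)

reparent(sub)                         # page is now charged to job1
reparent(job1)                        # page is now charged to root

job2 = Memcg("job2", parent=root)     # restarted instance in a new cgroup
access(page, job2, recharge_from_root=True)
print(page.memcg.name)
```

Without the flag, the page stays charged to root after the second
reparent() and would only be reclaimed under global pressure, which is
the concern for recycled top-level cgroups.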