Date:   Wed, 12 Jan 2022 21:22:36 +0800
From:   Muchun Song <songmuchun@...edance.com>
To:     Michal Koutný <mkoutny@...e.com>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Roman Gushchin <guro@...com>, Yang Shi <shy828301@...il.com>,
        Alex Shi <alexs@...nel.org>,
        Wei Yang <richard.weiyang@...il.com>,
        Dave Chinner <david@...morbit.com>,
        trond.myklebust@...merspace.com, anna.schumaker@...app.com,
        jaegeuk@...nel.org, chao@...nel.org,
        Kari Argillander <kari.argillander@...il.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        linux-nfs@...r.kernel.org, Qi Zheng <zhengqi.arch@...edance.com>,
        Xiongchun duan <duanxiongchun@...edance.com>,
        Fam Zheng <fam.zheng@...edance.com>,
        Muchun Song <smuchun@...il.com>
Subject: Re: [PATCH v5 10/16] mm: list_lru: allocate list_lru_one only when needed

On Thu, Jan 6, 2022 at 7:00 PM Michal Koutný <mkoutny@...e.com> wrote:
>
> On Mon, Dec 20, 2021 at 04:56:43PM +0800, Muchun Song <songmuchun@...edance.com> wrote:
> (Thanks for pointing me here.)
>
> > -void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg)
> > +void memcg_drain_all_list_lrus(struct mem_cgroup *src, struct mem_cgroup *dst)
> >  {
> > +     struct cgroup_subsys_state *css;
> >       struct list_lru *lru;
> > +     int src_idx = src->kmemcg_id;
> > +
> > +     /*
> > +      * Change kmemcg_id of this cgroup and all its descendants to the
> > +      * parent's id, and then move all entries from this cgroup's list_lrus
> > +      * to ones of the parent.
> > +      *
> > +      * After we have finished, all list_lrus corresponding to this cgroup
> > +      * are guaranteed to remain empty. So we can safely free this cgroup's
> > +      * list lrus in memcg_list_lru_free().
> > +      *
> > +      * Changing ->kmemcg_id to the parent can prevent memcg_list_lru_alloc()
> > +      * from allocating list lrus for this cgroup after memcg_list_lru_free()
> > +      * call.
> > +      */
> > +     rcu_read_lock();
> > +     css_for_each_descendant_pre(css, &src->css) {
> > +             struct mem_cgroup *memcg;
> > +
> > +             memcg = mem_cgroup_from_css(css);
> > +             memcg->kmemcg_id = dst->kmemcg_id;
> > +     }
> > +     rcu_read_unlock();
>
> Do you envision using this function anywhere else beside offlining?
> If not, you shouldn't need to traverse the whole subtree because normally
> parents are offlined only after children (see cgroup_subsys_state.online_cnt).
>
> >
> >       mutex_lock(&list_lrus_mutex);
> >       list_for_each_entry(lru, &memcg_list_lrus, list)
> > -             memcg_drain_list_lru(lru, src_idx, dst_memcg);
> > +             memcg_drain_list_lru(lru, src_idx, dst);
> >       mutex_unlock(&list_lrus_mutex);
>
> If you do, then here you only drain list_lru of the subtree root but not
> the descendants anymore.
>
> So I do get that mem_cgroup.kmemcg_id references the "effective"
> kmemcg_id after offlining, so that proper list_lrus are used afterwards.
>
> I wonder -- is this necessary when objcgs are reparented too? IOW would
> anyone query the offlined child's kmemcg_id?
> (Maybe it's worth explaining better in the commit message, I think even
> current approach is OK (better safe than sorry).)
>

Sorry for the late reply.

Imagine a hierarchy of memcgs as follows. C's parent is B, B's parent is A,
and A's parent is root. The numbers in parentheses are their respective
kmemcg_ids.

  root(-1) -> A(0) -> B(1) -> C(2)

CPU0:                                   CPU1:
memcg_list_lru_alloc(C)
                                        memcg_drain_all_list_lrus(C)
                                        memcg_drain_all_list_lrus(B)
                                        // Now C and B are offline. The
                                        // kmemcg_ids become the following if
                                        // we do not change the kmemcg_id of
                                        // its descendants in
                                        // memcg_drain_all_list_lrus().
                                        //
                                        // root(-1) -> A(0) -> B(0) -> C(1)

  for (i = 0; memcg; memcg = parent_mem_cgroup(memcg), i++) {
      // allocate struct list_lru_per_memcg for memcg C
      table[i].mlru = memcg_init_list_lru_one(gfp);
  }

  spin_lock_irqsave(&lru->lock, flags);
  while (i--) {
      // here index = 1
      int index = table[i].memcg->kmemcg_id;

      struct list_lru_per_memcg *mlru = table[i].mlru;
      if (index < 0 || rcu_dereference_protected(mlrus->mlru[index], true))
          kfree(mlru);
      else
          // mlrus->mlru[index] will be assigned a new value even
          // though memcg C is already offline.
          rcu_assign_pointer(mlrus->mlru[index], mlru);
  }
  spin_unlock_irqrestore(&lru->lock, flags);

So changing the ->kmemcg_id of all its descendants prevents
memcg_list_lru_alloc() from allocating list lrus for an offlined
cgroup after memcg_list_lru_free() has been called.
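
To make this concrete, here is a minimal userspace sketch of the example
above. It is not kernel code: the structs, the mlru_installed[] array and
the helpers drain_subtree()/lru_alloc_for() are invented purely for
illustration. Only two things mirror the patch: draining gives the source
memcg and all of its descendants the destination's kmemcg_id (the kernel
side does this with css_for_each_descendant_pre()), and the allocation
path refuses to install a new list_lru_per_memcg when the resolved index
is negative or already populated.

  /* Toy userspace model of the example above, not kernel code. */
  #include <stdbool.h>
  #include <stdio.h>

  struct memcg {
      const char *name;
      int kmemcg_id;
      struct memcg *child;          /* a single-child chain is enough here */
  };

  static bool mlru_installed[3];    /* stands in for lru->mlrus->mlru[index] */

  /* The fix: give @src and every one of its descendants the destination's
   * id (the kernel side walks css_for_each_descendant_pre()). */
  static void drain_subtree(struct memcg *src, struct memcg *dst)
  {
      for (struct memcg *m = src; m; m = m->child)
          m->kmemcg_id = dst->kmemcg_id;
  }

  /* The CPU0 check from the walkthrough: only install a new
   * list_lru_per_memcg if the slot for this id is still empty. */
  static void lru_alloc_for(struct memcg *m)
  {
      int index = m->kmemcg_id;

      if (index < 0 || mlru_installed[index]) {
          printf("%s: index %d already covered, drop the allocation\n",
                 m->name, index);
      } else {
          mlru_installed[index] = true;
          printf("%s: installed mlru at index %d\n", m->name, index);
      }
  }

  int main(void)
  {
      /* root(-1) -> A(0) -> B(1) -> C(2); root is omitted, A stays online */
      struct memcg C = { "C", 2, NULL };
      struct memcg B = { "B", 1, &C };
      struct memcg A = { "A", 0, &B };

      mlru_installed[0] = true;     /* A already has its list_lru_one */

      drain_subtree(&C, &B);        /* C goes offline: C(1) */
      drain_subtree(&B, &A);        /* B goes offline: B(0), C(0) */

      lru_alloc_for(&C);            /* C resolves to index 0: dropped */
      return 0;
  }

If drain_subtree() updated only the source memcg and not its descendants,
the same run would install an mlru at index 1 for the already-offline C,
which is exactly the race shown above.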

Thanks.
