linux-kernel - Re: memcg causes crashes in list_lru


Message-ID: <20190429104326.GG21837@dhcp22.suse.cz>
Date:   Mon, 29 Apr 2019 12:43:26 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Jiri Slaby <jslaby@...e.cz>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        cgroups@...r.kernel.org, mm <linux-mm@...ck.org>,
        Linux kernel mailing list <linux-kernel@...r.kernel.org>,
        Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
Subject: Re: memcg causes crashes in list_lru_add

On Mon 29-04-19 12:40:51, Michal Hocko wrote:
> On Mon 29-04-19 12:09:53, Jiri Slaby wrote:
> > On 29. 04. 19, 11:25, Jiri Slaby wrote:
> > > memcg_update_all_list_lrus should take care about resizing the array.
> > 
> > It should, but:
> > [    0.058362] Number of physical nodes 2
> > [    0.058366] Skipping disabled node 0
> > 
> > So this should be the real fix:
> > --- linux-5.0-stable1.orig/mm/list_lru.c
> > +++ linux-5.0-stable1/mm/list_lru.c
> > @@ -37,11 +37,12 @@ static int lru_shrinker_id(struct list_l
> > 
> >  static inline bool list_lru_memcg_aware(struct list_lru *lru)
> >  {
> > -       /*
> > -        * This needs node 0 to be always present, even
> > -        * in the systems supporting sparse numa ids.
> > -        */
> > -       return !!lru->node[0].memcg_lrus;
> > +       int i;
> > +
> > +       for_each_online_node(i)
> > +               return !!lru->node[i].memcg_lrus;
> > +
> > +       return false;
> >  }
> > 
> >  static inline struct list_lru_one *
> > 
> > Opinions?
> 
> Please report upstream. This code here is there for quite some time.
> I do not really remember why we do have an assumption about node 0
> and why it hasn't been problem until now.

Humm, I blame jet-lag. I was convinced that this is an internal email.
Sorry about the confusion.

Anyway, time to revisit 145949a1387ba. CCed Raghavendra.
-- 
Michal Hocko
SUSE Labs
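
The difference between the two checks in the quoted diff can be modelled in
user space. The sketch below is not kernel code: the *_model types, MAX_NODES,
the online[] array and make_sparse_lru() are hypothetical stand-ins, and it
assumes that on the reported machine (node 0 disabled, one other node online)
the memcg_lrus pointer of node 0 is never set up.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 2 /* hypothetical; mirrors "Number of physical nodes 2" */

/* Simplified stand-ins for the kernel structures, not the real definitions. */
struct memcg_lrus_model { int unused; };

struct lru_node_model {
    struct memcg_lrus_model *memcg_lrus; /* NULL if never initialised */
};

struct lru_model {
    struct lru_node_model node[MAX_NODES];
    bool online[MAX_NODES]; /* stand-in for the online node mask */
};

/* Old check: assumes node 0 is always present. */
static bool memcg_aware_node0(const struct lru_model *lru)
{
    return lru->node[0].memcg_lrus != NULL;
}

/* Proposed check: decide based on the first online node instead. */
static bool memcg_aware_first_online(const struct lru_model *lru)
{
    for (int i = 0; i < MAX_NODES; i++) {
        if (lru->online[i])
            return lru->node[i].memcg_lrus != NULL;
    }
    return false; /* no online node at all */
}

/* Assumed layout from the boot log: node 0 disabled, node 1 online. */
static struct lru_model make_sparse_lru(struct memcg_lrus_model *lrus)
{
    struct lru_model lru = {0};

    lru.online[1] = true;
    lru.node[1].memcg_lrus = lrus;
    return lru;
}
```

Under these assumptions, an lru built with make_sparse_lru() is reported as
not memcg aware by memcg_aware_node0() but as memcg aware by
memcg_aware_first_online(), which is the mismatch the patch is meant to
remove.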