linux-kernel - Re: [PATCH v4 5/7] mm: rework non-root kmem

Message-ID: <20190521192320.GA6658@tower.DHCP.thefacebook.com>
Date:   Tue, 21 May 2019 19:23:28 +0000
From:   Roman Gushchin <guro@...com>
To:     Waiman Long <longman@...hat.com>
CC:     Shakeel Butt <shakeelb@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>,
        "Johannes Weiner" <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Christoph Lameter <cl@...ux.com>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Cgroups <cgroups@...r.kernel.org>
Subject: Re: [PATCH v4 5/7] mm: rework non-root kmem_cache lifecycle
 management

On Tue, May 21, 2019 at 02:39:50PM -0400, Waiman Long wrote:
> On 5/14/19 8:06 PM, Shakeel Butt wrote:
> >> @@ -2651,20 +2652,35 @@ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
> >>         struct mem_cgroup *memcg;
> >>         struct kmem_cache *memcg_cachep;
> >>         int kmemcg_id;
> >> +       struct memcg_cache_array *arr;
> >>
> >>         VM_BUG_ON(!is_root_cache(cachep));
> >>
> >>         if (memcg_kmem_bypass())
> >>                 return cachep;
> >>
> >> -       memcg = get_mem_cgroup_from_current();
> >> +       rcu_read_lock();
> >> +
> >> +       if (unlikely(current->active_memcg))
> >> +               memcg = current->active_memcg;
> >> +       else
> >> +               memcg = mem_cgroup_from_task(current);
> >> +
> >> +       if (!memcg || memcg == root_mem_cgroup)
> >> +               goto out_unlock;
> >> +
> >>         kmemcg_id = READ_ONCE(memcg->kmemcg_id);
> >>         if (kmemcg_id < 0)
> >> -               goto out;
> >> +               goto out_unlock;
> >>
> >> -       memcg_cachep = cache_from_memcg_idx(cachep, kmemcg_id);
> >> -       if (likely(memcg_cachep))
> >> -               return memcg_cachep;
> >> +       arr = rcu_dereference(cachep->memcg_params.memcg_caches);
> >> +
> >> +       /*
> >> +        * Make sure we will access the up-to-date value. The code updating
> >> +        * memcg_caches issues a write barrier to match this (see
> >> +        * memcg_create_kmem_cache()).
> >> +        */
> >> +       memcg_cachep = READ_ONCE(arr->entries[kmemcg_id]);
> >>
> >>         /*
> >>          * If we are in a safe context (can wait, and not in interrupt
> >> @@ -2677,10 +2693,20 @@ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
> >>          * memcg_create_kmem_cache, this means no further allocation
> >>          * could happen with the slab_mutex held. So it's better to
> >>          * defer everything.
> >> +        *
> >> +        * If the memcg is dying or memcg_cache is about to be released,
> >> +        * don't bother creating new kmem_caches. Because memcg_cachep
> >> +        * is ZEROed as the fist step of kmem offlining, we don't need
> >> +        * percpu_ref_tryget() here. css_tryget_online() check in
> > *percpu_ref_tryget_live()
> >
> >> +        * memcg_schedule_kmem_cache_create() will prevent us from
> >> +        * creation of a new kmem_cache.
> >>          */
> >> -       memcg_schedule_kmem_cache_create(memcg, cachep);
> >> -out:
> >> -       css_put(&memcg->css);
> >> +       if (unlikely(!memcg_cachep))
> >> +               memcg_schedule_kmem_cache_create(memcg, cachep);
> >> +       else if (percpu_ref_tryget(&memcg_cachep->memcg_params.refcnt))
> >> +               cachep = memcg_cachep;
> >> +out_unlock:
> >> +       rcu_read_lock();
> 
> There is one more bug that causes the kernel to panic on bootup when I
> turned on debugging options.
> 
> [   49.871437] =============================
> [   49.875452] WARNING: suspicious RCU usage
> [   49.879476] 5.2.0-rc1.bz1699202_memcg_test+ #2 Not tainted
> [   49.884967] -----------------------------
> [   49.888991] include/linux/rcupdate.h:268 Illegal context switch in
> RCU read-side critical section!
> [   49.897950]
> [   49.897950] other info that might help us debug this:
> [   49.897950]
> [   49.905958]
> [   49.905958] rcu_scheduler_active = 2, debug_locks = 1
> [   49.912492] 3 locks held by systemd/1:
> [   49.916252]  #0: 00000000633673c5 (&type->i_mutex_dir_key#5){.+.+},
> at: lookup_slow+0x42/0x70
> [   49.924788]  #1: 0000000029fa8c75 (rcu_read_lock){....}, at:
> memcg_kmem_get_cache+0x12b/0x910
> [   49.933316]  #2: 0000000029fa8c75 (rcu_read_lock){....}, at:
> memcg_kmem_get_cache+0x3da/0x910
> 
> It should be "rcu_read_unlock();" at the end.

Oops. Good catch, thanks Waiman!

I'm somewhat surprised it didn't show up in my tests, and none of the test
bots caught it either. Anyway, I'll fix it and send v5.

Does the rest of the patchset look sane to you?

Thank you!

Roman
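
For readers following the thread, the imbalance Waiman points out can be
reproduced in miniature. The sketch below is a userspace illustration only:
rcu_depth, mock_rcu_read_lock(), mock_rcu_read_unlock(), lookup_buggy() and
lookup_fixed() are hypothetical stand-ins invented here, not kernel APIs and
not the code that went into v5. It only shows why a second lock call at the
out_unlock label leaves the read-side section open, which is consistent with
the two rcu_read_lock entries in the lockdep report above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical nesting counter; a stand-in for the RCU read-side depth. */
int rcu_depth;

/* Mock primitives (assumptions for illustration, not the kernel's RCU). */
void mock_rcu_read_lock(void)
{
	rcu_depth++;
}

void mock_rcu_read_unlock(void)
{
	assert(rcu_depth > 0);
	rcu_depth--;
}

/* Same shape as the v4 exit path: the label takes the lock a second time. */
int lookup_buggy(int have_memcg)
{
	int found = 0;

	mock_rcu_read_lock();
	if (!have_memcg)
		goto out_unlock;
	found = 1;
out_unlock:
	mock_rcu_read_lock();	/* bug: should be mock_rcu_read_unlock() */
	return found;
}

/* Same flow with the lock/unlock pair balanced on every exit path. */
int lookup_fixed(int have_memcg)
{
	int found = 0;

	mock_rcu_read_lock();
	if (!have_memcg)
		goto out_unlock;
	found = 1;
out_unlock:
	mock_rcu_read_unlock();
	return found;
}
```

Starting from rcu_depth == 0, one call to lookup_fixed() returns with the
counter back at 0, while one call to lookup_buggy() returns with it at 2, so
any later sleep by the caller would happen inside a read-side section.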