lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 21 May 2019 19:23:28 +0000
From:   Roman Gushchin <guro@...com>
To:     Waiman Long <longman@...hat.com>
CC:     Shakeel Butt <shakeelb@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux MM <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Kernel Team <Kernel-team@...com>,
        "Johannes Weiner" <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Christoph Lameter <cl@...ux.com>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Cgroups <cgroups@...r.kernel.org>
Subject: Re: [PATCH v4 5/7] mm: rework non-root kmem_cache lifecycle
 management

On Tue, May 21, 2019 at 02:39:50PM -0400, Waiman Long wrote:
> On 5/14/19 8:06 PM, Shakeel Butt wrote:
> >> @@ -2651,20 +2652,35 @@ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
> >>         struct mem_cgroup *memcg;
> >>         struct kmem_cache *memcg_cachep;
> >>         int kmemcg_id;
> >> +       struct memcg_cache_array *arr;
> >>
> >>         VM_BUG_ON(!is_root_cache(cachep));
> >>
> >>         if (memcg_kmem_bypass())
> >>                 return cachep;
> >>
> >> -       memcg = get_mem_cgroup_from_current();
> >> +       rcu_read_lock();
> >> +
> >> +       if (unlikely(current->active_memcg))
> >> +               memcg = current->active_memcg;
> >> +       else
> >> +               memcg = mem_cgroup_from_task(current);
> >> +
> >> +       if (!memcg || memcg == root_mem_cgroup)
> >> +               goto out_unlock;
> >> +
> >>         kmemcg_id = READ_ONCE(memcg->kmemcg_id);
> >>         if (kmemcg_id < 0)
> >> -               goto out;
> >> +               goto out_unlock;
> >>
> >> -       memcg_cachep = cache_from_memcg_idx(cachep, kmemcg_id);
> >> -       if (likely(memcg_cachep))
> >> -               return memcg_cachep;
> >> +       arr = rcu_dereference(cachep->memcg_params.memcg_caches);
> >> +
> >> +       /*
> >> +        * Make sure we will access the up-to-date value. The code updating
> >> +        * memcg_caches issues a write barrier to match this (see
> >> +        * memcg_create_kmem_cache()).
> >> +        */
> >> +       memcg_cachep = READ_ONCE(arr->entries[kmemcg_id]);
> >>
> >>         /*
> >>          * If we are in a safe context (can wait, and not in interrupt
> >> @@ -2677,10 +2693,20 @@ struct kmem_cache *memcg_kmem_get_cache(struct kmem_cache *cachep)
> >>          * memcg_create_kmem_cache, this means no further allocation
> >>          * could happen with the slab_mutex held. So it's better to
> >>          * defer everything.
> >> +        *
> >> +        * If the memcg is dying or memcg_cache is about to be released,
> >> +        * don't bother creating new kmem_caches. Because memcg_cachep
> >> +        * is ZEROed as the fist step of kmem offlining, we don't need
> >> +        * percpu_ref_tryget() here. css_tryget_online() check in
> > *percpu_ref_tryget_live()
> >
> >> +        * memcg_schedule_kmem_cache_create() will prevent us from
> >> +        * creation of a new kmem_cache.
> >>          */
> >> -       memcg_schedule_kmem_cache_create(memcg, cachep);
> >> -out:
> >> -       css_put(&memcg->css);
> >> +       if (unlikely(!memcg_cachep))
> >> +               memcg_schedule_kmem_cache_create(memcg, cachep);
> >> +       else if (percpu_ref_tryget(&memcg_cachep->memcg_params.refcnt))
> >> +               cachep = memcg_cachep;
> >> +out_unlock:
> >> +       rcu_read_lock();
> 
> There is one more bug that causes the kernel to panic on bootup when I
> turned on debugging options.
> 
> [   49.871437] =============================
> [   49.875452] WARNING: suspicious RCU usage
> [   49.879476] 5.2.0-rc1.bz1699202_memcg_test+ #2 Not tainted
> [   49.884967] -----------------------------
> [   49.888991] include/linux/rcupdate.h:268 Illegal context switch in
> RCU read-side critical section!
> [   49.897950]
> [   49.897950] other info that might help us debug this:
> [   49.897950]
> [   49.905958]
> [   49.905958] rcu_scheduler_active = 2, debug_locks = 1
> [   49.912492] 3 locks held by systemd/1:
> [   49.916252]  #0: 00000000633673c5 (&type->i_mutex_dir_key#5){.+.+},
> at: lookup_slow+0x42/0x70
> [   49.924788]  #1: 0000000029fa8c75 (rcu_read_lock){....}, at:
> memcg_kmem_get_cache+0x12b/0x910
> [   49.933316]  #2: 0000000029fa8c75 (rcu_read_lock){....}, at:
> memcg_kmem_get_cache+0x3da/0x910
> 
> It should be "rcu_read_unlock();" at the end.

Oops. Good catch, thanks Waiman!

I'm somewhat surprised it didn't get up in my tests, neither any of test
bots caught it. Anyway, I'll fix it and send v5.

Does the rest of the patchset looks sane to you?

Thank you!

Roman

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ