Date:   Mon, 18 May 2020 22:37:46 -0400
From:   Qian Cai <cai@....pw>
To:     Waiman Long <longman@...hat.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Christoph Lameter <cl@...ux.com>,
        Pekka Enberg <penberg@...nel.org>,
        David Rientjes <rientjes@...gle.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Linux-MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Cgroups <cgroups@...r.kernel.org>,
        Juri Lelli <juri.lelli@...hat.com>
Subject: Re: [PATCH v2 3/4] mm/slub: Fix another circular locking dependency
 in slab_attr_store()

On Mon, May 18, 2020 at 6:05 PM Waiman Long <longman@...hat.com> wrote:
>
> On 5/16/20 10:19 PM, Qian Cai wrote:
> >
> >> On Apr 27, 2020, at 7:56 PM, Waiman Long <longman@...hat.com> wrote:
> >>
> >> It turns out that switching from slab_mutex to memcg_cache_ids_sem in
> >> slab_attr_store() does not completely eliminate circular locking dependency
> >> as shown by the following lockdep splat when the system is shut down:
> >>
> >> [ 2095.079697] Chain exists of:
> >> [ 2095.079697]   kn->count#278 --> memcg_cache_ids_sem --> slab_mutex
> >> [ 2095.079697]
> >> [ 2095.090278]  Possible unsafe locking scenario:
> >> [ 2095.090278]
> >> [ 2095.096227]        CPU0                    CPU1
> >> [ 2095.100779]        ----                    ----
> >> [ 2095.105331]   lock(slab_mutex);
> >> [ 2095.108486]                                lock(memcg_cache_ids_sem);
> >> [ 2095.114961]                                lock(slab_mutex);
> >> [ 2095.120649]   lock(kn->count#278);
> >> [ 2095.124068]
> >> [ 2095.124068]  *** DEADLOCK ***
> > Can you show the full splat?
> >
> >> To eliminate this possibility, we have to use trylock to acquire
> >> memcg_cache_ids_sem. Unlike slab_mutex, which can be acquired in
> >> many places, the memcg_cache_ids_sem write lock is only acquired
> >> in memcg_alloc_cache_id() to double the size of memcg_nr_cache_ids.
> >> So the chance of successive calls to memcg_alloc_cache_id() within
> >> a short time is pretty low. As a result, we can retry the read lock
> >> acquisition a few times if the first attempt fails.
> >>
> >> Signed-off-by: Waiman Long <longman@...hat.com>
> > The code looks a bit hacky and probably not that robust. Since the splat only shows up on the shutdown path, which does not matter much unless lockdep is enabled, maybe you could drop this single patch for now until there is a better solution?
>
> That is true. Unlike with slab_mutex, the chance of failing to
> acquire a read lock on memcg_cache_ids_sem is pretty low. Maybe just
> print a warning once (print_once style) if that happens.

That seems cleaner. If you are going to repost this series, you could
also mention that it fixes the splat triggered by slabinfo.
