Message-ID: <ece42e74-821e-662b-2c07-ea0756962bec@suse.cz>
Date: Mon, 22 Aug 2022 19:23:53 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Hyeonggon Yoo <42.hyeyoo@...il.com>
Cc: Waiman Long <longman@...hat.com>, Christoph Lameter <cl@...ux.com>,
Pekka Enberg <penberg@...nel.org>,
David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Xin Long <lucien.xin@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/slab_common: Deleting kobject in kmem_cache_destroy()
without holding slab_mutex/cpu_hotplug_lock
On 8/22/22 15:46, Hyeonggon Yoo wrote:
> On Mon, Aug 22, 2022 at 02:03:33PM +0200, Vlastimil Babka wrote:
>> On 8/10/22 16:08, Waiman Long wrote:
>>> On 8/10/22 05:34, Vlastimil Babka wrote:
>>>> On 8/9/22 22:59, Waiman Long wrote:
>>>>> A circular locking problem is reported by lockdep due to the following
>>>>> circular locking dependency.
>>>>>
>>>>> +--> cpu_hotplug_lock --> slab_mutex --> kn->active#126 --+
>>>>> | |
>>>>> +---------------------------------------------------------+
>>>>
>>>> This sounded familiar and I've found a thread from January:
>>>>
>>>> https://lore.kernel.org/all/388098b2c03fbf0a732834fc01b2d875c335bc49.1642170196.git.lucien.xin@gmail.com/
>>>>
>>>> But that seemed to be specific to RHEL-8 RT kernel and not reproduced with
>>>> mainline. Is it different this time? Can you share the splats?
>>>
>>> I think this is easier to reproduce on an RT kernel, but it also happens in a
>>> non-RT kernel. One example splat that I got was:
>>>
>>> [ 1777.114757] ======================================================
>>> [ 1777.121646] WARNING: possible circular locking dependency detected
>>> [ 1777.128544] 4.18.0-403.el8.x86_64+debug #1 Not tainted
>>> [ 1777.134280] ------------------------------------------------------
>>
>> Yeah that's non-RT, but still a 4.18 kernel, as in Xin Long's thread
>> referenced above. That wasn't reproducible in current mainline, and I would
>> expect yours isn't either, as otherwise others would have reported it too.
>
> I can confirm this splat is reproducible on 6.0-rc1 when the conditions below are met:
> 1) Lockdep is enabled
> 2) kmem_cache_destroy() is executed at least once (e.g. loading the slub_kunit module)
> 3) flush_all() is executed at least once (e.g. writing to /sys/kernel/slab/<cache>/cpu_partial)
Oh, great, that's useful, thanks!
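
For anyone else wanting to try: a minimal module along these lines
should exercise condition 2 (an untested sketch, the cache name is made
up; loading slub_kunit does effectively the same). Condition 3 is then
a write to the cache's cpu_partial file, as you say.

#include <linux/module.h>
#include <linux/slab.h>

static int __init repro_init(void)
{
	struct kmem_cache *s;

	/* kmem_cache_destroy() takes cpus_read_lock() and slab_mutex and
	 * then deletes the sysfs kobject - the chain lockdep records */
	s = kmem_cache_create("lockdep_repro", 64, 0, 0, NULL);
	if (!s)
		return -ENOMEM;
	kmem_cache_destroy(s);
	return 0;
}

static void __exit repro_exit(void)
{
}

module_init(repro_init);
module_exit(repro_exit);
MODULE_LICENSE("GPL");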
...
>
>> Also in both cases the lockdep (in 4.18) seems to have an issue with
>> cpus_read_lock(), which is an rwsem taken for read, so not really exclusive,
>> and thus shouldn't be able to cause the reported deadlock.
>
> Agreed.
>
>> So I suspected lockdep was improved since 4.18 to not report a false
>> positive, but we never confirmed.
>
> Seems it was not improved, as it still reports this on 6.0-rc1.
> Should we fix lockdep instead of fixing SLUB?
After discussing this with PeterZ: the lockdep splat is legitimate. A
writer could be waiting for the first reader to finish, and in that
case the rwsem blocks further readers so they don't starve the writer,
so the deadlock can really happen.
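
To spell out the scenario as I understand it (the function names are
from 6.0-rc1, the interleaving itself is hypothetical):

/* Task A: kmem_cache_destroy() */
cpus_read_lock();	/* cpu_hotplug_lock reader #1 */
mutex_lock(&slab_mutex);
/* sysfs_slab_unlink() -> kobject_del() waits for task B to release
 * kn->active */

/* Task B: write to /sys/kernel/slab/<cache>/cpu_partial */
/* kernfs holds kn->active across the ->store() callback */
/* cpu_partial_store() -> flush_all() */
cpus_read_lock();	/* reader #2, queued behind task C */

/* Task C: CPU hotplug */
cpus_write_lock();	/* waits for reader #1; as a pending writer it
			 * blocks reader #2 */

A waits for B (kn->active), B waits for C (rwsem fairness), C waits for
A (the existing reader), so nobody can make progress.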