Message-ID: <ece42e74-821e-662b-2c07-ea0756962bec@suse.cz>
Date: Mon, 22 Aug 2022 19:23:53 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Hyeonggon Yoo <42.hyeyoo@...il.com>
Cc: Waiman Long <longman@...hat.com>, Christoph Lameter <cl@...ux.com>,
Pekka Enberg <penberg@...nel.org>,
David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Xin Long <lucien.xin@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/slab_common: Deleting kobject in kmem_cache_destroy()
without holding slab_mutex/cpu_hotplug_lock
On 8/22/22 15:46, Hyeonggon Yoo wrote:
> On Mon, Aug 22, 2022 at 02:03:33PM +0200, Vlastimil Babka wrote:
>> On 8/10/22 16:08, Waiman Long wrote:
>>> On 8/10/22 05:34, Vlastimil Babka wrote:
>>>> On 8/9/22 22:59, Waiman Long wrote:
>>>>> A circular locking problem is reported by lockdep due to the following
>>>>> circular locking dependency.
>>>>>
>>>>> +--> cpu_hotplug_lock --> slab_mutex --> kn->active#126 --+
>>>>> | |
>>>>> +---------------------------------------------------------+
>>>>
>>>> This sounded familiar and I've found a thread from January:
>>>>
>>>> https://lore.kernel.org/all/388098b2c03fbf0a732834fc01b2d875c335bc49.1642170196.git.lucien.xin@gmail.com/
>>>>
>>>> But that seemed to be specific to RHEL-8 RT kernel and not reproduced with
>>>> mainline. Is it different this time? Can you share the splats?
>>>
>>> I think this is easier to reproduce on an RT kernel, but it also happens in a
>>> non-RT kernel. One example splat that I got was:
>>>
>>> [ 1777.114757] ======================================================
>>> [ 1777.121646] WARNING: possible circular locking dependency detected
>>> [ 1777.128544] 4.18.0-403.el8.x86_64+debug #1 Not tainted
>>> [ 1777.134280] ------------------------------------------------------
>>
>> Yeah that's non-RT, but still a 4.18 kernel, as in Xin Long's thread
>> referenced above. That wasn't reproducible in current mainline, and I would
>> expect yours isn't either, as otherwise others would have reported it too.
>
> I can confirm this splat is reproducible on 6.0-rc1 when the conditions below are met:
> 1) Lockdep is enabled
> 2) kmem_cache_destroy() is executed at least once (e.g. loading the slub_kunit module)
> 3) flush_all() is executed at least once (e.g. writing to /sys/kernel/slab/<cache>/cpu_partial)
Oh, great, that's useful, thanks!
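
For anyone else wanting to try: a minimal module along these lines
should exercise condition 2 (an untested sketch, the cache name is made
up; loading slub_kunit does effectively the same). Condition 3 is then
a write to the cache's cpu_partial file, as you say.

#include <linux/module.h>
#include <linux/slab.h>

static int __init repro_init(void)
{
	struct kmem_cache *s;

	/* kmem_cache_destroy() takes cpus_read_lock() and slab_mutex and
	 * then deletes the sysfs kobject - the chain lockdep records */
	s = kmem_cache_create("lockdep_repro", 64, 0, 0, NULL);
	if (!s)
		return -ENOMEM;
	kmem_cache_destroy(s);
	return 0;
}

static void __exit repro_exit(void)
{
}

module_init(repro_init);
module_exit(repro_exit);
MODULE_LICENSE("GPL");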
...
>
>> Also in both cases the lockdep (in 4.18) seems to have an issue with
>> cpus_read_lock(), which is an rwsem taken for read, so not really exclusive,
>> and thus shouldn't be able to cause the reported deadlock.
>
> Agreed.
>
>> So I suspected lockdep was improved since 4.18 to not report a false
>> positive, but we never confirmed.
>
> Seems it was not improved, as it still reports this on 6.0-rc1.
> Should we fix lockdep instead of fixing SLUB?
After discussing this with PeterZ: the lockdep splat is legitimate. A
writer could be waiting for the first reader to finish, and in that
case the rwsem blocks further readers so they don't starve the writer,
so the deadlock can really happen.
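
To spell out the scenario as I understand it (the function names are
from 6.0-rc1, the interleaving itself is hypothetical):

/* Task A: kmem_cache_destroy() */
cpus_read_lock();	/* cpu_hotplug_lock reader #1 */
mutex_lock(&slab_mutex);
/* sysfs_slab_unlink() -> kobject_del() waits for task B to release
 * kn->active */

/* Task B: write to /sys/kernel/slab/<cache>/cpu_partial */
/* kernfs holds kn->active across the ->store() callback */
/* cpu_partial_store() -> flush_all() */
cpus_read_lock();	/* reader #2, queued behind task C */

/* Task C: CPU hotplug */
cpus_write_lock();	/* waits for reader #1; as a pending writer it
			 * blocks reader #2 */

A waits for B (kn->active), B waits for C (rwsem fairness), C waits for
A (the existing reader), so nobody can make progress.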