lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 22 Aug 2022 19:23:53 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Hyeonggon Yoo <42.hyeyoo@...il.com>
Cc:     Waiman Long <longman@...hat.com>, Christoph Lameter <cl@...ux.com>,
        Pekka Enberg <penberg@...nel.org>,
        David Rientjes <rientjes@...gle.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Xin Long <lucien.xin@...il.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/slab_common: Deleting kobject in kmem_cache_destroy()
 without holding slab_mutex/cpu_hotplug_lock

On 8/22/22 15:46, Hyeonggon Yoo wrote:
> On Mon, Aug 22, 2022 at 02:03:33PM +0200, Vlastimil Babka wrote:
>> On 8/10/22 16:08, Waiman Long wrote:
>>> On 8/10/22 05:34, Vlastimil Babka wrote:
>>>> On 8/9/22 22:59, Waiman Long wrote:
>>>>> A circular locking problem is reported by lockdep due to the following
>>>>> circular locking dependency.
>>>>>
>>>>>    +--> cpu_hotplug_lock --> slab_mutex --> kn->active#126 --+
>>>>>    |                                                         |
>>>>>    +---------------------------------------------------------+
>>>>
>>>> This sounded familiar and I've found a thread from January:
>>>>
>>>> https://lore.kernel.org/all/388098b2c03fbf0a732834fc01b2d875c335bc49.1642170196.git.lucien.xin@gmail.com/
>>>>
>>>> But that seemed to be specific to RHEL-8 RT kernel and not reproduced with
>>>> mainline. Is it different this time? Can you share the splats?
>>>
>>> I think this is easier to reproduce on a RT kernel, but it also happens in a
>>> non-RT kernel. One example splat that I got was
>>>
>>> [ 1777.114757] ======================================================
>>> [ 1777.121646] WARNING: possible circular locking dependency detected
>>> [ 1777.128544] 4.18.0-403.el8.x86_64+debug #1 Not tainted
>>> [ 1777.134280] ------------------------------------------------------
>>
>> Yeah that's non-RT, but still 4.18 kernel, as in Xin Long's thread
>> referenced above. That wasn't reproducible in current mainline and I would
>> expect yours also isn't, because it would be reported by others too.
> 
> I can confirm this splat is reproducible on 6.0-rc1 when conditions below are met:
> 	1) Lockdep is enabled
> 	2) kmem_cache_destroy() is executed at least once (e.g. loading slub_kunit module)
> 	3) flush_all() is executed at least once (e.g. writing to /sys/kernel/<slab>/cpu_partial)

Oh, great, that's useful, thanks!

...

> 
>> Also in both cases the lockdep (in 4.18) seems to have issue with
>> cpus_read_lock() which is a rwsem taken for read, so not really exclusive in
>> order to cause the reported deadlock.
> 
> Agreed.
> 
>> So I suspected lockdep was improved since 4.18 to not report a false
>> positive, but we never confirmed.
> 
> Seems not improved as it reports on 6.0-rc1.
> May fix lockdep instead of fixing SLUB?

So after discussing with PeterZ, the lockdep splat is legitimate,
because there could be a writer waiting on the first reader to finish,
and in that case rwsems block further readers so they don't starve the
writer, and thus the deadlock could happen.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ