linux-kernel - [BUG] lockdep splat with kernfs lockdep annotations and slab mutex from drm patch??

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190614093914.58f41d8f@gandalf.local.home>
Date:   Fri, 14 Jun 2019 09:39:14 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Tejun Heo <tj@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Chris Wilson <chris@...is-wilson.co.uk>,
        Tvrtko Ursulin <tvrtko.ursulin@...el.com>,
        David Airlie <airlied@...ux.ie>,
        Daniel Vetter <daniel@...ll.ch>,
        intel-gfx@...ts.freedesktop.org, dri-devel@...ts.freedesktop.org
Subject: [BUG] lockdep splat with kernfs lockdep annotations and slab mutex
 from drm patch??

I'm trying to debug some code where I need KASAN and lockdep enabled
but I can't get past this splat unrelated to my code. I bisected it
down to this commit:

 32eb6bcfdda9da ("drm/i915: Make request allocation caches global")

To make sure it wasn't a bad bisect, I removed the patch and the splat
goes away. I add the patch back, and the splat returns. I did this
several times, so I have a large confidence that this is the cause of
the splat, or at least it is the commit that triggers whatever is going
on.

Perhaps all the cache updates caused the slab_mutex to be called
inverse of the kernfs locking?

Attached is my config.

Also might be helpful, the splat happens:

  kernfs_fop_write()
    ops->write == sysfs_kf_write
       sysfs_kf_write()
         ops->store = slab_attr_store

-- Steve

         Starting Network Manager...
[   42.035383] 
[   42.040982] ======================================================
[   42.054681] WARNING: possible circular locking dependency detected
[   42.068242] 5.2.0-rc4-test+ #306 Not tainted
[   42.079827] ------------------------------------------------------
[   42.093323] slub_cpu_partia/927 is trying to acquire lock:
[   42.106136] 0000000050c8525d (slab_mutex){+.+.}, at: slab_attr_store+0x6d/0xe0
[   42.120667] 
[   42.120667] but task is already holding lock:
[   42.140962] 00000000fbcdac6b (kn->count#157){++++}, at: kernfs_fop_write+0x125/0x250
[   42.155978] 
[   42.155978] which lock already depends on the new lock.
[   42.155978] 
[   42.185778] 
[   42.185778] the existing dependency chain (in reverse order) is:
[   42.207505] 
[   42.207505] -> #1 (kn->count#157){++++}:
[   42.227197]        __kernfs_remove+0x413/0x4a0
[   42.238848]        kernfs_remove_by_name_ns+0x40/0x80
[   42.251141]        sysfs_slab_add+0x1b5/0x2f0
[   42.262669]        __kmem_cache_create+0x511/0x560
[   42.274623]        create_cache+0xcd/0x1f0
[   42.285844]        kmem_cache_create_usercopy+0x18a/0x240
[   42.298347]        kmem_cache_create+0x12/0x20
[   42.309854]        is_active_nid+0xdb/0x230 [snd_hda_codec_generic]
[   42.323129]        snd_hda_get_path_idx+0x55/0x80 [snd_hda_codec_generic]
[   42.336835]        get_nid_path+0xc/0x170 [snd_hda_codec_generic]
[   42.349750]        do_one_initcall+0xa2/0x394
[   42.360803]        do_init_module+0xfd/0x370
[   42.371613]        load_module+0x38c6/0x3bd0
[   42.382319]        __do_sys_finit_module+0x11a/0x1b0
[   42.393587]        do_syscall_64+0x68/0x250
[   42.403934]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   42.415554] 
[   42.415554] -> #0 (slab_mutex){+.+.}:
[   42.432367]        lock_acquire+0xbd/0x1d0
[   42.442133]        __mutex_lock+0xfc/0xb70
[   42.451749]        slab_attr_store+0x6d/0xe0
[   42.461489]        kernfs_fop_write+0x199/0x250
[   42.471364]        vfs_write+0xe1/0x240
[   42.480433]        ksys_write+0xba/0x150
[   42.489475]        do_syscall_64+0x68/0x250
[   42.498799]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   42.509558] 
[   42.509558] other info that might help us debug this:
[   42.509558] 
[   42.532347]  Possible unsafe locking scenario:
[   42.532347] 
[   42.547589]        CPU0                    CPU1
[   42.556652]        ----                    ----
[   42.565586]   lock(kn->count#157);
[   42.573255]                                lock(slab_mutex);
[   42.583158]                                lock(kn->count#157);
[   42.593287]   lock(slab_mutex);
[   42.600508] 
[   42.600508]  *** DEADLOCK ***
[   42.600508] 
[   42.617703] 3 locks held by slub_cpu_partia/927:
[   42.625835]  #0: 00000000fca4a536 (sb_writers#4){.+.+}, at: vfs_write+0x1ce/0x240
[   42.636775]  #1: 000000008c8467e8 (&of->mutex){+.+.}, at: kernfs_fop_write+0x115/0x250
[   42.648075]  #2: 00000000fbcdac6b (kn->count#157){++++}, at: kernfs_fop_write+0x125/0x250
[   42.659529] 
[   42.659529] stack backtrace:
[   42.670198] CPU: 2 PID: 927 Comm: slub_cpu_partia Not tainted 5.2.0-rc4-test+ #306
[   42.680898] Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
[   42.693051] Call Trace:
[   42.698669]  dump_stack+0x7c/0xc0
[   42.705131]  print_circular_bug.cold.58+0x172/0x1d1
[   42.713151]  __lock_acquire+0x1e75/0x2170
[   42.720281]  ? lockdep_hardirqs_on+0x250/0x250
[   42.727804]  ? match_held_lock+0x1b/0x230
[   42.734842]  ? find_held_lock+0xac/0xd0
[   42.741709]  ? match_held_lock+0x1b/0x230
[   42.748707]  ? find_held_lock+0xac/0xd0
[   42.755477]  ? unfreeze_partials.isra.64+0x238/0x330
[   42.763376]  lock_acquire+0xbd/0x1d0
[   42.769883]  ? slab_attr_store+0x6d/0xe0
[   42.776771]  __mutex_lock+0xfc/0xb70
[   42.783289]  ? slab_attr_store+0x6d/0xe0
[   42.790134]  ? slab_attr_store+0x6d/0xe0
[   42.796948]  ? mutex_lock_io_nested+0xab0/0xab0
[   42.804392]  ? get_partial_node.isra.65.part.66+0x290/0x290
[   42.812864]  ? on_each_cpu_cond_mask+0xec/0x140
[   42.820309]  ? ksm_migrate_page+0x170/0x170
[   42.827398]  ? preempt_count_sub+0xb0/0x100
[   42.834487]  ? on_each_cpu_cond_mask+0xec/0x140
[   42.841924]  ? get_partial_node.isra.65.part.66+0x290/0x290
[   42.850430]  ? on_each_cpu_mask+0x90/0x90
[   42.857367]  ? slab_attr_store+0x6d/0xe0
[   42.864183]  slab_attr_store+0x6d/0xe0
[   42.870791]  ? sysfs_file_ops+0xa0/0xa0
[   42.877469]  kernfs_fop_write+0x199/0x250
[   42.884326]  vfs_write+0xe1/0x240
[   42.890484]  ksys_write+0xba/0x150
[   42.896710]  ? __ia32_sys_read+0x50/0x50
[   42.903459]  ? trace_hardirqs_off_caller+0x43/0x120
[   42.911157]  ? mark_held_locks+0x29/0xa0
[   42.917891]  ? do_syscall_64+0x14/0x250
[   42.924549]  do_syscall_64+0x68/0x250
[   42.931026]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   42.938916] RIP: 0033:0x7fd547c94ff8
[   42.945341] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 77 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[   42.970255] RSP: 002b:00007ffefe76d608 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   42.981019] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd547c94ff8
[   42.991380] RDX: 0000000000000002 RSI: 0000560703698180 RDI: 0000000000000001
[   43.001750] RBP: 0000560703698180 R08: 000000000000000a R09: 00007fd547d67cc0
[   43.012134] R10: 000000000000000a R11: 0000000000000246 R12: 00007fd547d68780
[   43.022507] R13: 0000000000000002 R14: 00007fd547d63740 R15: 0000000000000002

Download attachment "config" of type "application/octet-stream" (131488 bytes)