linux-kernel - Re: recursive locking: epoll.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 20 Jul 2011 10:05:23 +0200
From:	Paul Bolle <pebolle@...cali.nl>
To:	Dave Jones <davej@...hat.com>
Cc:	Linux Kernel <linux-kernel@...r.kernel.org>,
	davidel@...ilserver.org
Subject: Re: recursive locking: epoll.

On Fri, 2011-07-15 at 17:04 -0400, Dave Jones wrote:
> We just had a Fedora user report this lockdep trace.

0) Does this have a bugzilla.redhat.com number?

>   =============================================
>   [ INFO: possible recursive locking detected ]
>   3.0-0.rc7.git0.1.fc16.i686 #1
>   ---------------------------------------------
>   systemd-logind/651 is trying to acquire lock:
>    (&ep->mtx){+.+.+.}, at: [<c05285f1>] ep_scan_ready_list+0x32/0x154
>  
>   but task is already holding lock:
>    (&ep->mtx){+.+.+.}, at: [<c0528a90>] sys_epoll_ctl+0x103/0x481
>  
>   other info that might help us debug this:
>    Possible unsafe locking scenario:
>  
>          CPU0
>          ----
>     lock(&ep->mtx);
>     lock(&ep->mtx);
>  
>    *** DEADLOCK ***
>  
>    May be due to missing lock nesting notation
>  
>   2 locks held by systemd-logind/651:
>    #0:  (epmutex){+.+.+.}, at: [<c0528a4b>] sys_epoll_ctl+0xbe/0x481
>    #1:  (&ep->mtx){+.+.+.}, at: [<c0528a90>] sys_epoll_ctl+0x103/0x481
>  
>   stack backtrace:
>   Pid: 651, comm: systemd-logind Not tainted 3.0-0.rc7.git0.1.fc16.i686 #1
>   Call Trace:
>    [<c08490fe>] ? printk+0x2d/0x2f
>    [<c046b2ef>] __lock_acquire+0x811/0xb63
>    [<c0407c77>] ? sched_clock+0x8/0xb
>    [<c045d190>] ? sched_clock_local+0x10/0x18b
>    [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
>    [<c046ba5e>] lock_acquire+0xad/0xe4
>    [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
>    [<c08506bd>] __mutex_lock_common+0x49/0x2ee
>    [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
>    [<c04332e6>] ? __might_sleep+0x29/0xfb
>    [<c046a912>] ? mark_lock+0x26/0x1f2
>    [<c0850a7c>] mutex_lock_nested+0x43/0x49
>    [<c05285f1>] ? ep_scan_ready_list+0x32/0x154
>    [<c05285f1>] ep_scan_ready_list+0x32/0x154
>    [<c05281cb>] ? ep_remove+0x9b/0x9b
>    [<c0528727>] ep_poll_readyevents_proc+0x14/0x16
>    [<c05283d6>] ep_call_nested.constprop.2+0x6d/0x9a
>    [<c0528713>] ? ep_scan_ready_list+0x154/0x154
>    [<c05284d2>] ep_eventpoll_poll+0x45/0x55
>    [<c0528b8c>] sys_epoll_ctl+0x1ff/0x481
>    [<c05282fb>] ? ep_send_events_proc+0xd5/0xd5
>    [<c08521ac>] syscall_call+0x7/0xb

1) It seems I just ran into that deadlock too (on a Fedora Rawhide
system, running a vanilla 3.0-0.rc7). I tried to capture it with a small
digital camera, but using that camera for screenshots is apparently
beyond my skills. (I could try capturing this message over a serial
line, if needed.)

2) Luckily, I also hit a related warning rebooting into v2.6.39, which I
could just cut and paste from dmesg's output:

=============================================
[ INFO: possible recursive locking detected ]
2.6.39-0.local9.fc16.i686 #1
---------------------------------------------
systemd-logind/807 is trying to acquire lock:
 (&ep->mtx){+.+.+.}, at: [<c0524a05>] ep_scan_ready_list+0x32/0x154

but task is already holding lock:
 (&ep->mtx){+.+.+.}, at: [<c0524ea4>] sys_epoll_ctl+0x103/0x481

other info that might help us debug this:
2 locks held by systemd-logind/807:
 #0:  (epmutex){+.+.+.}, at: [<c0524e5f>] sys_epoll_ctl+0xbe/0x481
 #1:  (&ep->mtx){+.+.+.}, at: [<c0524ea4>] sys_epoll_ctl+0x103/0x481

stack backtrace:
Pid: 807, comm: systemd-logind Not tainted 2.6.39-0.local9.fc16.i686 #1
Call Trace:
 [<c080af85>] ? printk+0x2d/0x2f
 [<c04690b7>] __lock_acquire+0x78f/0xae1
 [<c040790c>] ? sched_clock+0x8/0xb
 [<c045b858>] ? sched_clock_local+0x10/0x18b
 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154
 [<c046981e>] lock_acquire+0xbc/0xdc
 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154
 [<c08127f3>] __mutex_lock_common+0x4a/0x2f0
 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154
 [<c0432502>] ? __might_sleep+0x29/0xfb
 [<c0466a50>] ? trace_hardirqs_off+0xb/0xd
 [<c0812b4e>] mutex_lock_nested+0x39/0x3e
 [<c0524a05>] ? ep_scan_ready_list+0x32/0x154
 [<c0524a05>] ep_scan_ready_list+0x32/0x154
 [<c05245df>] ? ep_remove+0x9b/0x9b
 [<c0524b3b>] ep_poll_readyevents_proc+0x14/0x16
 [<c05247ea>] ep_call_nested.constprop.2+0x6d/0x9a
 [<c0524b27>] ? ep_scan_ready_list+0x154/0x154
 [<c05248e6>] ep_eventpoll_poll+0x45/0x55
 [<c0524fa0>] sys_epoll_ctl+0x1ff/0x481
 [<c052470f>] ? ep_send_events_proc+0xd5/0xd5
 [<c0819fdf>] sysenter_do_call+0x12/0x38

3) Apparently this is something that is triggered by a brand new version
of systemd (systemd-30-1.fc16.i686, compiled on July 13th, which I
installed just yesterday, July 19th), as I do not recall seeing this
before.

4) Feel free to prod me for more information. 


Paul Bolle

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/