linux-kernel - Re: Linus GIT - INFO: possible circular locking dependency detected


Message-ID: <20111109191155.GB2490@hades>
Date:	Wed, 9 Nov 2011 19:11:55 +0000
From:	Luis Henriques <henrix@...andro.org>
To:	Greg KH <greg@...ah.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Vasiliy Kulikov <segoon@...nwall.com>,
	Miles Lane <miles.lane@...il.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Alexey Dobriyan <adobriyan@...il.com>, stable@...r.kernel.org
Subject: Re: Linus GIT - INFO: possible circular locking dependency detected

Hi,

On Tue, Nov 08, 2011 at 04:40:13PM -0800, Greg KH wrote:
> > It's just me.  And my script-bots, but they are all controlled by me in
> > the end.  Hopefully....
> > 
> > > that aa6afca5bca ("proc:
> > > fix races against execve() of /proc/PID/fd**") is known to cause a
> > > regression.
> > 
> > Ok, I'll go delete it from the stable queues for now.
> 
> Now removed.

I finally took another look at this, and although I'm far from being an
expert in these areas, I believe the trace information from lockdep may
actually be incorrect.  Here's what I'm getting:

[   12.948038] exe/36 is trying to acquire lock:
[   12.948038]  (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff811b301e>] lock_trace+0x2e/0x80
[   12.948038] 
[   12.948038] but task is already holding lock:
[   12.948038]  (&sb->s_type->i_mutex_key#6){+.+.+.}, at: [<ffffffff8115f8b8>] vfs_readdir+0x78/0xd0
[   12.948038] 
[   12.948038] which lock already depends on the new lock.

So, sig->cred_guard_mutex is acquired (in lock_trace) after
sb->s_type->i_mutex_key (in vfs_readdir).  Now, take a look at the traces:

[   12.948038] -> #1 (&sb->s_type->i_mutex_key#6){+.+.+.}:
[   12.948038]        [<ffffffff81092e4f>] lock_acquire+0xaf/0x1f0
[   12.948038]        [<ffffffff8135b2a5>] __mutex_lock_common+0x65/0x4d0
[   12.948038]        [<ffffffff8135b72b>] mutex_lock_nested+0x1b/0x20
[   12.948038]        [<ffffffff81158c0a>] do_lookup+0x28a/0x3b0
[   12.948038]        [<ffffffff8115929f>] link_path_walk+0x12f/0x870
[   12.948038]        [<ffffffff8115b0ab>] path_openat+0xbb/0x380
[   12.948038]        [<ffffffff8115b3b2>] do_filp_open+0x42/0xa0
[   12.948038]        [<ffffffff81152cb2>] open_exec+0x32/0xf0
[   12.948038]        [<ffffffff81153dd7>] do_execve_common.clone.32+0x137/0x330
[   12.948038]        [<ffffffff81153feb>] do_execve+0x1b/0x20
[   12.948038]        [<ffffffff8100c78a>] sys_execve+0x4a/0x80
[   12.948038]        [<ffffffff8135ed1c>] stub_execve+0x6c/0xc0
[   12.948038] 
[   12.948038] -> #0 (&sig->cred_guard_mutex){+.+.+.}:
[   12.948038]        [<ffffffff8108ff9f>] __lock_acquire+0x17bf/0x2020
[   12.948038]        [<ffffffff81092e4f>] lock_acquire+0xaf/0x1f0
[   12.948038]        [<ffffffff8135b2a5>] __mutex_lock_common+0x65/0x4d0
[   12.948038]        [<ffffffff8135b76b>] mutex_lock_killable_nested+0x1b/0x20
[   12.948038]        [<ffffffff811b301e>] lock_trace+0x2e/0x80
[   12.948038]        [<ffffffff811b73ab>] proc_readfd_common+0x5b/0x4b0
[   12.948038]        [<ffffffff811b7835>] proc_readfd+0x15/0x20
[   12.948038]        [<ffffffff8115f8f0>] vfs_readdir+0xb0/0xd0
[   12.948038]        [<ffffffff8115fa09>] sys_getdents+0x89/0x100
[   12.948038]        [<ffffffff8135e8c2>] system_call_fastpath+0x16/0x1b

sb->s_type->i_mutex_key is shown as being acquired in the execve path,
which seems to be wrong -- it was acquired in vfs_readdir (see the 2nd
trace).

This means that the initial analysis from Vasiliy is incorrect, as he
assumed the execve path.  Or am I interpreting this log incorrectly?
(Probably I am...).
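
To make the question concrete, here is a toy model of a lock-order
validator.  It is only a sketch and not lockdep's real implementation:
struct lock_graph, record_acquire() and the enum names are made up for
illustration.  It assumes the usual reading of such reports, where each
"-> #N" entry is the backtrace saved when that dependency was first
recorded (possibly by another task), and it assumes that execve holds
cred_guard_mutex while it opens the binary.

```c
#include <stdbool.h>

/* Hypothetical lock ids standing in for the two lock classes in the log. */
enum { CRED_GUARD_MUTEX, I_MUTEX, NUM_LOCKS };

/* dep[a][b] is true once lock b was acquired while lock a was held. */
struct lock_graph {
	bool dep[NUM_LOCKS][NUM_LOCKS];
};

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NUM_LOCKS; next++)
		if (g->dep[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired was taken while held was held".
 * Returns false, recording nothing, if an opposite-order path already
 * exists, i.e. the new edge would close a cycle.
 */
static bool record_acquire(struct lock_graph *g, int held, int acquired)
{
	bool seen[NUM_LOCKS] = { false };

	if (reachable(g, acquired, held, seen))
		return false;
	g->dep[held][acquired] = true;
	return true;
}
```

In this model an earlier execve would call
record_acquire(&g, CRED_GUARD_MUTEX, I_MUTEX), and the later readdir on
/proc/PID/fd would call record_acquire(&g, I_MUTEX, CRED_GUARD_MUTEX),
which gets rejected.  Under that reading the "#1" trace would be the
saved execve backtrace and not the stack of the task doing the readdir.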

Anyway, if my analysis is correct, replacing the lock_trace() call with a
simple ptrace_may_access() check should be enough.  Something like:

-       if (lock_trace(p))
+       if (!ptrace_may_access(p, PTRACE_MODE_ATTACH))
                goto out;

Obviously, the matching unlock_trace() call should be removed as well...
But I may be missing other cases where lock_trace() is actually required.

BTW, I get this log simply by running:

# ls /proc/1/fd
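
The same path can probably be exercised without ls.  The sketch below
assumes that opendir()/readdir() end up in a getdents-family system call,
which matches the sys_getdents frame in the second trace.  list_dir() is
a hypothetical helper name, not something from the kernel tree.

```c
#include <dirent.h>
#include <stdio.h>

/*
 * Print every entry of 'path'.
 * Returns the number of entries read, or -1 if the directory
 * could not be opened.
 */
static int list_dir(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *entry;
	int count = 0;

	if (!dir) {
		perror(path);
		return -1;
	}
	/* Each readdir() call returns the next entry, NULL at the end. */
	while ((entry = readdir(dir)) != NULL) {
		puts(entry->d_name);
		count++;
	}
	closedir(dir);
	return count;
}
```

Calling list_dir("/proc/1/fd") as root from a small main() should take
the same vfs_readdir -> proc_readfd route as the ls command above.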

Just my 2 cents...

Cheers,
--
Luis Henriques