linux-kernel - Re: Deadlock in do_page_fault() on ARM (old kernel)

Message-ID: <52DDB65D.9060300@signal11.us>
Date:	Mon, 20 Jan 2014 18:50:53 -0500
From:	Alan Ott <alan@...nal11.us>
To:	Russell King - ARM Linux <linux@....linux.org.uk>
CC:	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	linux-omap@...r.kernel.org
Subject: Re: Deadlock in do_page_fault() on ARM (old kernel)

On 01/17/2014 08:20 PM, Russell King - ARM Linux wrote:
> On Fri, Jan 17, 2014 at 07:57:16PM -0500, Alan Ott wrote:
>> On 01/17/2014 08:46 AM, Russell King - ARM Linux wrote:
>>> My suspicion therefore is that some other thread must have died while
>>> holding the mmap_sem, so there's probably a kernel oops earlier...
>>> that's my best guess at the moment without seeing the full backtrace.
>> There's no oops that I'm able to see.
>>
>> Each of the tasks which lockdep reports as "holding" mmap_sem are
>> blocking for it. If some other task had taken it and then crashed, I
>> assume lockdep would list the crashed task as also holding the resource
>> in the printout.
> My point is this:
>
> - the five (or six) threads which are trying to take the mmap_sem in
>    read-mode in the fault handler are all blocked on it - they haven't
>    taken the lock, which will only happen because there's a pending writer.
> - of these in your original post, there are two which faulted from
>    __copy_to_user_std().  __copy_to_user_std() doesn't take the mmap_sem -
>    this is the non-uaccess-with-memcpy path.
> - the pending writers are the two threads in sys_mmap_pgoff(), both of
>    which are blocked waiting to gain the write lock.
> - there are no *other* threads holding the mmap_sem lock.

Yes, all true. I don't remember why I started looking at the memcpy() case.
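
The reasoning in the list above can be replayed with a toy model. The sketch below is a single-threaded Python simulation, not kernel code: ToyRwsem, its methods, and the task names are all hypothetical. The only rule it encodes is the one stated above, that a new reader queues whenever a writer is already waiting. It also assumes, purely for illustration, that some task already holds the lock for read and then faults while holding it.

```python
class ToyRwsem:
    """Toy, single-threaded model of a writer-fair reader/writer semaphore.

    Nothing here blocks: each down_*() call returns True if the lock was
    granted and False if the task had to be queued.
    """

    def __init__(self):
        self.readers = set()   # tasks currently holding the lock for read
        self.writer = None     # task currently holding the lock for write
        self.waiters = []      # (task, mode) pairs in arrival order

    def down_read(self, task):
        # A reader is admitted only if no writer holds the lock and
        # nobody (in particular no writer) is already queued.
        if self.writer is None and not self.waiters:
            self.readers.add(task)
            return True
        self.waiters.append((task, "read"))
        return False

    def down_write(self, task):
        # A writer needs the lock completely free.
        if self.writer is None and not self.readers and not self.waiters:
            self.writer = task
            return True
        self.waiters.append((task, "write"))
        return False

    def deadlocked(self):
        # Stuck for good if every read holder is itself queued behind
        # the waiters that are waiting for it to release.
        waiting = {task for task, _ in self.waiters}
        return (bool(self.readers) and bool(self.waiters)
                and self.writer is None
                and all(task in waiting for task in self.readers))


def replay():
    """Replay a hypothetical sequence that ends in the state described above."""
    sem = ToyRwsem()
    results = [
        sem.down_read("ioctl_task"),   # True: assumed holder takes the lock
        sem.down_write("mmap_task"),   # False: writer queues behind the holder
        sem.down_read("ioctl_task"),   # False: holder faults, queues behind writer
        sem.down_read("other_task"),   # False: unrelated fault also queues
    ]
    return sem, results


if __name__ == "__main__":
    sem, results = replay()
    print(results, sem.deadlocked())
```

In this model the writer can only be pending if something already holds the lock, which is why a dump showing pending writers and no holder points at a holder that is hidden or has gone missing.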

> So... there's a question here how we got into this state - and frankly
> I don't know.  What I do see from your latest dump is that there are two
> unknown modules there - something called rcu2m and another called
> buttoms, and there are two threads inside ioctls there.  Both have
> faulted from the function at 0xc0d2a394 (which won't appear in the
> backtrace, but is most likely __copy_to_user_std.)

Yes, there are a handful of out-of-tree modules.

> So, in the absence of you saying anything about there being any preceding
> oopses, my conclusion now is that one of those modules is taking the
> mmap_sem itself, and is the culprit inducing this deadlock.

Yes, I came to that as well. I had checked for the presence of mmap_sem
in the sources of the out-of-tree modules and didn't see it. However,
upon closer inspection, my grep-fu failed me, as there were some backward
symlinks I didn't account for. TI's cmemk module _is_ taking
mmap_sem. I wish I had seen this days ago. That's my new investigation path.
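
For anyone repeating this kind of search: one common way a recursive search misses files is by not descending through symlinked directories. The following is a minimal sketch of a search that does follow them, written in Python. The function name find_symbol, its parameters, and the file suffixes are my own choices for illustration and have nothing to do with the actual layout of the cmemk sources.

```python
import os


def find_symbol(root, needle="mmap_sem", suffixes=(".c", ".h"), follow=True):
    """Return (path, line_number, line) for every line containing `needle`.

    With follow=True, symlinked directories are descended into as well;
    with follow=False they are skipped, which is how hits can be missed.
    """
    hits = []
    seen = set()  # real paths already visited, to avoid symlink loops
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow):
        real = os.path.realpath(dirpath)
        if real in seen:
            dirnames[:] = []  # do not descend into this directory again
            continue
        seen.add(real)
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="replace") as fh:
                    for lineno, line in enumerate(fh, 1):
                        if needle in line:
                            hits.append((path, lineno, line.rstrip()))
            except OSError:
                continue  # unreadable file or dangling symlink
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_symbol("."):
        print(f"{path}:{lineno}: {line}")
```

Comparing the results with follow=False and follow=True on the same tree shows which hits are only reachable through a symlink.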

> Note that your dump ([2]) in your reply was just the hung task detector
> printing out the stacktrace for a few tasks, not the full all-threads
> stack dump which I was expecting.

Yes, in a misguided attempt to keep the SNR high, I didn't include the
full dump, but only what I thought was the interesting part. I did
another capture and the full dump is at [1].

> So I'm pulling out these conclusions from the very little information
> you're supplying.

I appreciate it. Thank you for taking the time to reply.

Alan.

[1] http://www.signal11.us/~alan/stack_dump_all_tasks_with_frame_pointers.txt
