Message-ID: <52DDB65D.9060300@signal11.us>
Date: Mon, 20 Jan 2014 18:50:53 -0500
From: Alan Ott <alan@...nal11.us>
To: Russell King - ARM Linux <linux@....linux.org.uk>
CC: "linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
linux-omap@...r.kernel.org
Subject: Re: Deadlock in do_page_fault() on ARM (old kernel)
On 01/17/2014 08:20 PM, Russell King - ARM Linux wrote:
> On Fri, Jan 17, 2014 at 07:57:16PM -0500, Alan Ott wrote:
>> On 01/17/2014 08:46 AM, Russell King - ARM Linux wrote:
>>> My suspicion therefore is that some other thread must have died while
>>> holding the mmap_sem, so there's probably a kernel oops earlier...
>>> that's my best guess at the moment without seeing the full backtrace.
>> There's no oops that I'm able to see.
>>
>> Each of the tasks which lockdep reports as "holding" mmap_sem is in
>> fact blocked waiting for it. If some other task had taken it and then
>> crashed, I assume lockdep would list the crashed task as also holding
>> the resource in the printout.
> My point is this:
>
> - the five (or six) threads which are trying to take the mmap_sem in
> read-mode in the fault handler are all blocked on it - they haven't
> taken the lock, which will only happen because there's a pending writer.
> - of these in your original post, there are two which faulted from
> __copy_to_user_std(). __copy_to_user_std() doesn't take the mmap_sem -
> this is the non-uaccess-with-memcpy path.
> - the pending writers are the two threads in sys_mmap_pgoff(), both of
> which are blocked waiting to gain the write lock.
> - there are no *other* threads holding the mmap_sem lock.
Yes, all true. I don't remember why I started looking at the memcpy() case.
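
To make the reasoning in the list above concrete, here is a minimal Python sketch of a toy reader/writer semaphore with a FIFO wait queue. It is an illustration only, not the kernel's rwsem implementation. The class FairRWSem, the helper stuck(), the task names, and the scenario itself (an ioctl path that already holds the lock in read mode and then faults) are all assumptions made for the example; nothing in this thread establishes in which mode a module takes the lock.

```python
from collections import deque


class FairRWSem:
    """Toy model of a reader/writer semaphore with a FIFO wait queue.

    Assumption: a new reader queues behind any waiting writer, which is
    the behaviour described in the thread. Not the kernel's rwsem code.
    """

    def __init__(self):
        self.readers = []       # tasks holding the lock in read mode
        self.writer = None      # task holding the lock in write mode
        self.waiters = deque()  # (task, mode) in arrival order

    def down_read(self, task):
        # A reader gets in only if no writer holds the lock and nobody waits.
        if self.writer is None and not self.waiters:
            self.readers.append(task)
            return True
        self.waiters.append((task, "read"))
        return False

    def down_write(self, task):
        # A writer needs the lock completely free and an empty queue.
        if self.writer is None and not self.readers and not self.waiters:
            self.writer = task
            return True
        self.waiters.append((task, "write"))
        return False


def stuck(sem):
    """True if every current holder is itself sitting in the wait queue."""
    waiting = {task for task, _ in sem.waiters}
    holders = set(sem.readers) | ({sem.writer} if sem.writer else set())
    return bool(holders) and holders <= waiting


sem = FairRWSem()
results = [
    sem.down_read("ioctl"),   # hypothetical: module code read-locks mmap_sem
    sem.down_write("mmap"),   # sys_mmap_pgoff() queues for the write lock
    sem.down_read("ioctl"),   # page fault inside the ioctl wants the read lock
    sem.down_read("other"),   # unrelated fault handler piles up behind it
]
print(results, stuck(sem))
```

In this model the only holder ends up queued behind the writer it is blocking, so nobody can ever release the lock. Every fault handler then shows up as blocked on a lock that no other thread appears to hold.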
> So... there's a question here how we got into this state - and frankly
> I don't know. What I do see from your latest dump is that there's two
> unknown modules there - something called rcu2m and another called
> buttoms, and there are two threads inside ioctls there. Both have
> faulted from the function at 0xc0d2a394 (which won't appear in the
> backtrace, but is most likely __copy_to_user_std.)
Yes, there are a handful of out-of-tree modules.
> So, in the absence of you saying anything about there being any preceding
> oopses, my conclusion now is that one of those modules is taking the
> mmap_sem itself, and is the culprit inducing this deadlock.
Yes, I came to that as well. I had checked for the presence of mmap_sem
in the sources of the out-of-tree modules and didn't see it. However,
upon closer inspection, my grep-fu failed me as there were some backward
symlinks I didn't account for. TI's cmemk module _is_ taking out
mmap_sem. I wish I had seen this days ago. That's my new investigation path.
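
For anyone repeating this kind of search, the following is a rough Python sketch of a symbol search that does descend into symlinked directories. The function name find_symbol and its parameters are made up for illustration, and it is not the command that was actually run here. It relies on os.walk(), which skips symlinked directories unless followlinks=True is passed, and it tracks real paths so that a symlink pointing back up the tree cannot cause an endless loop.

```python
import os


def find_symbol(root, symbol="mmap_sem"):
    """Yield (path, line_number, line) for every line containing symbol.

    Directories reached through symlinks are searched as well.
    """
    seen = set()  # real paths of directories already visited
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        real = os.path.realpath(dirpath)
        if real in seen:
            dirnames[:] = []  # already searched: do not descend again
            continue
        seen.add(real)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # broken symlink or special file
            try:
                with open(path, errors="replace") as f:
                    for number, line in enumerate(f, 1):
                        if symbol in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it
```

Iterating over find_symbol() with the top of the module sources as root yields one tuple per matching line, including matches in files that are only reachable through a symlink.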
> Note that your dump ([2]) in your reply was just the hung task detector
> printing out the stacktrace for a few tasks, not the full all-threads
> stack dump which I was expecting.
Yes, in a misguided attempt to keep the SNR high, I didn't include the
full dump, but only what I thought was the interesting part. I did
another capture and the full dump is at [1].
> So I'm pulling out these conclusions from the very little information
> you're supplying.
I appreciate it. Thank you for taking the time to reply.
Alan.
[1] http://www.signal11.us/~alan/stack_dump_all_tasks_with_frame_pointers.txt