lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 6 Jun 2017 10:01:12 -0500
From:   Larry Finger <Larry.Finger@...inger.net>
To:     Vlastimil Babka <vbabka@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org
Subject: Re: Sleeping BUG in khugepaged for i586

On 06/06/2017 09:02 AM, Vlastimil Babka wrote:
> On 06/05/2017 11:44 PM, Andrew Morton wrote:
>> On Sat, 3 Jun 2017 14:24:26 -0500 Larry Finger <Larry.Finger@...inger.net> wrote:
>>
>>> I recently turned on locking diagnostics for a Dell Latitude D600 laptop, which
>>> requires a 32-bit kernel. In the log I found the following:
>>>
>>> BUG: sleeping function called from invalid context at mm/khugepaged.c:655
>>> in_atomic(): 1, irqs_disabled(): 0, pid: 20, name: khugepaged
>>> 1 lock held by khugepaged/20:
>>>    #0:  (&mm->mmap_sem){++++++}, at: [<c03d6609>]
>>> collapse_huge_page.isra.47+0x439/0x1240
>>> CPU: 0 PID: 20 Comm: khugepaged Tainted: G        W
> 
> W means thre was WARN earler. Could be related... Got logs?

When I grabbed a splat, I got the last one in my log. The first one shows "Not 
tainted".

> 
>>> 4.12.0-rc1-wl-12125-g952a068 #80
> 
> What is "wl-12125-g952a068"? What patches on top of mainline?

I found this while chasing a problem with one of the wireless drivers. For that 
reason I use Kalle Valo's wireless-testing-next, which happens to be the only 
kernel tree I have on this laptop. I'm reasonably certain that the extra updates 
are not the cause of the problem as the first one appears before any of the 
wireless drivers are loaded, but I will pull a clean copy of mainline to test 
that assumption.

>>> Hardware name: Dell Computer Corporation Latitude D600
>>> /03U652, BIOS A05 05/29/2003
>>> Call Trace:
>>>    dump_stack+0x76/0xb2
>>>    ___might_sleep+0x174/0x230
>>>    collapse_huge_page.isra.47+0xacf/0x1240
>>>    khugepaged_scan_mm_slot+0x41e/0xc00
>>>    ? _raw_spin_lock+0x46/0x50
>>>    khugepaged+0x277/0x4f0
>>>    ? prepare_to_wait_event+0xe0/0xe0
>>>    kthread+0xeb/0x120
>>>    ? khugepaged_scan_mm_slot+0xc00/0xc00
>>>    ? kthread_create_on_node+0x30/0x30
>>>    ret_from_fork+0x21/0x30
>>>
>>> I have no idea when this problem was introduced. Of course, I will test any
>>> proposed fixes.
>>>
>>
>> Odd.  There's nothing wrong with cond_resched() while holding mmap_sem.
>> It looks like khugepaged forgot to do a spin_unlock somewhere and we
>> leaked a preempt_count.
> 
> Hmm I'd expect such spin lock to be reported together with mmap_sem in
> the debugging "locks held" message?

My bisection of the problem is about half done. My latest good version is commit 
7b8cd33 and the latest bad one is 2ea659a. Only about 7 steps to go.

Larry


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ