lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 15 Dec 2017 12:33:03 +0300
From:   "Kirill A. Shutemov" <kirill@...temov.name>
To:     Anshuman Khandual <khandual@...ux.vnet.ibm.com>
Cc:     Yang Shi <yang.s@...baba-inc.com>, kirill.shutemov@...ux.intel.com,
        mhocko@...e.com, hughd@...gle.com, aarcange@...hat.com,
        akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: thp: use down_read_trylock in khugepaged to avoid
 long block

On Fri, Dec 15, 2017 at 10:04:27AM +0530, Anshuman Khandual wrote:
> On 12/15/2017 01:23 AM, Yang Shi wrote:
> > In the current design, khugepaged need acquire mmap_sem before scanning
> > mm, but in some corner case, khugepaged may scan the current running
> > process which might be modifying memory mapping, so khugepaged might
> > block in uninterruptible state. But, the process might hold the mmap_sem
> > for long time when modifying a huge memory space, then it may trigger
> > the below khugepaged hung issue:
> > 
> > INFO: task khugepaged:270 blocked for more than 120 seconds. 
> > Tainted: G E 4.9.65-006.ali3000.alios7.x86_64 #1
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> > khugepaged D 0 270 2 0x00000000 
> > ffff883f3deae4c0 0000000000000000 ffff883f610596c0 ffff883f7d359440
> > ffff883f63818000 ffffc90019adfc78 ffffffff817079a5 d67e5aa8c1860a64
> > 0000000000000246 ffff883f7d359440 ffffc90019adfc88 ffff883f610596c0
> > Call Trace: 
> > [<ffffffff817079a5>] ? __schedule+0x235/0x6e0 
> > [<ffffffff81707e86>] schedule+0x36/0x80
> > [<ffffffff8170a970>] rwsem_down_read_failed+0xf0/0x150
> > [<ffffffff81384998>] call_rwsem_down_read_failed+0x18/0x30
> > [<ffffffff8170a1c0>] down_read+0x20/0x40
> > [<ffffffff81226836>] khugepaged+0x476/0x11d0
> > [<ffffffff810c9d0e>] ? idle_balance+0x1ce/0x300
> > [<ffffffff810d0850>] ? prepare_to_wait_event+0x100/0x100
> > [<ffffffff812263c0>] ? collapse_shmem+0xbf0/0xbf0
> > [<ffffffff810a8d46>] kthread+0xe6/0x100
> > [<ffffffff810a8c60>] ? kthread_park+0x60/0x60
> > [<ffffffff8170cd15>] ret_from_fork+0x25/0x30

What holds the lock for this long? I think the other side also worth fixing.

> > 
> > So, it sounds pointless to just block for waiting for the semaphore for
> > khugepaged, here replace down_read() to down_read_trylock() to move to
> > scan next mm quickly instead of just blocking on the semaphore so that
> > other processes can get more chances to install THP.
> > Then khugepaged can come back to scan the skipped mm when finish the
> > current round full_scan.
> 
> That may be too harsh on the process which now has to wait for a complete
> round of full scan before the khugepaged comes back. What if the mmap_sem
> contention because of VMA changes in the process was just temporary ?

It's always temporary. Unless something is very broken. :)

If the mmap_sem is taken for write, it may also mean that memory layout of the
process is not yet settled and we can just waste the time collapsing the pages
that about to go away.

And it's better for khugepaged to do the job than just waiting for the lock.

Acked-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>

-- 
 Kirill A. Shutemov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ