linux-kernel - Re: [linux-next: Tree for Jun 1] __khugepaged_exit rwsem_down_write

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160603010036.GA464@swordfish>
Date:	Fri, 3 Jun 2016 10:00:36 +0900
From:	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:	Ebru Akagunduz <ebru.akagunduz@...il.com>
Cc:	Vlastimil Babka <vbabka@...e.cz>,
	sergey.senozhatsky.work@...il.com,
	Andrew Morton <akpm@...ux-foundation.org>,
	Michal Hocko <mhocko@...nel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Stephen Rothwell <sfr@...b.auug.org.au>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Rik van Riel <riel@...hat.com>, linux-mm@...ck.org,
	linux-next@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [linux-next: Tree for Jun 1] __khugepaged_exit
 rwsem_down_write_failed lockup

On (06/02/16 21:58), Ebru Akagunduz wrote:
[..]
> > I think it's this patch:
> > 
> > http://ozlabs.org/~akpm/mmots/broken-out/mm-thp-make-swapin-readahead-under-down_read-of-mmap_sem.patch
> > 
> > Some parts of the code in collapse_huge_page() that were under
> > down_write(mmap_sem) are under down_read() after the patch. But
> > there's "goto out" which continues via "goto out_up_write" which
> > does up_write(mmap_sem) so there's an imbalance. One path seems to
> > go via both up_read() and up_write(). I can imagine this can cause a
> > stuck down_write() among other things?
> Recently, I realized the same imbalance, it is an obvious
> inconsistency. I don't know, this issue can be related with
> mine. I'll send a fix patch.

a good find by Vlastimil.

Ebru, can you also re-visit __collapse_huge_page_swapin()? it's called
from collapse_huge_page() under the down_read(&mm->mmap_sem), is there
any reason to do the nested down_read(&mm->mmap_sem)?

collapse_huge_page()
...
	down_read(&mm->mmap_sem);
	result = hugepage_vma_revalidate(mm, vma, address);
	if (result)
		goto out;

	pmd = mm_find_pmd(mm, address);
	if (!pmd) {
		result = SCAN_PMD_NULL;
		goto out;
	}

	if (allocstall == curr_allocstall && swap != 0) {
		if (!__collapse_huge_page_swapin(mm, vma, address, pmd)) {
			{
			:	if (ret & VM_FAULT_RETRY) {
			:		down_read(&mm->mmap_sem);
			:		^^^^^^^^^
			:		if (hugepage_vma_revalidate(mm, vma, address))
			:			return false;
			:	}
			}

			up_read(&mm->mmap_sem);
			goto out;
		}
	}

	up_read(&mm->mmap_sem);



so if __collapse_huge_page_swapin() retruns true we have:
	- down_read() twice, up_read() once?

the locking rules here are a bit confusing. (I didn't have my morning coffee yet).

	-ss