linux-kernel - Re: oops in copy_page

Message-ID: <20130109114413.GA13475@suse.de>
Date:	Wed, 9 Jan 2013 11:44:13 +0000
From:	Mel Gorman <mgorman@...e.de>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	"Kirill A. Shutemov" <kirill@...temov.name>,
	Hillf Danton <dhillf@...il.com>,
	Hugh Dickins <hughd@...gle.com>, Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux-MM <linux-mm@...ck.org>, Rik van Riel <riel@...hat.com>
Subject: Re: oops in copy_page_rep()

On Tue, Jan 08, 2013 at 08:52:14AM -0800, Linus Torvalds wrote:
> On Tue, Jan 8, 2013 at 8:31 AM, Kirill A. Shutemov <kirill@...temov.name> wrote:
> >>
> >> Heh. I was more thinking about why do_huge_pmd_wp_page() needs it, but
> >> do_huge_pmd_numa_page() does not.
> >
> > It does. The check should be moved up.
> >
> >> Also, do we actually need it for huge_pmd_set_accessed()? The
> >> *placement* of that thing confuses me. And because it confuses me, I'd
> >> like to understand it.
> >
> > We need it for huge_pmd_set_accessed() too.
> >
> > Looks like a mis-merge. The original patch for huge_pmd_set_accessed() was
> > correct: http://lkml.org/lkml/2012/10/25/402
> 
> Not a merge error: the pmd_trans_splitting() check was removed by
> commit d10e63f29488 ("mm: numa: Create basic numa page hinting
> infrastructure").
> 
> Now, *why* it was removed, I can't tell. And it's not clear why the
> original code just had it in a conditional, while the suggested patch
> has that "goto repeat" thing.

Removing it was my mistake. I made the error back in October and no
longer remember how I managed it.

Retrying versus "goto repeat" is a detail. If the full fault is retried,
the split may still be in progress when the fault is retried, or a new THP
may be collapsed underneath and a new split started while the mmap_sem is
released, but both are unlikely. On the other hand, taking the anon_vma
rwsem for write in wait_split_huge_page() could cause delays elsewhere
that would be almost impossible to detect, so it is not necessarily
better. Retrying the fault as your patch does is reasonable.
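
To make the trade-off above concrete, here is a rough userspace sketch, not
kernel code. Every name in it (toy_split, toy_retry_fault,
toy_wait_and_repeat) is hypothetical, and it models a split only as a
counter of "ticks" until completion. It ignores mmap_sem, the anon_vma
rwsem and rescheduling entirely; it only shows that retrying re-enters the
fault path once per attempt, while waiting stays inside a single fault.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a THP split that finishes after some ticks. */
struct toy_split {
    int ticks_left;
};

/* Returns true while the split is still running and consumes one tick. */
static bool toy_split_in_progress(struct toy_split *s)
{
    if (s->ticks_left > 0) {
        s->ticks_left--;
        return true;
    }
    return false;
}

/*
 * Strategy A: if a split is seen, leave the handler and let the access
 * fault again. Returns how many times the fault path was entered.
 */
static int toy_retry_fault(struct toy_split *s)
{
    int faults = 0;

    for (;;) {
        faults++;                       /* one entry into the fault path */
        if (toy_split_in_progress(s))
            continue;                   /* bail out; the access re-faults */
        return faults;                  /* split finished, fault handled */
    }
}

/*
 * Strategy B: wait for the split inside the handler, then carry on
 * (the "goto repeat" style). Returns the number of fault entries and
 * reports how many waits happened while inside that single fault.
 */
static int toy_wait_and_repeat(struct toy_split *s, int *waits)
{
    int faults = 1;                     /* the fault path is entered once */

    while (toy_split_in_progress(s))
        (*waits)++;                     /* blocked inside the handler */
    return faults;
}
```

With a split that lasts three ticks, the first strategy enters the fault
path four times and the second enters it once but waits three times inside
the handler.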

> I suspect re-trying the fault (which I
> assume the original code did) is actually better, because that way you
> go through all the "should I reschedule as I return through the
> exception" stuff. I dunno.
> 
> Mel, that original patch came from you, although it was based on
> previous work by Peter/Ingo/Andrea. Can you walk us through the
> history and thinking about the loss of pmd_trans_splitting(). Was it
> purely a mistake? It looks intentional.
> 

Mistake. Andrea, Peter and Ingo did not make similar mistakes.

Looking at your patch, I also think that the check needs to be made before
the call to do_huge_pmd_numa_page() so it can rely on a pmd_same() check
to make sure a split did not start before the page table lock was taken.
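
The ordering described above can be sketched as a small userspace model.
This is an illustration under stated assumptions, not the actual patch: the
flag values, the toy_pmd_t type and all toy_* function names are
hypothetical, the page table lock is only indicated by a comment, and the
pmd_same() comparison is reduced to a plain equality test on the snapshot.

```c
#include <stdint.h>

/* Hypothetical flag bits; these are not the kernel's real PMD layout. */
#define TOY_PMD_HUGE      0x1u
#define TOY_PMD_SPLITTING 0x2u
#define TOY_PMD_NUMA      0x4u

typedef uint32_t toy_pmd_t;

enum toy_result {
    TOY_NOT_HUGE,       /* fall through to the normal PMD path */
    TOY_RETRY_FAULT,    /* split in progress: return and re-fault */
    TOY_NUMA_HANDLED,   /* huge NUMA hinting fault handled */
    TOY_RACED,          /* entry changed before the lock was taken */
    TOY_OTHER_HUGE      /* some other huge-PMD path (not modelled) */
};

/*
 * Models the role of do_huge_pmd_numa_page(): it receives the snapshot
 * taken by the caller and re-checks it against the live entry once the
 * page table lock would be held.
 */
static enum toy_result toy_huge_numa_fault(const toy_pmd_t *pmdp,
                                           toy_pmd_t orig_pmd)
{
    /* the page table lock would be taken here */
    if (*pmdp != orig_pmd)              /* stand-in for pmd_same() */
        return TOY_RACED;
    return TOY_NUMA_HANDLED;
}

/* Models the dispatch order: the splitting check comes before NUMA. */
static enum toy_result toy_handle_fault(const toy_pmd_t *pmdp)
{
    toy_pmd_t orig_pmd = *pmdp;         /* lockless snapshot */

    if (!(orig_pmd & TOY_PMD_HUGE))
        return TOY_NOT_HUGE;
    if (orig_pmd & TOY_PMD_SPLITTING)
        return TOY_RETRY_FAULT;         /* checked before the NUMA path */
    if (orig_pmd & TOY_PMD_NUMA)
        return toy_huge_numa_fault(pmdp, orig_pmd);
    return TOY_OTHER_HUGE;
}
```

Because the snapshot is known not to be splitting when it is handed over,
a split that starts afterwards changes the live entry and is caught by the
equality check under the lock.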

In response you said to Andrea

	Also, and more fundamentally, since do_pmd_numa_page() doesn't
	take the orig_pmd thing as an argument (and re-check it under the
	page-table lock), testing pmd_trans_splitting() on it is pointless,
	since it can change later.

do_pmd_numa_page() is called for a normal PMD that is marked pmd_numa(), not
a THP PMD. As the mmap_sem is held it cannot collapse to a THP underneath us
after the pmd_trans_huge() check so it should be unnecessary to check
pmd_trans_splitting() there.

-- 
Mel Gorman
SUSE Labs