lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 19 Nov 2018 12:34:09 -0800 (PST)
From:   Hugh Dickins <hughd@...gle.com>
To:     Michal Hocko <mhocko@...nel.org>
cc:     Baoquan He <bhe@...hat.com>, David Hildenbrand <david@...hat.com>,
        linux-mm@...ck.org, pifang@...hat.com,
        linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
        aarcange@...hat.com, Mel Gorman <mgorman@...e.de>,
        Vlastimil Babka <vbabka@...e.cz>,
        Hugh Dickins <hughd@...gle.com>
Subject: Re: Memory hotplug softlock issue

On Mon, 19 Nov 2018, Michal Hocko wrote:
> On Mon 19-11-18 15:10:16, Michal Hocko wrote:
> [...]
> > In other words. Why cannot we do the following?
> 
> Baoquan, this is certainly not the right fix but I would be really
> curious whether it makes the problem go away.
> 
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index f7e4bfdc13b7..7ccab29bcf9a 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -324,19 +324,9 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
> >  		goto out;
> >  
> >  	page = migration_entry_to_page(entry);
> > -
> > -	/*
> > -	 * Once page cache replacement of page migration started, page_count
> > -	 * *must* be zero. And, we don't want to call wait_on_page_locked()
> > -	 * against a page without get_page().
> > -	 * So, we use get_page_unless_zero(), here. Even failed, page fault
> > -	 * will occur again.
> > -	 */
> > -	if (!get_page_unless_zero(page))
> > -		goto out;
> >  	pte_unmap_unlock(ptep, ptl);
> > -	wait_on_page_locked(page);
> > -	put_page(page);
> > +	page_lock(page);
> > +	page_unlock(page);
> >  	return;
> >  out:
> >  	pte_unmap_unlock(ptep, ptl);

Thanks for Cc'ing me. I did mention precisely this issue two or three
times at LSF/MM this year, and claimed then that I would post the fix.

I'm glad that I delayed, what I had then (migration_waitqueue instead
of using page_waitqueue) was not wrong, but what I've been using the
last couple of months is rather better (and can be put to use to solve
similar problems in collapsing pages on huge tmpfs. but we don't need
to get into that at this time): put_and_wait_on_page_locked().

What I have not yet done is verify it on latest kernel, and research
the interested Cc list (Linus and Tim Chen come immediately to mind),
and write the commit comment. I have some testing to do on the latest
kernel today, so I'll throw put_and_wait_on_page_locked() in too,
and post tomorrow I hope.

Hugh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ