Message-ID: <20170510070311.GA24772@bbox>
Date:   Wed, 10 May 2017 16:03:11 +0900
From:   Minchan Kim <minchan@...nel.org>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...hsingularity.net>, kernel-team@....com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] vmscan: scan pages until it founds eligible pages

On Wed, May 10, 2017 at 08:13:12AM +0200, Michal Hocko wrote:
> On Wed 10-05-17 10:46:54, Minchan Kim wrote:
> > On Wed, May 03, 2017 at 08:00:44AM +0200, Michal Hocko wrote:
> [...]
> > > @@ -1486,6 +1486,12 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> > >  			continue;
> > >  		}
> > >  
> > > +		/*
> > > +		 * Do not count skipped pages because we do want to isolate
> > > +		 * some pages even when the LRU mostly contains ineligible
> > > +		 * pages
> > > +		 */
> > 
> > How about adding comment about "why"?
> > 
> > /*
> >  * Do not count skipped pages: otherwise the function can return with
> >  * no isolated pages if the LRU mostly contains ineligible pages, so the
> >  * VM cannot reclaim any pages and triggers a premature OOM.
> >  */
> 
> I am not sure this is necessarily any better. Mentioning a pre-mature
> OOM would require a much better explanation because a first immediate
> question would be "why don't we scan those pages at priority 0". Also
> decision about the OOM is at a different layer and it might change in
> future when this doesn't apply any more. But it is not like I would
> insist...
> 
> > > +		scan++;
> > >  		switch (__isolate_lru_page(page, mode)) {
> > >  		case 0:
> > >  			nr_pages = hpage_nr_pages(page);
> > 
> > Confirmed.
> 
> Hmm. I can clearly see how we could skip over too many pages and hit
> small reclaim priorities too quickly but I am still scratching my head
> about how we could hit the OOM killer as a result. The amount of pages
> on the active anonymous list suggests that we are not able to rotate
> pages quickly enough. I have to keep thinking about that.

I explained it, but apparently not well enough. Let me try again.

The problem is that get_scan_count determines nr_to_scan from the
pages in eligible zones only.

        size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
        size = size >> sc->priority;

Assume sc->priority is 0 and the LRU list is as follows.

        N-N-N-N-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H

(I.e., a few eligible pages are at the head of the LRU, and almost
all of the remaining pages are ineligible.)

In that case, size becomes 4, so the VM wants to scan 4 pages, but the
4 pages at the tail of the LRU are not eligible.
If isolate_lru_pages counts skipped pages as scanned, it stops after
those 4 pages and never reaches the eligible pages, so nothing is
reclaimed.
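
To make the arithmetic concrete, here is a toy userspace model of the
case above. It is only a sketch, not the kernel code: the names
eligible_size and isolate_model, the 'N'/'H' string encoding, and the
count_skipped switch are all made up for illustration, and it ignores
everything else isolate_lru_pages does.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy encoding (an assumption for this sketch):
 *   'N' = page in an eligible zone, 'H' = page in an ineligible zone.
 * lru[0] is the head of the list, lru[n - 1] is the tail.
 */

/* Models "size = eligible LRU size; size >>= priority". */
static size_t eligible_size(const char *lru, size_t n, unsigned int priority)
{
	size_t size = 0;

	for (size_t i = 0; i < n; i++)
		if (lru[i] == 'N')
			size++;
	return size >> priority;
}

/*
 * Walks the list from the tail and returns how many pages were taken.
 * count_skipped != 0 models the old behaviour (skipped pages consume the
 * scan budget); count_skipped == 0 models the patched behaviour.
 */
static size_t isolate_model(const char *lru, size_t n, size_t nr_to_scan,
			    int count_skipped)
{
	size_t scan = 0, taken = 0, i = n;

	assert(lru != NULL);
	while (i > 0 && scan < nr_to_scan) {
		char page = lru[--i];

		if (page == 'H') {
			/* Ineligible page: skipped, maybe still counted. */
			if (count_skipped)
				scan++;
			continue;
		}
		scan++;
		taken++;
	}
	return taken;
}
```

With a list of 4 eligible pages followed by 16 ineligible ones and
priority 0, eligible_size gives a budget of 4. isolate_model with
count_skipped set spends that whole budget on ineligible pages at the
tail and takes nothing, while with count_skipped clear it walks past
them and takes the 4 eligible pages.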

If this makes the problem easier to understand, I will add it to
the description.
