Message-ID: <20160529212540.GA15180@redhat.com>
Date: Sun, 29 May 2016 23:25:40 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Michal Hocko <mhocko@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Andrea Arcangeli <aarcange@...hat.com>,
Mel Gorman <mgorman@...hsingularity.net>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: zone_reclaimable() leads to livelock in __alloc_pages_slowpath()
Sorry for the delay,
On 05/25, Michal Hocko wrote:
>
> On Wed 25-05-16 00:43:41, Oleg Nesterov wrote:
> >
> > But. It _seems to me_ that the kernel "leaks" some pages in LRU_INACTIVE_FILE
> > list because inactive_file_is_low() returns the wrong value. And do not even
> > ask me why I think so, unlikely I will be able to explain ;) to remind, I never
> > tried to read vmscan.c before.
No, this is not because of inactive_file_is_low(), but
> >
> > But. if I change lruvec_lru_size()
> >
> > - return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
> > + return zone_page_state_snapshot(lruvec_zone(lruvec), NR_LRU_BASE + lru);
> >
> > the problem goes away too.
Yes,
> This is a bit surprising but my testing shows that the result shouldn't
> make much difference. I can see some discrepancies between lru_vec size
> and zone_reclaimable_pages but they are too small to actually matter.
Yes, the difference is small but it does matter.
I do not pretend I understand all of this, but it seems I finally understand
what is going on on my system when it hangs. At least, why the change in
lruvec_lru_size() or calculate_normal_threshold() makes a difference.
This single change in get_scan_count() under for_each_evictable_lru() loop
- size = lruvec_lru_size(lruvec, lru);
+ size = zone_page_state_snapshot(lruvec_zone(lruvec), NR_LRU_BASE + lru);
fixes the problem too.
Without this change shrink*() continues to scan the LRU_ACTIVE_FILE list
while it is empty. LRU_INACTIVE_FILE is not empty (just a few pages) but
we do not even try to scan it, because lruvec_lru_size() returns zero.
Then later we recheck zone_reclaimable() and it notices the INACTIVE_FILE
counter because it uses the _snapshot variant; this leads to the livelock.
I guess this doesn't really matter, but in my particular case these
ACTIVE/INACTIVE counters were screwed by the recent putback_inactive_pages()
logic. The pages we "leak" in INACTIVE list were recently moved from ACTIVE
to INACTIVE list, and this updated only the per-cpu ->vm_stat_diff[] counters,
so the "non snapshot" lruvec_lru_size() in get_scan_count() sees the "old"
numbers.
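To illustrate how an update can stay invisible to the "non snapshot" readers, here is a simplified userspace sketch of a thresholded per-cpu counter. It is not the kernel code: `toy_zone`, `toy_mod_state()` and `toy_move_pages()` are hypothetical names, and the threshold field only stands in for whatever calculate_normal_threshold() would produce.

```c
#include <assert.h>

#define TOY_NR_CPUS 2

/* Hypothetical model of one zone counter; not the kernel's struct zone. */
struct toy_zone {
	long vm_stat;                     /* global value, seen by plain readers */
	long vm_stat_diff[TOY_NR_CPUS];   /* per-cpu deltas not yet folded in */
	long threshold;                   /* assumed per-cpu folding threshold */
};

/*
 * Accumulate delta in the per-cpu diff. The global counter is only
 * updated once the pending diff exceeds the threshold in either direction.
 */
void toy_mod_state(struct toy_zone *z, int cpu, long delta)
{
	long x = z->vm_stat_diff[cpu] + delta;

	if (x > z->threshold || x < -z->threshold) {
		z->vm_stat += x;   /* fold the pending delta into the global value */
		x = 0;
	}
	z->vm_stat_diff[cpu] = x;
}

/* Model moving nr pages from the ACTIVE to the INACTIVE list on one cpu. */
void toy_move_pages(struct toy_zone *active, struct toy_zone *inactive,
		    int cpu, long nr)
{
	toy_mod_state(active, cpu, -nr);
	toy_mod_state(inactive, cpu, nr);

	/* Sanity check: a diff never stays above the threshold. */
	assert(inactive->vm_stat_diff[cpu] <= inactive->threshold);
}
```

In this model, a move of fewer pages than the threshold changes only vm_stat_diff[], so a reader that looks at vm_stat alone keeps seeing the old ACTIVE and INACTIVE sizes.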
I even added more printk's, and yes when the system hangs I have something
like, say,
	->vm_stat[ACTIVE]        = NR;   // small number
	->vm_stat_diff[ACTIVE]   = -NR;  // so it is actually zero but
	                                 // get_scan_count() sees NR

	->vm_stat[INACTIVE]      = 0;    // this is what get_scan_count() sees
	->vm_stat_diff[INACTIVE] = NR;   // and this is what zone_reclaimable()
	                                 // notices via the _snapshot variant
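
The two views of these counters can be reproduced with a small userspace sketch. It is a simplified model rather than the kernel implementation: `toy_counter`, `toy_page_state()` and `toy_page_state_snapshot()` are hypothetical stand-ins for the plain and _snapshot readers, NR is assumed to be 3, and both readers are assumed to clamp negative sums to zero.

```c
#include <assert.h>

#define TOY_NR_CPUS 2

/* Hypothetical model of one LRU counter: global value plus per-cpu diffs. */
struct toy_counter {
	long vm_stat;
	long vm_stat_diff[TOY_NR_CPUS];
};

/* Plain read: looks at the global value only (what get_scan_count() sees). */
unsigned long toy_page_state(const struct toy_counter *c)
{
	long x = c->vm_stat;

	return x < 0 ? 0 : (unsigned long)x;
}

/* Snapshot read: also adds the pending per-cpu diffs. */
unsigned long toy_page_state_snapshot(const struct toy_counter *c)
{
	long x = c->vm_stat;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		x += c->vm_stat_diff[cpu];

	return x < 0 ? 0 : (unsigned long)x;
}

/* The state from the printk output above, with NR assumed to be 3. */
void toy_hang_example(void)
{
	struct toy_counter active   = { .vm_stat = 3, .vm_stat_diff = { -3, 0 } };
	struct toy_counter inactive = { .vm_stat = 0, .vm_stat_diff = {  3, 0 } };

	/* Plain reads: ACTIVE looks non-empty, INACTIVE looks empty. */
	assert(toy_page_state(&active) == 3);
	assert(toy_page_state(&inactive) == 0);

	/* Snapshot reads: ACTIVE is really empty, INACTIVE holds the pages. */
	assert(toy_page_state_snapshot(&active) == 0);
	assert(toy_page_state_snapshot(&inactive) == 3);
}
```

With the plain reads the scan is aimed at the empty ACTIVE list and skips INACTIVE, while the snapshot read still reports reclaimable pages, which matches the hang described above.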
Oleg.