lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160525120957.GH20132@dhcp22.suse.cz>
Date:	Wed, 25 May 2016 14:09:58 +0200
From:	Michal Hocko <mhocko@...nel.org>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Mel Gorman <mgorman@...hsingularity.net>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: zone_reclaimable() leads to livelock in __alloc_pages_slowpath()

On Wed 25-05-16 00:43:41, Oleg Nesterov wrote:
> On 05/24, Michal Hocko wrote:
> >
> > On Mon 23-05-16 17:14:19, Oleg Nesterov wrote:
> > > On 05/23, Michal Hocko wrote:
> > [...]
> > > > Could you add some tracing and see what are the numbers
> > > > above?
> > >
> > > with the patch below I can press Ctrl-C when it hangs, this breaks the
> > > endless loop and the output looks like
> > >
> > > 	vmscan: ZONE=ffffffff8189f180 0 scanned=0 pages=6
> > > 	vmscan: ZONE=ffffffff8189eb00 0 scanned=1 pages=0
> > > 	...
> > > 	vmscan: ZONE=ffffffff8189eb00 0 scanned=2 pages=1
> > > 	vmscan: ZONE=ffffffff8189f180 0 scanned=4 pages=6
> > > 	...
> > > 	vmscan: ZONE=ffffffff8189f180 0 scanned=4 pages=6
> > > 	vmscan: ZONE=ffffffff8189f180 0 scanned=4 pages=6
> > >
> > > the numbers are always small.
> >
> > Small but scanned is not 0 and constant which means it either gets reset
> > repeatedly (something gets freed) or we have stopped scanning. Which
> > pattern can you see? I assume that the swap space is full at the time
> > (could you add get_nr_swap_pages() to the output).
> 
> no, I tested this without SWAP,

I have tried that as well but this just hit the OOM almost immediately.

> > I am trying to reproduce but your test case always hits the oom killer:
> 
> Did you try to run it in a loop? Usually it takes a while before the system
> hangs.

OK, I wasn't patient enough... Now I can reproduce.

[...] 
> So. I spent almost the whole day trying to understand whats going on, and
> of course I failed.
> 
> But. It _seems to me_ that the kernel "leaks" some pages in LRU_INACTIVE_FILE
> list because inactive_file_is_low() returns the wrong value. And do not even
> ask me why I think so, unlikely I will be able to explain ;) to remind, I never
> tried to read vmscan.c before.
> 
> But. if I change lruvec_lru_size()
> 
> 	-       return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
> 	+       return zone_page_state_snapshot(lruvec_zone(lruvec), NR_LRU_BASE + lru);
> 
> the problem goes away too.

This is a bit surprising but my testing shows that the result shouldn't
make much difference. I can see some discrepancies between lru_vec size
and zone_reclaimable_pages but they are too small to actually matter.
What I do instead is that there are freed pages on the way and that
always resets the counter. And you can see the number of scanned pages
was really large there. See the attached serial log. So this smells like
the issue I was talking about when mentioning the livelock earlier.

Maybe your change to lruvec_lru_size resp. threshold size just changes
the timing for other operations to hit the race with the free less
likely. Or maybe my debugging output is interfering with the load. Dunno
but it seems like the inherent problem of the oom detection based on the
NR_PAGES_SCANNED to me.

Here is my debugging patch:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c1069efcc4d7..ac554b633f97 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -834,8 +834,10 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 
 	spin_lock(&zone->lock);
 	nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
-	if (nr_scanned)
+	if (nr_scanned) {
+		pr_info("XXX: %s zone:%s clearing %lu pages\n", __FUNCTION__, zone->name, nr_scanned);
 		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
+	}
 
 	while (to_free) {
 		struct page *page;
@@ -888,8 +890,10 @@ static void free_one_page(struct zone *zone,
 	unsigned long nr_scanned;
 	spin_lock(&zone->lock);
 	nr_scanned = zone_page_state(zone, NR_PAGES_SCANNED);
-	if (nr_scanned)
+	if (nr_scanned) {
+		pr_info("XXX: %s zone:%s clearing %lu pages\n", __FUNCTION__, zone->name, nr_scanned);
 		__mod_zone_page_state(zone, NR_PAGES_SCANNED, -nr_scanned);
+	}
 
 	if (unlikely(has_isolate_pageblock(zone) ||
 		is_migrate_isolate(migratetype))) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 142cb61f4822..499a361fe5ad 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1606,6 +1606,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
 
 	if (global_reclaim(sc)) {
+		if (!nr_scanned)
+			pr_info("XXXX: %s nothing scanned\n", __FUNCTION__);
 		__mod_zone_page_state(zone, NR_PAGES_SCANNED, nr_scanned);
 		if (current_is_kswapd())
 			__count_zone_vm_events(PGSCAN_KSWAPD, zone, nr_scanned);
@@ -1794,8 +1796,11 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold,
 				     &nr_scanned, sc, isolate_mode, lru);
-	if (global_reclaim(sc))
+	if (global_reclaim(sc)) {
+		if (!nr_scanned)
+			pr_info("XXXX: %s nothing scanned\n", __FUNCTION__);
 		__mod_zone_page_state(zone, NR_PAGES_SCANNED, nr_scanned);
+	}
 
 	reclaim_stat->recent_scanned[file] += nr_taken;
 
@@ -2615,8 +2620,15 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			reclaimable = true;
 
 		if (global_reclaim(sc) &&
-		    !reclaimable && zone_reclaimable(zone))
+		    !reclaimable && zone_reclaimable(zone)) {
+			unsigned long file_lru_pages = zone_page_state(zone, NR_INACTIVE_FILE) + zone_page_state(zone, NR_ACTIVE_FILE);
+			pr_info("XXX: %s zone:%s scanned:%lu scanned_snapshot:%lu reclaimable:%lu file_active:%lu file_inactive:%lu isolated:%lu\n",
+					__FUNCTION__, zone->name,
+					zone_page_state(zone, NR_PAGES_SCANNED), zone_page_state_snapshot(zone, NR_PAGES_SCANNED),
+					zone_reclaimable_pages(zone),
+					zone_page_state(zone, NR_ACTIVE_FILE), zone_page_state(zone, NR_INACTIVE_FILE), zone_page_state(zone, NR_ISOLATED_FILE));
 			reclaimable = true;
+		}
 	}
 
 	/*
-- 
Michal Hocko
SUSE Labs

Download attachment "oom-test.qcow_serial.log.gz" of type "application/gzip" (407738 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ