Date:	Tue, 13 Oct 2015 15:32:25 +0200
From:	Michal Hocko <mhocko@...nel.org>
To:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc:	rientjes@...gle.com, oleg@...hat.com,
	torvalds@...ux-foundation.org, kwalker@...hat.com, cl@...ux.com,
	akpm@...ux-foundation.org, hannes@...xchg.org,
	vdavydov@...allels.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, skozina@...hat.com
Subject: Re: Silent hang up caused by pages being not scanned?

On Tue 13-10-15 00:25:53, Tetsuo Handa wrote:
[...]
> What is strange, the values printed by this debug printk() patch did not
> change as time went by. Thus, I think that this is not a problem of lack of
> CPU time for scanning pages. I suspect that there is a bug that nobody is
> scanning pages.
> 
> ----------
> [   66.821450] zone_reclaimable returned 1 at line 2646
> [   66.823020] (ACTIVE_FILE=26+INACTIVE_FILE=10) * 6 > PAGES_SCANNED=32
> [   66.824935] shrink_zones returned 1 at line 2706
> [   66.826392] zones_reclaimable=1 at line 2765
> [   66.827865] do_try_to_free_pages returned 1 at line 2938
> [   67.102322] __perform_reclaim returned 1 at line 2854
> [   67.103968] did_some_progress=1 at line 3301
> (...snipped...)
> [  281.439977] zone_reclaimable returned 1 at line 2646
> [  281.439977] (ACTIVE_FILE=26+INACTIVE_FILE=10) * 6 > PAGES_SCANNED=32
> [  281.439978] shrink_zones returned 1 at line 2706
> [  281.439978] zones_reclaimable=1 at line 2765
> [  281.439979] do_try_to_free_pages returned 1 at line 2938
> [  281.439979] __perform_reclaim returned 1 at line 2854
> [  281.439980] did_some_progress=1 at line 3301

This is really interesting because even with reclaimable LRUs this small
we should eventually scan them enough times for zone_reclaimable to
return false. PAGES_SCANNED in your logs seems to be constant, though,
which suggests that somebody manages to free a page every time before we
get down to priority 0 and finally manage to scan something. This is
pretty much pathological behavior and I have a hard time imagining how
that would be possible, but it clearly shows that the zone_reclaimable
heuristic is not working properly.
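
To make the arithmetic in the quoted log concrete, here is a minimal
user-space sketch of the current check. It is not kernel code: struct
zone_model and the helper names are made up for illustration, and it
assumes that only the file LRUs count as reclaimable because no swap is
configured.

```c
/* User-space model of the current zone_reclaimable() check; not kernel code. */
#include <stdbool.h>

/* Hypothetical stand-in for the per-zone counters printed in the debug log. */
struct zone_model {
	unsigned long active_file;	/* ACTIVE_FILE */
	unsigned long inactive_file;	/* INACTIVE_FILE */
	unsigned long pages_scanned;	/* PAGES_SCANNED, reset when a page is freed */
};

/* Assumption: without swap only the file LRUs are counted as reclaimable. */
static unsigned long reclaimable_pages(const struct zone_model *z)
{
	return z->active_file + z->inactive_file;
}

/* Mirrors: NR_PAGES_SCANNED < zone_reclaimable_pages(zone) * 6 */
bool zone_reclaimable_model(const struct zone_model *z)
{
	return z->pages_scanned < reclaimable_pages(z) * 6;
}

/* Values from the report: 32 < (26 + 10) * 6 = 216, so the zone looks reclaimable. */
bool reported_zone_is_reclaimable(void)
{
	const struct zone_model z = { 26, 10, 32 };

	return zone_reclaimable_model(&z);
}
```

As long as something keeps resetting pages_scanned before it reaches 216,
the model keeps returning true, which matches the constant values in the
log.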

I can see two options here. Either we teach zone_reclaimable to be less
fragile or we remove zone_reclaimable from shrink_zones altogether. Both
are risky because we have a long history of changes in this area which
caused other subtle behavior changes, but I guess that the first option
should be less risky. What about the following patch? I am not happy
about it because the condition is rather rough and a deeper inspection
is really needed to check all the call sites, but it should be good
enough for testing.
--- 
From afe1c5ef4726b78f51e850ed93564b52f3c73905 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@...e.com>
Date: Tue, 13 Oct 2015 15:12:13 +0200
Subject: [PATCH] mm, vmscan: Make zone_reclaimable less fragile

zone_reclaimable considers a zone unreclaimable if we have scanned all
the reclaimable pages enough times since the last page was freed and
that still hasn't led to an allocation success. This can, however, lead
to a livelock/thrashing when a single freed page resets PAGES_SCANNED
while memory consumers are looping over small LRUs without making any
progress (e.g. the remaining pages on the LRU are dirty and all the
flushers are blocked) and failing to invoke the OOM killer because
zone_reclaimable would consider the zone reclaimable.

Tetsuo Handa has reported the following:
: [   66.821450] zone_reclaimable returned 1 at line 2646
: [   66.823020] (ACTIVE_FILE=26+INACTIVE_FILE=10) * 6 > PAGES_SCANNED=32
: [   66.824935] shrink_zones returned 1 at line 2706
: [   66.826392] zones_reclaimable=1 at line 2765
: [   66.827865] do_try_to_free_pages returned 1 at line 2938
: [   67.102322] __perform_reclaim returned 1 at line 2854
: [   67.103968] did_some_progress=1 at line 3301
: (...snipped...)
: [  281.439977] zone_reclaimable returned 1 at line 2646
: [  281.439977] (ACTIVE_FILE=26+INACTIVE_FILE=10) * 6 > PAGES_SCANNED=32
: [  281.439978] shrink_zones returned 1 at line 2706
: [  281.439978] zones_reclaimable=1 at line 2765
: [  281.439979] do_try_to_free_pages returned 1 at line 2938
: [  281.439979] __perform_reclaim returned 1 at line 2854
: [  281.439980] did_some_progress=1 at line 3301

In his case anon LRUs are not reclaimable because there is no swap enabled.

It is not clear who frees a page that regularly, but it is clear that no
progress can be made while zone_reclaimable still considers the zone
reclaimable.

This patch makes zone_reclaimable less fragile by checking the number of
reclaimable plus free pages against the min watermark. It doesn't make
much sense to rely on the PAGES_SCANNED heuristic if there are not
enough reclaimable pages to get us over the min watermark.

Reported-by: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@...e.com>
---
 mm/vmscan.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c88d74ad9304..f16266e0af70 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -209,8 +209,14 @@ static unsigned long zone_reclaimable_pages(struct zone *zone)
 
 bool zone_reclaimable(struct zone *zone)
 {
+	unsigned long reclaimable = zone_reclaimable_pages(zone);
+	unsigned long free = zone_page_state(zone, NR_FREE_PAGES);
+
+	if (reclaimable + free < min_wmark_pages(zone))
+		return false;
+
 	return zone_page_state(zone, NR_PAGES_SCANNED) <
-		zone_reclaimable_pages(zone) * 6;
+		reclaimable * 6;
 }
 
 static unsigned long get_lru_size(struct lruvec *lruvec, enum lru_list lru)
-- 
2.5.1
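
For anybody who wants to play with the new condition outside of the
kernel, here is a rough user-space sketch of the patched check. It is
not kernel code: struct patched_zone_model and its fields are
hypothetical stand-ins for zone_reclaimable_pages(), NR_FREE_PAGES,
min_wmark_pages() and NR_PAGES_SCANNED, and the report does not include
the free page count or the watermark, so any numbers fed into it are
assumptions.

```c
/* User-space model of the patched zone_reclaimable(); not kernel code. */
#include <stdbool.h>

/* Hypothetical stand-ins for the zone counters used by the patch. */
struct patched_zone_model {
	unsigned long reclaimable;	/* zone_reclaimable_pages(zone) */
	unsigned long free;		/* NR_FREE_PAGES */
	unsigned long min_wmark;	/* min_wmark_pages(zone) */
	unsigned long pages_scanned;	/* NR_PAGES_SCANNED */
};

bool zone_reclaimable_patched(const struct patched_zone_model *z)
{
	/* New: reclaiming everything still would not reach the min watermark. */
	if (z->reclaimable + z->free < z->min_wmark)
		return false;

	/* Unchanged heuristic: give up after scanning the LRUs six times over. */
	return z->pages_scanned < z->reclaimable * 6;
}
```

With 36 reclaimable pages and PAGES_SCANNED stuck at 32, the model
returns false whenever free + 36 is below the watermark, no matter how
often PAGES_SCANNED gets reset.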

-- 
Michal Hocko
SUSE Labs