Message-ID: <alpine.DEB.2.10.1512271715260.11689@chino.kir.corp.google.com>
Date:	Sun, 27 Dec 2015 17:27:03 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Figo Zhang <tianfei.zhang@...el.com>
cc:	gregkh@...uxfoundation.org, mhocko@...e.com,
	linux-kernel@...r.kernel.org, arve@...roid.com,
	anton.vorontsov@...aro.org, kirill.shutemov@...ux.intel.com,
	riandrews@...roid.com, devel@...verdev.osuosl.org
Subject: Re: [PATCH RESEND v2 1/1] fix a dead loop when in heavy low memory

On Sun, 27 Dec 2015, Figo Zhang wrote:

> The Android System UI hangs when running a heavy monkey stress test.
> 
> V2: add more detail about how to reproduce this issue; the
> important part is to install more than 100 apps/games.
> 
> Steps to reproduce:
> Run this monkey stress test script with more than 100
> apps/games installed:
> 
> adb shell "monkey --ignore-crashes --ignore-timeouts
> --kill-process-after-error --ignore-security-exceptions
> --throttle 200 -v 20000000"
> 
> kernel log:
> [ 1526.272125] lowmem_scan start: 128, 213da, ofree -9849 34419, ma 529
> [ 1526.272260] lowmemorykiller: select 'dTi-lm' (27289), adj 647, size 10630, to kill
> [ 1526.272299] lowmem_d_timeout=4296194081
> [ 1526.272303] Killing 'dTi-lm' (27289), adj 647,
> [ 1526.272303]    to free 42520kB on behalf of 'servicemanager' (2365) because
> [ 1526.272303]    cache 137676kB is below limit 221184kB for oom_score_adj 529
> [ 1526.272303]    Free memory is -39396kB above reserved
> [ 1526.272304] lowmem_scan end: 128, 213da, return 10630
> [ 1526.272710] lowmem_scan start: 128, 213da, ofree -9849 34373, ma 529
> [ 1526.272832] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193081, 4296194081
> [ 1526.274450] lowmem_scan start: 128, 280da, ofree -9601 34327, ma 529
> [ 1526.274695] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193083, 4296194081
> [ 1526.282292] lowmem_scan start: 128, 213da, ofree -9703 34327, ma 529
> [ 1526.282727] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193090, 4296194081
> [ 1526.316888] lowmem_scan start: 128, 213da, ofree -9766 34465, ma 529
> [ 1526.317019] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
> [ 1526.319311] lowmem_scan start: 128, 213da, ofree -9856 34419, ma 529
> [ 1526.319442] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193125, 4296194081
> [ 1526.322026] lowmem_scan start: 128, 280da, ofree -9841 34327, ma 529
> [ 1526.360831] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296193166, 4296194081
> [ 1526.532233] lowmem_scan start: 128, 213da, ofree -9846 34511, ma 529
> [ 1526.644046] lowmem_scan start: 128, 213da, ofree -9785 34235, ma 529
> [ 1527.437578] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194246, 4296195109
> [ 1527.442559] lowmem_scan start: 128, 213da, ofree -9850 41884, ma 529
> [ 1527.459540] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194268, 4296195109
> [ 1527.500352] lowmem: TIF_MEMDIE, adj=647, dTi-lm, jiffies=4296194309, 4296195109
> 
> When this happens, the Android system UI hangs and no process can be
> selected to kill.
> 
> I found that the value of "lowmem_deathpending_timeout" is modified
> strangely, like in last killing, the value is 4296194081, but why not it
> had changed to 4296195109? So it causes a dead loop in the low memory
> state, which makes the Android system UI hang because no process will
> be killed.
> 
> 

I'm assuming that you are loading the lowmem killer as a module since 
that's how you would modify lowmem_debug_level.  It appears that 
lowmem_debug_level is 2 from your kernel log, otherwise part of the log is 
missing.

I can tell this since you have a

	[ 1526.272260] lowmemorykiller: select 'dTi-lm' (27289), adj 647, size 10630, to kill

line but no line matching "send sigkill to %d (%s), adj %hd, size %d\n", 
which is printed at loglevel 1.

I think changing lowmem_debug_level to 1 would help to understand this 
issue better.

I think lowmem_deathpending_timeout is getting changed to 4296195109 at

	[ 1526.532233] lowmem_scan start: 128, 213da, ofree -9846 34511, ma 529
	> HERE <
	[ 1526.644046] lowmem_scan start: 128, 213da, ofree -9785 34235, ma 529

However, it appears that the same process, dTi-lm, is still chosen for oom 
kill because lowmem_deathpending_timeout has expired.
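
To make the timeout check concrete, here is a minimal userspace sketch of 
the comparison lowmem_scan() makes for a TIF_MEMDIE task.  It is not the 
kernel code: sketch_time_before_eq(), scan_bails_out() and 
check_logged_pairs() are hypothetical names, uint64_t stands in for the 
kernel's unsigned long, and the value pairs are the jiffies/timeout numbers 
copied from two of the "lowmem: TIF_MEMDIE" lines in the quoted log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's time_before_eq(): true when a is at
 * or before b.  Taking the difference first keeps the result correct even
 * if the counter wraps around.
 */
static bool sketch_time_before_eq(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) >= 0;
}

/*
 * Mirrors the unpatched check: a TIF_MEMDIE task whose timeout has not
 * passed yet makes the scan return 0 without selecting anything.
 */
static bool scan_bails_out(bool has_memdie, uint64_t now, uint64_t timeout)
{
	return has_memdie && sketch_time_before_eq(now, timeout);
}

/* jiffies/timeout pairs taken from the quoted kernel log. */
static void check_logged_pairs(void)
{
	assert(scan_bails_out(true, 4296193081ULL, 4296194081ULL));
	assert(scan_bails_out(true, 4296194246ULL, 4296195109ULL));
}
```

At the moments those two log lines were printed, jiffies was still at or 
before the timeout, so the unpatched check returns 0 without selecting a 
process.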

So this looks like a problem if the constantly chosen process cannot exit.  
It would have been helpful to have the stack of pid 27289 in the log to 
see where it was stuck.  But I think it may be unrelated to 
lowmem_deathpending_timeout itself.  We'd be better off selecting a 
different process to kill with something like this:

diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -128,11 +128,15 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
 		if (!p)
 			continue;
 
-		if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
-		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
-			task_unlock(p);
-			rcu_read_unlock();
-			return 0;
+		if (test_tsk_thread_flag(p, TIF_MEMDIE)) {
+			if (time_before_eq(jiffies,
+					   lowmem_deathpending_timeout)) {
+				task_unlock(p);
+				rcu_read_unlock();
+				return 0;
+			}
+			/* Need to select a different process to kill */
+			continue;
 		}
 		oom_score_adj = p->signal->oom_score_adj;
 		if (oom_score_adj < min_score_adj) {
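
The effect of that change can be sketched in userspace.  This is only an 
illustration under assumptions, not the driver: struct sketch_task and 
sketch_select() are hypothetical names, timeout_pending stands in for 
time_before_eq(jiffies, lowmem_deathpending_timeout), and the selection 
rule (highest oom_score_adj first, then largest size) is assumed to match 
what the driver does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the fields lowmem_scan() reads. */
struct sketch_task {
	int pid;
	bool memdie;		/* stands in for TIF_MEMDIE */
	short oom_score_adj;
	long tasksize;		/* stands in for the task's size in pages */
};

/* Returns the index of the task to kill, or -1 if nothing is selected. */
static int sketch_select(const struct sketch_task *tasks, size_t n,
			 short min_score_adj, bool timeout_pending)
{
	int selected = -1;
	size_t i;

	assert(tasks != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const struct sketch_task *p = &tasks[i];

		if (p->memdie) {
			if (timeout_pending)
				return -1;	/* still waiting for the earlier victim */
			continue;		/* patched behaviour: pick someone else */
		}
		if (p->oom_score_adj < min_score_adj || p->tasksize <= 0)
			continue;
		if (selected >= 0) {
			const struct sketch_task *s = &tasks[selected];

			if (p->oom_score_adj < s->oom_score_adj)
				continue;
			if (p->oom_score_adj == s->oom_score_adj &&
			    p->tasksize <= s->tasksize)
				continue;
		}
		selected = (int)i;
	}
	return selected;
}
```

With a stuck TIF_MEMDIE task in the list, the sketch returns -1 while the 
timeout is pending and moves on to the next best candidate once it has 
expired, instead of picking the same task again.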

But we need more information.  Please make sure that lowmem_debug_level is 
1, try to get a complete kernel log, and if possible please try to capture 
the stack of the process that can't exit (use /proc/<pid>/stack) before 
trying the above patch.
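
If it is easier than doing it by hand, something like the following could 
capture that stack.  It is only a sketch: stack_path() and 
dump_kernel_stack() are hypothetical helper names, and it assumes the 
kernel exposes /proc/<pid>/stack (CONFIG_STACKTRACE) and that the caller 
has enough privilege to read it, which may mean root.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Builds "/proc/<pid>/stack" in buf; returns 0 on success, -1 if it does not fit. */
static int stack_path(pid_t pid, char *buf, size_t len)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "/proc/%ld/stack", (long)pid);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Copies the kernel stack of pid to out; returns 0 on success, -1 on error. */
static int dump_kernel_stack(pid_t pid, FILE *out)
{
	char path[64];
	char line[256];
	FILE *f;

	if (stack_path(pid, path, sizeof(path)) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;	/* no such pid, no permission, or file not provided */
	while (fgets(line, sizeof(line), f))
		fputs(line, out);
	fclose(f);
	return 0;
}
```

Calling dump_kernel_stack(27289, stderr) a few times while the scan keeps 
hitting the TIF_MEMDIE task should show whether the task is stuck in one 
place.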