linux-kernel - [PATCH 3.2 47/67] mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress

Message-ID: <lsq.1456263723.727239134@decadent.org.uk>
Date:	Tue, 23 Feb 2016 21:42:03 +0000
From:	Ben Hutchings <ben@...adent.org.uk>
To:	linux-kernel@...r.kernel.org, stable@...r.kernel.org
CC:	akpm@...ux-foundation.org, "Joonsoo Kim" <iamjoonsoo.kim@....com>,
	"Tetsuo Handa" <penguin-kernel@...ove.SAKURA.ne.jp>,
	"Linus Torvalds" <torvalds@...ux-foundation.org>,
	"Cristopher Lameter" <clameter@....com>,
	"Michal Hocko" <mhocko@...e.com>,
	"Tetsuo Handa" <penguin-kernel@...ove.sakura.ne.jp>,
	"Jan Stancek" <jstancek@...hat.com>,
	"Arkadiusz Miskiewicz" <arekm@...en.pl>,
	"Tejun Heo" <tj@...nel.org>
Subject: [PATCH 3.2 47/67] mm, vmstat: fix wrong WQ sleep when memory
 reclaim doesn't make any progress

3.2.78-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>

commit 564e81a57f9788b1475127012e0fd44e9049e342 upstream.

Jan Stancek has reported that the system occasionally hangs after the
"oom01" testcase from LTP triggers OOM.  Judging from the report, a
kworker thread is doing memory allocation and the values shown for "Node 0
Normal free:" and "Node 0 Normal:" differ while the system hangs, so
vmstat is not up-to-date for some reason.

Commit 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover
memory reclaim doesn't make any progress") meant to force the kworker
thread to take a short sleep, but it mistakenly used
schedule_timeout(1).  We missed that schedule_timeout() in state
TASK_RUNNING doesn't do anything.

Fix it by using schedule_timeout_uninterruptible(1) which forces the
kworker thread to take a short sleep in order to make sure that vmstat
is up-to-date.
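
To illustrate the difference described above, here is a minimal userspace
sketch.  It is a toy model, not kernel code: the toy_* names, the
two-state enum and the slept_jiffies counter are all made up for this
example.  It only mirrors the behaviour stated in this changelog, namely
that schedule_timeout() called in state TASK_RUNNING does not sleep, and
that schedule_timeout_uninterruptible() sets the task state before
calling it.  The model deliberately ignores everything else the real
scheduler does.

```c
/* Toy stand-ins for kernel task states; names and values are hypothetical. */
enum toy_state { TOY_RUNNING, TOY_UNINTERRUPTIBLE };

struct toy_task {
	enum toy_state state;
	long slept_jiffies;	/* total time the task actually spent asleep */
};

/*
 * Models schedule_timeout(): a task whose state is still "running" is
 * not put to sleep, so the call comes straight back with the whole
 * timeout left over.
 */
static long toy_schedule_timeout(struct toy_task *t, long timeout)
{
	if (t->state == TOY_RUNNING)
		return timeout;		/* no sleep happened */

	t->slept_jiffies += timeout;	/* sleep until the timer fires */
	t->state = TOY_RUNNING;		/* the wakeup makes the task runnable again */
	return 0;
}

/*
 * Models schedule_timeout_uninterruptible(): set the state first, then
 * call the plain variant, so the sleep really takes place.
 */
static long toy_schedule_timeout_uninterruptible(struct toy_task *t,
						 long timeout)
{
	t->state = TOY_UNINTERRUPTIBLE;
	return toy_schedule_timeout(t, timeout);
}
```

In this model a worker that starts in TOY_RUNNING and calls
toy_schedule_timeout(&w, 1) accumulates no sleep time, while
toy_schedule_timeout_uninterruptible(&w, 1) adds one jiffy and leaves
the task runnable again afterwards.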

Fixes: 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress")
Signed-off-by: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Reported-by: Jan Stancek <jstancek@...hat.com>
Acked-by: Michal Hocko <mhocko@...e.com>
Cc: Tejun Heo <tj@...nel.org>
Cc: Cristopher Lameter <clameter@....com>
Cc: Joonsoo Kim <iamjoonsoo.kim@....com>
Cc: Arkadiusz Miskiewicz <arekm@...en.pl>
Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>
Signed-off-by: Ben Hutchings <ben@...adent.org.uk>
---
 mm/backing-dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -879,7 +879,7 @@ long wait_iff_congested(struct zone *zon
 		 * here rather than calling cond_resched().
 		 */
 		if (current->flags & PF_WQ_WORKER)
-			schedule_timeout(1);
+			schedule_timeout_uninterruptible(1);
 		else
 			cond_resched();