linux-kernel - Re: [PATCH] mm, vmstat: Allow WQ concurrency to discover memory reclaim doesn't make any progress


Message-ID: <20151125024435.GB9563@js1304-P5Q-DELUXE>
Date:	Wed, 25 Nov 2015 11:44:36 +0900
From:	Joonsoo Kim <iamjoonsoo.kim@....com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Michal Hocko <mhocko@...nel.org>,
	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	Tejun Heo <tj@...nel.org>,
	Cristopher Lameter <clameter@....com>,
	Arkadiusz Miśkiewicz <arekm@...en.pl>,
	linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
	Michal Hocko <mhocko@...e.com>,
	Christoph Lameter <cl@...ux.com>
Subject: Re: [PATCH] mm, vmstat: Allow WQ concurrency to discover memory
 reclaim doesn't make any progress

On Tue, Nov 24, 2015 at 03:44:48PM -0800, Andrew Morton wrote:
> On Thu, 19 Nov 2015 13:30:53 +0100 Michal Hocko <mhocko@...nel.org> wrote:
> 
> > From: Michal Hocko <mhocko@...e.com>
> > 
> > Tetsuo Handa has reported that the system might basically livelock in OOM
> > condition without triggering the OOM killer. The issue is caused by
> > internal dependency of the direct reclaim on vmstat counter updates (via
> > zone_reclaimable) which are performed from the workqueue context.
> > If all the current workers get assigned to an allocation request,
> > though, they will be looping inside the allocator trying to reclaim
> > memory but zone_reclaimable can see stale numbers so it will consider
> > a zone reclaimable even though it has been scanned way too much. WQ
> > concurrency logic will not consider this situation as a congested workqueue
> > because it relies on the worker having to sleep in such a situation.
> > This also means that it doesn't try to spawn new workers or invoke
> > the rescuer thread if one is assigned to the queue.
> > 
> > In order to fix this issue we need to do two things. First we have to
> > let wq concurrency code know that we are in trouble so we have to do
> > a short sleep. To avoid the issues handled by 0e093d99763e
> > ("writeback: do not sleep on the congestion queue if there are no
> > congested BDIs or if significant congestion is not being encountered in
> > the current zone") we limit the sleep to worker threads, which are
> > the ones of interest anyway.
> > 
> > The second thing to do is to create a dedicated workqueue for vmstat and
> > mark it WQ_MEM_RECLAIM to note it participates in the reclaim and to
> > have a spare worker thread for it.
> 
> This vmstat update thing is being a problem.  Please see Joonsoo's
> "mm/vmstat: retrieve more accurate vmstat value".
> 
> Joonsoo, might this patch help with that issue?

That issue cannot be solved by this patch. This patch solves the blocked
vmstat updater problem, but that issue is caused by a long update delay
(not by blocking). There, the update happens every 1 sec as usual.
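
To make the distinction concrete, here is a toy userspace model, not
kernel code. It assumes per-CPU deltas that are folded into a global
counter once per interval. All names (struct counter, read_cheap,
read_accurate, max_staleness) and the 100-tick interval are made up for
illustration, and the model ignores the per-CPU threshold that can
trigger an earlier fold in the real vmstat code.

```c
#include <assert.h>

#define NR_CPUS 4
#define STAT_INTERVAL_TICKS 100 /* hypothetical stand-in for the 1 sec interval */

struct counter {
	long global;            /* what a cheap reader sees */
	long pcp_diff[NR_CPUS]; /* per-CPU deltas not folded yet */
};

/* Record a change on one CPU without touching the global value. */
static void mod_counter(struct counter *c, int cpu, long delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	c->pcp_diff[cpu] += delta;
}

/* What the periodic updater does for one CPU: fold the delta. */
static void fold_cpu(struct counter *c, int cpu)
{
	c->global += c->pcp_diff[cpu];
	c->pcp_diff[cpu] = 0;
}

/* Cheap read: global value only, may lag behind. */
static long read_cheap(const struct counter *c)
{
	return c->global;
}

/* More accurate read: also sums the pending per-CPU deltas. */
static long read_accurate(const struct counter *c)
{
	long sum = c->global;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->pcp_diff[cpu];
	return sum;
}

/*
 * Every CPU adds 1 per tick and the updater runs on schedule every
 * STAT_INTERVAL_TICKS. Returns the largest gap seen between the
 * accurate and the cheap read.
 */
static long max_staleness(int ticks)
{
	struct counter c = { 0 };
	long worst = 0;

	for (int t = 1; t <= ticks; t++) {
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			mod_counter(&c, cpu, 1);

		long err = read_accurate(&c) - read_cheap(&c);
		if (err > worst)
			worst = err;

		if (t % STAT_INTERVAL_TICKS == 0)
			for (int cpu = 0; cpu < NR_CPUS; cpu++)
				fold_cpu(&c, cpu);
	}
	return worst;
}
```

In this model the updater is never blocked, yet max_staleness(250)
returns 400: the cheap read can be off by everything accumulated within
one interval. Making the updater run reliably does not shrink that gap.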

Thanks.