linux-kernel - Re: zone_reclaimable() leads to livelock in __alloc_pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160524224341.GA11961@redhat.com>
Date:	Wed, 25 May 2016 00:43:41 +0200
From:	Oleg Nesterov <oleg@...hat.com>
To:	Michal Hocko <mhocko@...nel.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Mel Gorman <mgorman@...hsingularity.net>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: zone_reclaimable() leads to livelock in __alloc_pages_slowpath()

On 05/24, Michal Hocko wrote:
>
> On Mon 23-05-16 17:14:19, Oleg Nesterov wrote:
> > On 05/23, Michal Hocko wrote:
> [...]
> > > Could you add some tracing and see what are the numbers
> > > above?
> >
> > with the patch below I can press Ctrl-C when it hangs, this breaks the
> > endless loop and the output looks like
> >
> > 	vmscan: ZONE=ffffffff8189f180 0 scanned=0 pages=6
> > 	vmscan: ZONE=ffffffff8189eb00 0 scanned=1 pages=0
> > 	...
> > 	vmscan: ZONE=ffffffff8189eb00 0 scanned=2 pages=1
> > 	vmscan: ZONE=ffffffff8189f180 0 scanned=4 pages=6
> > 	...
> > 	vmscan: ZONE=ffffffff8189f180 0 scanned=4 pages=6
> > 	vmscan: ZONE=ffffffff8189f180 0 scanned=4 pages=6
> >
> > the numbers are always small.
>
> Small but scanned is not 0 and constant which means it either gets reset
> repeatedly (something gets freed) or we have stopped scanning. Which
> pattern can you see? I assume that the swap space is full at the time
> (could you add get_nr_swap_pages() to the output).

no, I tested this without SWAP,

> Also zone->name would
> be better than the pointer.

Yes, forgot to mention, this is DMA32. To remind, only 512m of RAM so
this is natural.

> I am trying to reproduce but your test case always hits the oom killer:

Did you try to run it in a loop? Usually it takes a while before the system
hangs.

> Swap:       138236      57740      80496

perhaps this makes a difference? See above, I have no SWAP.

So. I spent almost the whole day trying to understand whats going on, and
of course I failed.

But. It _seems to me_ that the kernel "leaks" some pages in LRU_INACTIVE_FILE
list because inactive_file_is_low() returns the wrong value. And do not even
ask me why I think so, unlikely I will be able to explain ;) to remind, I never
tried to read vmscan.c before.

But. if I change lruvec_lru_size()

	-       return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
	+       return zone_page_state_snapshot(lruvec_zone(lruvec), NR_LRU_BASE + lru);

the problem goes away too.

To remind, it also goes away if I change calculate_normal_threshold() to return
zero, and it was not clear why. Now we can probably conclude that that this is
because the change obviouslt affects lruvec_lru_size().

Oleg.