linux-kernel - Re: System freezes after OOM

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LRH.2.02.1607140818250.15554@file01.intranet.prod.int.rdu2.redhat.com>
Date:	Thu, 14 Jul 2016 08:27:12 -0400 (EDT)
From:	Mikulas Patocka <mpatocka@...hat.com>
To:	David Rientjes <rientjes@...gle.com>
cc:	Michal Hocko <mhocko@...nel.org>,
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	Ondrej Kozina <okozina@...hat.com>,
	Jerome Marchand <jmarchan@...hat.com>,
	Stanislav Kozina <skozina@...hat.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, dm-devel@...hat.com
Subject: Re: System freezes after OOM



On Wed, 13 Jul 2016, David Rientjes wrote:

> On Wed, 13 Jul 2016, Mikulas Patocka wrote:
> 
> > What are the real problems that f9054c70d28bc214b2857cf8db8269f4f45a5e23 
> > tries to fix?
> > 
> 
> It prevents the whole system from livelocking due to an oom killed process 
> stalling forever waiting for mempool_alloc() to return.  No other threads 
> may be oom killed while waiting for it to exit.
> 
> > Do you have a stacktrace where it deadlocked, or was just a theoretical 
> > consideration?
> > 
> 
> schedule
> schedule_timeout
> io_schedule_timeout
> mempool_alloc
> __split_and_process_bio
> dm_request
> generic_make_request
> submit_bio
> mpage_readpages
> ext4_readpages
> __do_page_cache_readahead
> ra_submit
> filemap_fault
> handle_mm_fault
> __do_page_fault
> do_page_fault
> page_fault

Device mapper should be able to proceed if there is no available memory. 
If it doesn't proceed, there is a bug in it.

I'd like to ask - what device mapper targets did you use in this case? Are 
there some other deadlocked processes? (show sysrq-t, sysrq-w when this 
happened)

Did the machine lock up completely with that stacktrace, or was it just 
slowed down?

> > Mempool users generally (except for some flawed cases like fs_bio_set) do 
> > not require memory to proceed. So if you just loop in mempool_alloc, the 
> > processes that exhasted the mempool reserve will eventually return objects 
> > to the mempool and you should proceed.
> > 
> 
> That's obviously not the case if we have hundreds of machines timing out 
> after two hours waiting for that fault to succeed.  The mempool interface 
> cannot require that users return elements to the pool synchronous with all 
> allocators so that we can happily loop forever, the only requirement on 

Mempool users must return objects to the mempool.

> the interface is that mempool_alloc() must succeed.  If the context of the 
> thread doing mempool_alloc() allows access to memory reserves, this will 
> always be allowed by the page allocator.  This is not a mempool problem.

Mikulas