Message-ID: <alpine.LRH.2.02.1607140818250.15554@file01.intranet.prod.int.rdu2.redhat.com>
Date: Thu, 14 Jul 2016 08:27:12 -0400 (EDT)
From: Mikulas Patocka <mpatocka@...hat.com>
To: David Rientjes <rientjes@...gle.com>
cc: Michal Hocko <mhocko@...nel.org>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Ondrej Kozina <okozina@...hat.com>,
Jerome Marchand <jmarchan@...hat.com>,
Stanislav Kozina <skozina@...hat.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, dm-devel@...hat.com
Subject: Re: System freezes after OOM

On Wed, 13 Jul 2016, David Rientjes wrote:
> On Wed, 13 Jul 2016, Mikulas Patocka wrote:
>
> > What are the real problems that f9054c70d28bc214b2857cf8db8269f4f45a5e23
> > tries to fix?
> >
>
> It prevents the whole system from livelocking due to an oom killed process
> stalling forever waiting for mempool_alloc() to return. No other threads
> may be oom killed while waiting for it to exit.
>
> > Do you have a stack trace where it deadlocked, or was it just a theoretical
> > consideration?
> >
>
> schedule
> schedule_timeout
> io_schedule_timeout
> mempool_alloc
> __split_and_process_bio
> dm_request
> generic_make_request
> submit_bio
> mpage_readpages
> ext4_readpages
> __do_page_cache_readahead
> ra_submit
> filemap_fault
> handle_mm_fault
> __do_page_fault
> do_page_fault
> page_fault

Device mapper should be able to make progress even when no memory is
available. If it doesn't, that is a bug in device mapper.

Which device mapper targets were you using in this case? Were there other
deadlocked processes? (Please post the sysrq-t and sysrq-w output from when
this happened.)

Did the machine lock up completely with that stack trace, or did it just
slow down?

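The claim that device mapper should proceed without available memory rests on how a mempool behaves under memory pressure: when the normal allocator fails, the caller takes an element from the preallocated reserve, and when the reserve is empty it sleeps until another user returns one, then retries. The sketch below models that loop in userspace, assuming a pool guarded by a pthread mutex and condition variable plus a flag that simulates an allocator that always fails. `toy_pool` and its functions are hypothetical and far simpler than the kernel's `mm/mempool.c`; the point is only that waking up in step 3 depends on element holders calling the free routine, not on new memory appearing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy pool: a simplified model, not mm/mempool.c. */
struct toy_pool {
    pthread_mutex_t lock;
    pthread_cond_t wait;   /* signalled whenever an element is returned */
    void **elements;       /* preallocated reserve */
    int min_nr;            /* reserve capacity */
    int curr_nr;           /* free elements currently in the reserve */
    size_t elem_size;
    bool oom;              /* simulation switch: normal allocator always fails */
};

/* Stand-in for the underlying allocator (e.g. kmalloc or a slab cache). */
static void *try_underlying_alloc(struct toy_pool *p)
{
    return p->oom ? NULL : malloc(p->elem_size);
}

/* Fill the reserve up front, while memory is still available.
 * Returns 0 on success; error paths do not clean up (toy code). */
int toy_pool_init(struct toy_pool *p, int min_nr, size_t elem_size)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->wait, NULL);
    p->elements = calloc((size_t)min_nr, sizeof(void *));
    if (!p->elements)
        return -1;
    p->min_nr = min_nr;
    p->curr_nr = 0;
    p->elem_size = elem_size;
    p->oom = false;
    for (int i = 0; i < min_nr; i++) {
        p->elements[i] = malloc(elem_size);
        if (!p->elements[i])
            return -1;
        p->curr_nr++;
    }
    return 0;
}

void *toy_pool_alloc(struct toy_pool *p)
{
    for (;;) {
        /* 1. Try the normal allocator first. */
        void *e = try_underlying_alloc(p);
        if (e)
            return e;

        pthread_mutex_lock(&p->lock);
        /* 2. Fall back to the preallocated reserve. */
        if (p->curr_nr > 0) {
            e = p->elements[--p->curr_nr];
            pthread_mutex_unlock(&p->lock);
            return e;
        }
        /* 3. Reserve empty: sleep until toy_pool_free() returns an element,
         *    then retry from step 1. No new memory is needed to wake up. */
        pthread_cond_wait(&p->wait, &p->lock);
        pthread_mutex_unlock(&p->lock);
    }
}

void toy_pool_free(struct toy_pool *p, void *e)
{
    pthread_mutex_lock(&p->lock);
    if (p->curr_nr < p->min_nr) {
        /* Refill the reserve first and wake one sleeping allocator. */
        p->elements[p->curr_nr++] = e;
        pthread_cond_signal(&p->wait);
        pthread_mutex_unlock(&p->lock);
        return;
    }
    pthread_mutex_unlock(&p->lock);
    free(e);   /* reserve already full: give the memory back */
}
```

Under the simulated OOM, two allocations drain a two-element reserve; a third caller would sleep until `toy_pool_free()` hands an element back. That wake-up only happens if the holder of an element can finish its work and free it without needing more memory, which is why the cases like fs_bio_set mentioned below are called flawed.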
> > Mempool users generally (except for some flawed cases like fs_bio_set) do
> > not require memory to proceed. So if you just loop in mempool_alloc, the
> > processes that exhausted the mempool reserve will eventually return objects
> > to the mempool and you should proceed.
> >
>
> That's obviously not the case if we have hundreds of machines timing out
> after two hours waiting for that fault to succeed. The mempool interface
> cannot require that users return elements to the pool synchronous with all
> allocators so that we can happily loop forever, the only requirement on

Mempool users must return objects to the mempool.

> the interface is that mempool_alloc() must succeed. If the context of the
> thread doing mempool_alloc() allows access to memory reserves, this will
> always be allowed by the page allocator. This is not a mempool problem.

Mikulas
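David's position is that a context allowed to use memory reserves, such as an OOM-killed task, should be given them by the page allocator instead of waiting on the pool. One way to express such a policy is to request reserves only once the pool itself has no free elements left. The sketch below is a hypothetical model of that decision, not the kernel's actual gfp-flag handling: `toy_pages`, `toy_may_use_reserves()` and `toy_attempt()` are illustrative names, and both pages and pool elements are reduced to counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page allocator: ordinary free pages plus an emergency
 * reserve that only privileged contexts (e.g. an OOM victim) may touch. */
struct toy_pages {
    int normal_free;
    int reserve_free;
};

static bool toy_get_page(struct toy_pages *m, bool use_reserves)
{
    if (m->normal_free > 0) {
        m->normal_free--;
        return true;
    }
    if (use_reserves && m->reserve_free > 0) {
        m->reserve_free--;
        return true;
    }
    return false;   /* caller must fall back to the pool or wait */
}

/* Assumed policy: a context with reserve access asks for it only once the
 * pool is empty, so reserves are not drained while pool elements remain. */
static bool toy_may_use_reserves(int pool_free, bool context_has_access)
{
    return context_has_access && pool_free == 0;
}

/* Outcome of one allocation attempt by a mempool user. */
enum toy_outcome { GOT_PAGE, GOT_POOL_ELEMENT, MUST_WAIT };

static enum toy_outcome toy_attempt(struct toy_pages *m, int *pool_free,
                                    bool context_has_access)
{
    if (toy_get_page(m, toy_may_use_reserves(*pool_free, context_has_access)))
        return GOT_PAGE;
    if (*pool_free > 0) {
        (*pool_free)--;
        return GOT_POOL_ELEMENT;
    }
    return MUST_WAIT;   /* progress now depends on elements being returned */
}
```

With ordinary memory exhausted, the privileged caller leaves the reserves alone while the pool still has an element, then dips into them once the pool is empty. An unprivileged caller in the same state can only wait for an element to be returned, which is the case Mikulas argues is safe as long as mempool users return their objects.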