Message-ID: <1570633715.5937.10.camel@lca.pw>
Date: Wed, 09 Oct 2019 11:08:35 -0400
From: Qian Cai <cai@....pw>
To: Michal Hocko <mhocko@...nel.org>
Cc: Petr Mladek <pmladek@...e.com>,
Christian Borntraeger <borntraeger@...ibm.com>,
Heiko Carstens <heiko.carstens@...ibm.com>,
sergey.senozhatsky.work@...il.com, rostedt@...dmis.org,
peterz@...radead.org, linux-mm@...ck.org,
john.ogness@...utronix.de, akpm@...ux-foundation.org,
Vasily Gorbik <gor@...ux.ibm.com>,
Peter Oberparleiter <oberpar@...ux.ibm.com>, david@...hat.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()
On Wed, 2019-10-09 at 16:34 +0200, Michal Hocko wrote:
> On Wed 09-10-19 10:19:44, Qian Cai wrote:
> > On Wed, 2019-10-09 at 15:51 +0200, Michal Hocko wrote:
>
> [...]
> > > Can you paste the full lock chain graph to be sure we are on the same
> > > page?
> >
> > WARNING: possible circular locking dependency detected
> > 5.3.0-next-20190917 #8 Not tainted
> > ------------------------------------------------------
> > test.sh/8653 is trying to acquire lock:
> > ffffffff865a4460 (console_owner){-.-.}, at:
> > console_unlock+0x207/0x750
> >
> > but task is already holding lock:
> > ffff88883fff3c58 (&(&zone->lock)->rlock){-.-.}, at:
> > __offline_isolated_pages+0x179/0x3e0
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #3 (&(&zone->lock)->rlock){-.-.}:
> > __lock_acquire+0x5b3/0xb40
> > lock_acquire+0x126/0x280
> > _raw_spin_lock+0x2f/0x40
> > rmqueue_bulk.constprop.21+0xb6/0x1160
> > get_page_from_freelist+0x898/0x22c0
> > __alloc_pages_nodemask+0x2f3/0x1cd0
> > alloc_pages_current+0x9c/0x110
> > allocate_slab+0x4c6/0x19c0
> > new_slab+0x46/0x70
> > ___slab_alloc+0x58b/0x960
> > __slab_alloc+0x43/0x70
> > __kmalloc+0x3ad/0x4b0
> > __tty_buffer_request_room+0x100/0x250
> > tty_insert_flip_string_fixed_flag+0x67/0x110
> > pty_write+0xa2/0xf0
> > n_tty_write+0x36b/0x7b0
> > tty_write+0x284/0x4c0
> > __vfs_write+0x50/0xa0
> > vfs_write+0x105/0x290
> > redirected_tty_write+0x6a/0xc0
> > do_iter_write+0x248/0x2a0
> > vfs_writev+0x106/0x1e0
> > do_writev+0xd4/0x180
> > __x64_sys_writev+0x45/0x50
> > do_syscall_64+0xcc/0x76c
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> This one looks indeed legit. pty_write is allocating memory from inside
> the port->lock. But this seems to be quite broken, right? The forward
> progress depends on GFP_ATOMIC allocation which might fail easily under
> memory pressure. So the preferred way to fix this should be to change
> the allocation scheme to use the preallocated buffer and size it from a
> context when it doesn't hold internal locks. It might be a more complex
> fix than using printk_deferred or other games but addressing that would
> make the pty code more robust as well.
I am not sure that doing surgery on the pty code is a better short-term fix
than fixing the memory offline side.
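
For reference, the "preallocate and size it outside the internal locks"
idea quoted above can be sketched in userspace C. This is only an
illustration under stated assumptions, not the tty/pty code: struct
demo_port, demo_port_reserve() and demo_port_write() are hypothetical
names, a pthread mutex stands in for port->lock, and malloc() stands in
for the kernel allocator.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a port with an internal lock; not the tty API. */
struct demo_port {
	pthread_mutex_t lock;	/* plays the role of port->lock */
	char *buf;		/* preallocated room owned by the port */
	size_t cap;		/* total bytes available in buf */
	size_t used;		/* bytes already queued in buf */
};

/*
 * Grow the buffer to at least 'want' bytes. The allocation happens
 * before the lock is taken, so the allocator is never entered with
 * the lock held. Returns 0 on success, -1 if the allocation failed.
 */
int demo_port_reserve(struct demo_port *p, size_t want)
{
	char *fresh, *stale;

	assert(p != NULL);
	fresh = malloc(want);		/* no lock held here */
	if (fresh == NULL)
		return -1;

	pthread_mutex_lock(&p->lock);
	if (want > p->cap) {
		if (p->used > 0)
			memcpy(fresh, p->buf, p->used);
		stale = p->buf;		/* old buffer, freed after unlock */
		p->buf = fresh;
		p->cap = want;
	} else {
		stale = fresh;		/* already big enough, drop ours */
	}
	pthread_mutex_unlock(&p->lock);

	free(stale);			/* also outside the lock */
	return 0;
}

/*
 * Write path: under the lock, only copy into room that already exists.
 * Never allocates. Returns the number of bytes accepted.
 */
size_t demo_port_write(struct demo_port *p, const char *data, size_t len)
{
	size_t room;

	assert(p != NULL);
	pthread_mutex_lock(&p->lock);
	room = p->cap - p->used;
	if (len > room)
		len = room;		/* short write instead of allocating */
	if (len > 0)
		memcpy(p->buf + p->used, data, len);
	p->used += len;
	pthread_mutex_unlock(&p->lock);
	return len;
}
```

The write path then never reaches the allocator (and so never zone->lock)
while holding the port lock, at the cost of short writes when the
reserved room runs out.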