linux-kernel - Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()


Message-ID: <1570450304.5576.283.camel@lca.pw>
Date:   Mon, 07 Oct 2019 08:11:44 -0400
From:   Qian Cai <cai@....pw>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     akpm@...ux-foundation.org, sergey.senozhatsky.work@...il.com,
        pmladek@...e.com, rostedt@...dmis.org, peterz@...radead.org,
        david@...hat.com, john.ogness@...utronix.de, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()

On Mon, 2019-10-07 at 13:37 +0200, Michal Hocko wrote:
> On Mon 07-10-19 07:04:00, Qian Cai wrote:
> > 
> > 
> > > On Oct 7, 2019, at 4:07 AM, Michal Hocko <mhocko@...nel.org> wrote:
> > > 
> > > I do not think that removing the printk is the right long term solution.
> > > While I do agree that removing the debugging printk __offline_isolated_pages
> > > does make sense because it is essentially of a very limited use, this
> > > doesn't really solve the underlying problem.  There are likely other
> > > printks from zone->lock. It would be much more saner to actually
> > > disallow consoles to allocate any memory while printk is called from an
> > > atomic context.
> > 
> > No, there is only a handful of places called printk() from
> > zone->lock. It is normal that the callers will quietly process
> > “struct zone” modification in a short section with zone->lock
> > held.
> 
> It is extremely error prone to have any zone->lock vs. printk
> dependency. I do not want to play an endless whack a mole.
> 
> > No, it is not about “allocate any memory while printk is called from an
> > atomic context”. It is opposite lock chain from different processors
> > which has the same effect. For example,
> > 
> > CPU0:            CPU1:            CPU2:
> > console_owner
> >                  sclp_lock
> > sclp_lock                         zone_lock
> >                  zone_lock
> >                                   console_owner
> 
> Why would sclp_lock ever take a zone->lock (apart from an allocation).
> So really if sclp_lock is a lock that might be taken from many contexts
> and generate very subtle lock dependencies then it should better be
> really careful what it is calling into.
> 
> In other words you are trying to fix a wrong end of the problem. Fix the
> console to not allocate or depend on MM by other means.

It looks like there are far too many places that can generate these indirect
lock chains to eliminate them all. Here is another example, which has:

console_owner -> port_lock
port_lock -> zone_lock

[  297.425922] -> #3 (&(&zone->lock)->rlock){-.-.}:
[  297.425925]        __lock_acquire+0x5b3/0xb40
[  297.425925]        lock_acquire+0x126/0x280
[  297.425926]        _raw_spin_lock+0x2f/0x40
[  297.425927]        rmqueue_bulk.constprop.21+0xb6/0x1160
[  297.425928]        get_page_from_freelist+0x898/0x22c0
[  297.425928]        __alloc_pages_nodemask+0x2f3/0x1cd0
[  297.425929]        alloc_pages_current+0x9c/0x110
[  297.425930]        allocate_slab+0x4c6/0x19c0
[  297.425931]        new_slab+0x46/0x70
[  297.425931]        ___slab_alloc+0x58b/0x960
[  297.425932]        __slab_alloc+0x43/0x70
[  297.425933]        __kmalloc+0x3ad/0x4b0
[  297.425933]        __tty_buffer_request_room+0x100/0x250
[  297.425934]        tty_insert_flip_string_fixed_flag+0x67/0x110
[  297.425935]        pty_write+0xa2/0xf0
[  297.425936]        n_tty_write+0x36b/0x7b0
[  297.425936]        tty_write+0x284/0x4c0
[  297.425937]        __vfs_write+0x50/0xa0
[  297.425938]        vfs_write+0x105/0x290
[  297.425939]        redirected_tty_write+0x6a/0xc0
[  297.425939]        do_iter_write+0x248/0x2a0
[  297.425940]        vfs_writev+0x106/0x1e0
[  297.425941]        do_writev+0xd4/0x180
[  297.425941]        __x64_sys_writev+0x45/0x50
[  297.425942]        do_syscall_64+0xcc/0x76c
[  297.425943]        entry_SYSCALL_64_after_hwframe+0x49/0xbe