lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1570453622.5576.288.camel@lca.pw>
Date:   Mon, 07 Oct 2019 09:07:02 -0400
From:   Qian Cai <cai@....pw>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     akpm@...ux-foundation.org, sergey.senozhatsky.work@...il.com,
        pmladek@...e.com, rostedt@...dmis.org, peterz@...radead.org,
        david@...hat.com, john.ogness@...utronix.de, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()

On Mon, 2019-10-07 at 14:43 +0200, Michal Hocko wrote:
> On Mon 07-10-19 08:11:44, Qian Cai wrote:
> > On Mon, 2019-10-07 at 13:37 +0200, Michal Hocko wrote:
> > > On Mon 07-10-19 07:04:00, Qian Cai wrote:
> > > > 
> > > > 
> > > > > On Oct 7, 2019, at 4:07 AM, Michal Hocko <mhocko@...nel.org> wrote:
> > > > > 
> > > > > I do not think that removing the printk is the right long term solution.
> > > > > While I do agree that removing the debugging printk __offline_isolated_pages
> > > > > does make sense because it is essentially of a very limited use, this
> > > > > doesn't really solve the underlying problem.  There are likely other
> > > > > printks from zone->lock. It would be much more saner to actually
> > > > > disallow consoles to allocate any memory while printk is called from an
> > > > > atomic context.
> > > > 
> > > > No, there is only a handful of places called printk() from
> > > > zone->lock. It is normal that the callers will quietly process
> > > > “struct zone” modification in a short section with zone->lock
> > > > held.
> > > 
> > > It is extremely error prone to have any zone->lock vs. printk
> > > dependency. I do not want to play an endless whack a mole.
> > > 
> > > > No, it is not about “allocate any memory while printk is called from an
> > > > atomic context”. It is opposite lock chain  from different processors which has the same effect. For example,
> > > > 
> > > > CPU0:                 CPU1:         CPU2:
> > > > console_owner
> > > >                             sclp_lock
> > > > sclp_lock                                 zone_lock
> > > >                             zone_lock
> > > >                                                  console_owner
> > > 
> > > Why would sclp_lock ever take a zone->lock (apart from an allocation).
> > > So really if sclp_lock is a lock that might be taken from many contexts
> > > and generate very subtle lock dependencies then it should better be
> > > really careful what it is calling into.
> > > 
> > > In other words you are trying to fix a wrong end of the problem. Fix the
> > > console to not allocate or depend on MM by other means.
> > 
> > It looks there are way too many places that could generate those indirect lock
> > chains that are hard to eliminate them all. Here is anther example, where it
> > has,
> 
> Yeah and I strongly suspect they are consoles which are broken and need
> to be fixed rathert than the problem papered over.
> 
> I do realize how tempting it is to remove all printks from the
> zone->lock but do realize that as soon as the allocator starts using any
> other locks then we are back to square one and the problem is there
> again. We would have to drop _all_ printks from any locked section in
> the allocator and I do not think this is viable.
> 
> Really, the only way forward is to make these consoles be more careful
> of external dependencies.

Even with the new printk() Petr proposed. There is no guarantee it will fix it
properly. It looks like just reduce the chance of this kind of deadlocks as it
may or may not call wake_up_klogd() in vprintk_emit() depends on timing.

zone->lock
printk_deferred()
  vprintk_emit()
    wake_up_klogd()
      wake_up_klogd_work_func()
        console_unlock()

which essentially went into the same path,

zone_lock -> console_owner_lock

What else options it has here?

> 
> I am also wondering, this code is there for a long time (or is there any
> recent change?), why are we seeing reports only now? Are those consoles
> rarely used or you are simply luck to hit those? Or are those really
> representing a deadlock? Maybe the lockdep is just confused? I am not
> familiar with the code but console_owner_lock is doing some complex
> stuff to hand over the context.

As I mentioned in the changelog that almost nobody call printk() with zone->lock 
held except in memory offline.

"The problem is probably there forever, but neither many developers will
run memory offline with the lockdep enabled nor admins in the field are
lucky enough yet to hit a perfect timing which required to trigger a
real deadlock. In addition, there aren't many places that call printk()
while zone->lock was held."

> 
> > console_owner -> port_lock
> > port_lock -> zone_lock
> > 
> > [  297.425922] -> #3 (&(&zone->lock)->rlock){-.-.}:
> > [  297.425925]        __lock_acquire+0x5b3/0xb40
> > [  297.425925]        lock_acquire+0x126/0x280
> > [  297.425926]        _raw_spin_lock+0x2f/0x40
> > [  297.425927]        rmqueue_bulk.constprop.21+0xb6/0x1160
> > [  297.425928]        get_page_from_freelist+0x898/0x22c0
> > [  297.425928]        __alloc_pages_nodemask+0x2f3/0x1cd0
> > [  297.425929]        alloc_pages_current+0x9c/0x110
> > [  297.425930]        allocate_slab+0x4c6/0x19c0
> > [  297.425931]        new_slab+0x46/0x70
> > [  297.425931]        ___slab_alloc+0x58b/0x960
> > [  297.425932]        __slab_alloc+0x43/0x70
> > [  297.425933]        __kmalloc+0x3ad/0x4b0
> > [  297.425933]        __tty_buffer_request_room+0x100/0x250
> > [  297.425934]        tty_insert_flip_string_fixed_flag+0x67/0x110
> > [  297.425935]        pty_write+0xa2/0xf0
> > [  297.425936]        n_tty_write+0x36b/0x7b0
> > [  297.425936]        tty_write+0x284/0x4c0
> > [  297.425937]        __vfs_write+0x50/0xa0
> > [  297.425938]        vfs_write+0x105/0x290
> > [  297.425939]        redirected_tty_write+0x6a/0xc0
> > [  297.425939]        do_iter_write+0x248/0x2a0
> > [  297.425940]        vfs_writev+0x106/0x1e0
> > [  297.425941]        do_writev+0xd4/0x180
> > [  297.425941]        __x64_sys_writev+0x45/0x50
> > [  297.425942]        do_syscall_64+0xcc/0x76c
> > [  297.425943]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ