lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191008084031.GC6681@dhcp22.suse.cz>
Date:   Tue, 8 Oct 2019 10:40:31 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Qian Cai <cai@....pw>
Cc:     Petr Mladek <pmladek@...e.com>, akpm@...ux-foundation.org,
        sergey.senozhatsky.work@...il.com, rostedt@...dmis.org,
        peterz@...radead.org, linux-mm@...ck.org,
        john.ogness@...utronix.de, david@...hat.com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()

On Mon 07-10-19 11:33:27, Qian Cai wrote:
> On Mon, 2019-10-07 at 17:12 +0200, Michal Hocko wrote:
> > On Mon 07-10-19 10:59:10, Qian Cai wrote:
> > [...]
> > > It is almost impossible to eliminate all the indirect call chains from
> > > console_sem/console_owner_lock to zone->lock because it is too normal that
> > > something later needs to allocate some memory dynamically, so as long as it
> > > directly call printk() with zone->lock held, it will be in trouble.
> > 
> > Do you have any example where the console driver really _has_ to
> > allocate. Because I have hard time to believe this is going to work at
> > all as the atomic context doesn't allow to do any memory reclaim and
> > such an allocation would be too easy to fail so the allocation cannot
> > really rely on it.
> 
> I don't know how to explain to you clearly, but let me repeat again one last
> time. There is no necessary for console driver directly to allocate considering
> this example,
> 
> CPU0:              CPU1:    CPU2:       CPU3:
> console_sem->lock                       zone->lock
>                    pi->lock
> pi->lock                    rq_lock
>                    rq->lock
>                             zone->lock
>                                         console_sem->lock
> 
> Here it only need someone held the rq_lock and allocate some memory.

Is the scheduler really allocating while holding the rq lock?

> There is
> also true for port_lock. Since the deadlock could involve a lot of CPUs and a
> longer lock chain, it is impossible to predict which one to allocate some memory
> while held a lock could end up with the same problematic lock chain.

And that is exactly what I've said earlier. Locks used by consoles
should really better be tail locks because then they are going to create
arbitrary dependency chains. The zone->lock is in no way special here.
 
> > So again, crippling the MM code just because of lockdep false possitives
> > or a broken console driver sounds like a wrong way to approach the
> > problem.
> > 
> > > [  297.425964] -> #1 (&port_lock_key){-.-.}:
> > > [  297.425967]        __lock_acquire+0x5b3/0xb40
> > > [  297.425967]        lock_acquire+0x126/0x280
> > > [  297.425968]        _raw_spin_lock_irqsave+0x3a/0x50
> > > [  297.425969]        serial8250_console_write+0x3e4/0x450
> > > [  297.425970]        univ8250_console_write+0x4b/0x60
> > > [  297.425970]        console_unlock+0x501/0x750
> > > [  297.425971]        vprintk_emit+0x10d/0x340
> > > [  297.425972]        vprintk_default+0x1f/0x30
> > > [  297.425972]        vprintk_func+0x44/0xd4
> > > [  297.425973]        printk+0x9f/0xc5
> > > [  297.425974]        register_console+0x39c/0x520
> > > [  297.425975]        univ8250_console_init+0x23/0x2d
> > > [  297.425975]        console_init+0x338/0x4cd
> > > [  297.425976]        start_kernel+0x534/0x724
> > > [  297.425977]        x86_64_start_reservations+0x24/0x26
> > > [  297.425977]        x86_64_start_kernel+0xf4/0xfb
> > > [  297.425978]        secondary_startup_64+0xb6/0xc0
> > 
> > This is an early init code again so the lockdep sounds like a false
> > possitive to me.
> 
> This is just a tip of iceberg to show the lock dependency,

Does this tip point to a real deadlock or merely a class of lockdep
false dependencies?

> console_owner --> port_lock_key
> 
> which could easily happen everywhere with a simple printk().

-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ