linux-kernel - Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1570626402.5937.1.camel@lca.pw>
Date:   Wed, 09 Oct 2019 09:06:42 -0400
From:   Qian Cai <cai@....pw>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Christian Borntraeger <borntraeger@...ibm.com>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        sergey.senozhatsky.work@...il.com, rostedt@...dmis.org,
        peterz@...radead.org, linux-mm@...ck.org,
        john.ogness@...utronix.de, akpm@...ux-foundation.org,
        Vasily Gorbik <gor@...ux.ibm.com>,
        Peter Oberparleiter <oberpar@...ux.ibm.com>, david@...hat.com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm/page_isolation: fix a deadlock with printk()

On Wed, 2019-10-09 at 13:49 +0200, Petr Mladek wrote:
> On Tue 2019-10-08 15:35:24, Qian Cai wrote:
> > On Tue, 2019-10-08 at 21:17 +0200, Michal Hocko wrote:
> > > On Tue 08-10-19 15:06:13, Qian Cai wrote:
> > > > On Tue, 2019-10-08 at 20:35 +0200, Michal Hocko wrote:
> > > 
> > > [...]
> > > > > I fully agree that this class of lockdep splats are annoying especially
> > > > > when they make the lockdep unusable but please discuss this with lockdep
> > > > > maintainers and try to find some solution rather than go and try to
> > > > > workaround the problem all over the place. If there are places that
> > > > > would result in a cleaner code then go for it but please do not make the
> > > > > code worse just because of a non existent problem flagged by a false
> > > > > positive.
> > > > 
> > > > It makes me wonder what make you think it is a false positive for sure.
> > > 
> > > Because this is an early init code? Because if it were a real deadlock
> > 
> > No, that alone does not prove it is a false positive. There are many places
> > could generate that lock dependency but lockdep will always save the earliest
> > one.
> 
> You are still mixing the pasted lockdep report and other situations
> that will not get reported because of the first one.

The lockdep report is designed to only just give a clue on the existing locking
dependency. Then, it is normal that developers need to audit all places of that
particular locking dependency to confirm it is a true false positive.

> 
> We believe that the pasted report is pasted is false positive. And we
> are against complicating the code just to avoid this false positive.

This is totally incorrect as above and there is even another similar example of
the splat during memory offline I mentioned earlier,

https://lore.kernel.org/linux-mm/1570460350.5576.290.camel@lca.pw/

[  297.425964] -> #1 (&port_lock_key){-.-.}:
[  297.425967]        __lock_acquire+0x5b3/0xb40
[  297.425967]        lock_acquire+0x126/0x280
[  297.425968]        _raw_spin_lock_irqsave+0x3a/0x50
[  297.425969]        serial8250_console_write+0x3e4/0x450
[  297.425970]        univ8250_console_write+0x4b/0x60
[  297.425970]        console_unlock+0x501/0x750
[  297.425971]        vprintk_emit+0x10d/0x340
[  297.425972]        vprintk_default+0x1f/0x30
[  297.425972]        vprintk_func+0x44/0xd4
[  297.425973]        printk+0x9f/0xc5
[  297.425974]        register_console+0x39c/0x520
[  297.425975]        univ8250_console_init+0x23/0x2d
[  297.425975]        console_init+0x338/0x4cd
[  297.425976]        start_kernel+0x534/0x724
[  297.425977]        x86_64_start_reservations+0x24/0x26
[  297.425977]        x86_64_start_kernel+0xf4/0xfb
[  297.425978]        secondary_startup_64+0xb6/0xc0

where the report again show the early boot call trace for the locking
dependency,

console_owner --> port_lock_key

but that dependency clearly not only happen in the early boot.

I agree with you that it is hard to hit because it needs 4 CPUs to hit the exact
the same spot.

> 
> Are you sure that the workaround will not create real deadlocks
> or races?
> 
> I see two realistic possibilities:
> 
>    + Make printk() always deferred. This will hopefully happen soon.
> 
>    + Improve lockdep so that false positives could get ignored
>      without complicating the code.

There are certainly rooms to improve the lockdep but it does not help in this
case as mentioned above.