Message-ID: <87tuoub07f.ffs@nanos.tec.linutronix.de>
Date:   Mon, 29 Mar 2021 14:42:28 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Waiman Long <longman@...hat.com>
Cc:     linux-kernel@...r.kernel.org, David Woodhouse <dwmw@...zon.co.uk>,
        Marc Zyngier <maz@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>, Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Andy Shevchenko <andy.shevchenko@...il.com>, x86@...nel.org,
        John Ogness <john.ogness@...utronix.de>
Subject: Re: [PATCH v2] x86/apic/vector: Move pr_warn() out of vector_lock

Waiman,

On Sun, Mar 28 2021 at 20:52, Waiman Long wrote:
> It was found that the following circular locking dependency warning
> could happen in some systems:
>
> [  218.097878] ======================================================
> [  218.097879] WARNING: possible circular locking dependency detected
> [  218.097880] 4.18.0-228.el8.x86_64+debug #1 Not tainted

Reports have to be against latest mainline and not against the random
distro frankenkernel of the day. That's nothing new.

Plus I was asking you to provide a full splat to look at so this can be
discussed _upfront_. Oh well...

> [  218.097914] -> #2 (&irq_desc_lock_class){-.-.}:
> [  218.097917]        _raw_spin_lock_irqsave+0x48/0x81
> [  218.097918]        __irq_get_desc_lock+0xcf/0x140
> [  218.097919]        __dble_irq_nosync+0x6e/0x110

This function does not even exist in mainline and never existed...

> [  218.097967]
> [  218.097967] Chain exists of:
> [  218.097968]   console_oc_lock_class --> vector_lock
> [  218.097972]
> [  218.097973]  Possible unsafe locking scenario:
> [  218.097973]
> [  218.097974]        CPU0                    CPU1
> [  218.097975]        ----                    ----
> [  218.097975]   lock(vector_lock);
> [  218.097977]                                lock(&irq_desc_lock_class);
> [  218.097980]                                lock(vector_lock);
> [  218.097981]   lock(console_owner);
> [  218.097983]
> [  218.097984]  *** DEADLOCK ***
> [  218.097984]
> [  218.097985] 6 locks held by systemd/1:
> [  218.097986]  #0: ffff88822b5cc1e8 (&tty->legacy_mutex){+.+.}, at: tty_init_dev+0x79/0x440
> [  218.097989]  #1: ffff88832ee00770 (&port->mutex){+.+.}, at: tty_port_open+0x85/0x190
> [  218.097993]  #2: ffff88813be85a88 (&desc->request_mutex){+.+.}, at: __setup_irq+0x249/0x1e60
> [  218.097996]  #3: ffff88813be858c0 (&irq_desc_lock_class){-.-.}, at: __setup_irq+0x2d9/0x1e60
> [  218.098000]  #4: ffffffff84afca78 (vector_lock){-.-.}, at: x86_vector_activate+0xca/0xab0
> [  218.098003]  #5: ffffffff84c27e20 (console_lock){+.+.}, at: vprintk_emit+0x13a/0x450

This is a more fundamental problem than just vector_lock. The same
problem exists with any other printk over serial which is nested in the
interrupt activation chain, and not only on x86.
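
To illustrate the kind of check behind the report above, here is a
minimal userspace sketch of lock-order tracking: each "lock B taken
while lock A is held" is recorded as an edge A -> B, and a new edge is
rejected if it would close a cycle. This is not lockdep's actual
implementation. All demo_* names are hypothetical, and the edge from
console_owner to irq_desc_lock is an assumption that collapses the
intermediate locks of the serial console path into a single step.

```c
#include <assert.h>
#include <stdbool.h>

enum { DEMO_MAX_LOCKS = 8 };

/* dep[a][b] is true when lock b has been taken while lock a was held. */
static bool dep[DEMO_MAX_LOCKS][DEMO_MAX_LOCKS];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool demo_reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < DEMO_MAX_LOCKS; next++) {
		if (dep[from][next] && !seen[next] &&
		    demo_reachable(next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record the ordering "held -> taken". Returns false, without recording
 * anything, if the new edge would close a cycle in the dependency graph.
 */
static bool demo_add_dependency(int held, int taken)
{
	bool seen[DEMO_MAX_LOCKS] = { false };

	assert(held >= 0 && held < DEMO_MAX_LOCKS);
	assert(taken >= 0 && taken < DEMO_MAX_LOCKS);

	/* If 'held' is already reachable from 'taken', the edge is circular. */
	if (demo_reachable(taken, held, seen))
		return false;

	dep[held][taken] = true;
	return true;
}
```

With three lock classes numbered in the order console_owner,
irq_desc_lock, vector_lock, the first two dependencies are accepted and
the third one, a print under vector_lock, is the edge that gets flagged.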

> -static int activate_reserved(struct irq_data *irqd)
> +static int activate_reserved(struct irq_data *irqd, char *wbuf, size_t wsize)
>  {

...

>  	if (!cpumask_subset(irq_data_get_effective_affinity_mask(irqd),
>  			    irq_data_get_affinity_mask(irqd))) {
> -		pr_warn("irq %u: Affinity broken due to vector space exhaustion.\n",
> -			irqd->irq);
> +		snprintf(wbuf, wsize, KERN_WARNING
> +			 "irq %u: Affinity broken due to vector space exhaustion.\n",
> +			 irqd->irq);

This is not really any more tasteful than the previous one and it does
not fix the fundamental underlying problem.
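
For readers who want to see the pattern the quoted hunk is going for,
here is a minimal userspace sketch: the message is formatted into a
caller-supplied buffer while the lock is held, and printed only after
the lock has been dropped. It is not the kernel code. The pthread mutex
is a hypothetical stand-in for vector_lock, the demo_* names and the
affinity_broken flag are assumptions made for illustration, and, as
said above, the pattern only moves this one print and does not address
the underlying problem.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for vector_lock; not the kernel's raw spinlock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Runs with demo_lock held. Instead of printing, it formats the warning
 * into the caller's buffer. Returns 1 if a warning was recorded, else 0.
 */
static int demo_activate_locked(unsigned int irq, int affinity_broken,
				char *wbuf, size_t wsize)
{
	assert(wbuf != NULL && wsize > 0);

	wbuf[0] = '\0';
	if (!affinity_broken)
		return 0;

	snprintf(wbuf, wsize,
		 "irq %u: Affinity broken due to vector space exhaustion.\n",
		 irq);
	return 1;
}

/* Take the lock, do the work, drop the lock, and only then print. */
static int demo_activate(unsigned int irq, int affinity_broken,
			 char *wbuf, size_t wsize)
{
	int warned;

	pthread_mutex_lock(&demo_lock);
	warned = demo_activate_locked(irq, affinity_broken, wbuf, wsize);
	pthread_mutex_unlock(&demo_lock);

	if (warned)
		fputs(wbuf, stderr);	/* output happens outside the lock */

	return warned;
}
```

The cost of this shape is visible in the signatures: every caller in
the chain has to carry a buffer and a size around just to be able to
report a warning later.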

But, because I'm curious and printk is a constant source of trouble, I
just added unconditional pr_warns into those functions under vector_lock
on 5.12-rc5.

Still waiting for the lockdep splat to show up while enjoying the
trickle of printks over serial.

If you really think this is an upstream problem then please provide a
corresponding lockdep splat on plain 5.12-rc5 along with a .config and
the scenario which triggers this. Not less, not more.

Thanks,

        tglx