Message-ID: <ce7250cb-2b11-7d83-56b0-00f4f6274dae@redhat.com>
Date:   Mon, 29 Mar 2021 15:57:20 -0400
From:   Waiman Long <longman@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     linux-kernel@...r.kernel.org, David Woodhouse <dwmw@...zon.co.uk>,
        Marc Zyngier <maz@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>, Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Andy Shevchenko <andy.shevchenko@...il.com>, x86@...nel.org,
        John Ogness <john.ogness@...utronix.de>
Subject: Re: [PATCH v2] x86/apic/vector: Move pr_warn() out of vector_lock

On 3/29/21 8:42 AM, Thomas Gleixner wrote:
> Waiman,
>
> On Sun, Mar 28 2021 at 20:52, Waiman Long wrote:
>> It was found that the following circular locking dependency warning
>> could happen in some systems:
>>
>> [  218.097878] ======================================================
>> [  218.097879] WARNING: possible circular locking dependency detected
>> [  218.097880] 4.18.0-228.el8.x86_64+debug #1 Not tainted
> Reports have to be against latest mainline and not against the random
> distro frankenkernel of the day. That's nothing new.
>
> Plus I was asking you to provide a full splat to look at so this can be
> discussed _upfront_. Oh well...

That was the full splat that I could see, except for the following trailing data:

[  218.098064] RIP: 0033:0x7ff4ee7620d6
[  218.098066] Code: 89 54 24 08 e8 9b f4 ff ff 8b 74 24 0c 48 8b 3c 24 41 89 c0 44 8b 54 24 08 b8 01 01 00 00 89 f2 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 f0 ff ff 77 30 44 89 c7 89 44 24 08 e8 c6 f4 ff ff 8b 44
[  218.098067] RSP: 002b:00007ffdda1116a0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[  218.098069] RAX: ffffffffffffffda RBX: 0000564733953f70 RCX: 00007ff4ee7620d6
[  218.098070] RDX: 0000000000080902 RSI: 00005647339560e0 RDI: 00000000ffffff9c
[  218.098071] RBP: 0000000000000015 R08: 0000000000000000 R09: 000000000000004b
[  218.098072] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000080902
[  218.098074] R13: 00005647339560e0 R14: 0000564733953a90 R15: 0000000000000002
[  218.098460] irq 3: Affinity broken due to vector space exhaustion.


>> [  218.097914] -> #2 (&irq_desc_lock_class){-.-.}:
>> [  218.097917]        _raw_spin_lock_irqsave+0x48/0x81
>> [  218.097918]        __irq_get_desc_lock+0xcf/0x140
>> [  218.097919]        __dble_irq_nosync+0x6e/0x110
> This function does not even exist in mainline and never existed...
>
>> [  218.097967]
>> [  218.097967] Chain exists of:
>> [  218.097968]   console_oc_lock_class --> vector_lock
>> [  218.097972]
>> [  218.097973]  Possible unsafe locking scenario:
>> [  218.097973]
>> [  218.097974]        CPU0                    CPU1
>> [  218.097975]        ----                    ----
>> [  218.097975]   lock(vector_lock);
>> [  218.097977]                                lock(&irq_desc_lock_class);
>> [  218.097980]                                lock(vector_lock);
>> [  218.097981]   lock(console_owner);
>> [  218.097983]
>> [  218.097984]  *** DEADLOCK ***
>> [  218.097984]
>> [  218.097985] 6 locks held by systemd/1:
>> [  218.097986]  #0: ffff88822b5cc1e8 (&tty->legacy_mutex){+.+.}, at: tty_init_dev+0x79/0x440
>> [  218.097989]  #1: ffff88832ee00770 (&port->mutex){+.+.}, at: tty_port_open+0x85/0x190
>> [  218.097993]  #2: ffff88813be85a88 (&desc->request_mutex){+.+.}, at: __setup_irq+0x249/0x1e60
>> [  218.097996]  #3: ffff88813be858c0 (&irq_desc_lock_class){-.-.}, at: __setup_irq+0x2d9/0x1e60
>> [  218.098000]  #4: ffffffff84afca78 (vector_lock){-.-.}, at: x86_vector_activate+0xca/0xab0
>> [  218.098003]  #5: ffffffff84c27e20 (console_lock){+.+.}, at: vprintk_emit+0x13a/0x450
> This is a more fundamental problem than just vector lock and the same
> problem exists with any other printk over serial which is nested in the
> interrupt activation chain not only on X86.

That is true. This problem is more generic than just vector_lock. I am
hoping that the new printk rewrite will address it. I have been waiting
for a while, and that work is still not upstream. So what is your
current timeline for that? If it will happen soon, I probably don't
need this patch. I sent this patch out because I am uncertain about
that timeline.
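
For anyone following the thread, the idea behind the patch (format the
warning while the lock is held, print it only after the lock is dropped)
can be sketched in plain userspace C. This is an illustration under
stated assumptions, not the kernel code: every demo_* name is
hypothetical, a C11 atomic_flag stands in for vector_lock, and fputs()
stands in for printk(). The KERN_WARNING prefix from the patch is left
out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for vector_lock: a trivial userspace spinlock. */
static atomic_flag demo_vector_lock = ATOMIC_FLAG_INIT;

static void demo_lock(void)
{
	while (atomic_flag_test_and_set(&demo_vector_lock))
		; /* spin until the flag is released */
}

static void demo_unlock(void)
{
	atomic_flag_clear(&demo_vector_lock);
}

/*
 * Hypothetical counterpart of activate_reserved(): called with the lock
 * held, so it only formats the message into the caller's buffer and
 * never writes to the console itself.
 */
static int demo_activate_reserved(unsigned int irq, bool affinity_broken,
				  char *wbuf, size_t wsize)
{
	assert(wbuf != NULL && wsize > 0);
	wbuf[0] = '\0';
	if (affinity_broken)
		snprintf(wbuf, wsize,
			 "irq %u: Affinity broken due to vector space exhaustion.\n",
			 irq);
	return 0;
}

/*
 * Hypothetical caller: take the lock, do the work, drop the lock, and
 * only then emit whatever warning was buffered.
 * Returns true if a warning was printed.
 */
static bool demo_activate(unsigned int irq, bool affinity_broken)
{
	char wbuf[128];
	int ret;

	demo_lock();
	ret = demo_activate_reserved(irq, affinity_broken, wbuf, sizeof(wbuf));
	demo_unlock();

	if (ret == 0 && wbuf[0] != '\0') {
		fputs(wbuf, stderr); /* console output happens outside the lock */
		return true;
	}
	return false;
}
```

Calling demo_activate(3, true) prints the warning after the lock has
been released, while demo_activate(3, false) prints nothing. As noted
above, this only moves one message out of one lock; it does not address
printk calls nested elsewhere in the interrupt activation chain.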


>> -static int activate_reserved(struct irq_data *irqd)
>> +static int activate_reserved(struct irq_data *irqd, char *wbuf, size_t wsize)
>>   {
> ...
>
>>   	if (!cpumask_subset(irq_data_get_effective_affinity_mask(irqd),
>>   			    irq_data_get_affinity_mask(irqd))) {
>> -		pr_warn("irq %u: Affinity broken due to vector space exhaustion.\n",
>> -			irqd->irq);
>> +		snprintf(wbuf, wsize, KERN_WARNING
>> +			 "irq %u: Affinity broken due to vector space exhaustion.\n",
>> +			 irqd->irq);
> This is not really any more tasteful than the previous one and it does
> not fix the fundamental underlying problem.
>
> But, because I'm curious and printk is a constant source of trouble, I
> just added unconditional pr_warns into those functions under vector_lock
> on 5.12-rc5.
>
> Still waiting for the lockdep splat to show up while enjoying the
> trickle of printks over serial.
>
> If you really think this is an upstream problem then please provide a
> corresponding lockdep splat on plain 5.12-rc5 along with a .config and
> the scenario which triggers this. Not less, not more.

I will try to reproduce this problem with an upstream kernel.

Thanks,
Longman