Message-ID: <8e1b9002-aa2c-5314-54f6-d5156703e25d@oracle.com>
Date: Mon, 24 May 2021 13:29:22 +1000
From: imran.f.khan@...cle.com
To: Thomas Gleixner <tglx@...utronix.de>, mingo@...hat.com,
bp@...en8.de
Cc: x86@...nel.org, hpa@...or.com, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] x86/apic: Fix BUG due to multiple allocation of
legacy vectors.
On 20/5/21 6:17 pm, Thomas Gleixner wrote:
> Imran,
Hi Thomas,
Thanks for the review.
>
> On Wed, May 19 2021 at 23:39, Imran Khan wrote:
[...]
>>
>> [ 154.738226] kernel BUG at arch/x86/kernel/apic/vector.c:172!
>
> please trim the backtrace. It's not really relevant for understanding
> the problem.
>
Noted.
>> This patch marks these legacy vectors as assigned in irq_matrix
>
> git grep 'This patch' Documentation/process/
>
Noted.
>> so that corresponding bits in percpu bitmaps get set and these
>> legacy vectors don't get reallocated.
>
> This is just wrong.
>
> True legacy interrupts (PIC delivery) are marked as system vectors. See
> lapic_assign_legacy_vector(). That prevents them from being allocated.
>
In the current mainline tag (v5.13-rc2), I can see lapic_assign_legacy_vector()
getting invoked only for PIC_CASCADE_IR (IRQ 2).
I do see that lapic_assign_system_vectors() assigns the legacy
interrupts, but it is invoked only for the boot CPU, from native_init_IRQ().
>> [ 154.858092] CPU: 22 PID: 3569 Comm: ifup-eth Not tainted 5.8.0-20200716.x86_64 #1
>
> I have no idea what this 5.8.0-magic-date kernel is.
>
> Have you verified that this problem exists with upstream?
>
I have not yet tested the current mainline tag (v5.13-rc2)
on this setup because some porting work is still pending.
However, I did test mainline v5.13-rc2 on a QEMU-based x86_64 setup (4 CPUs)
and observed a difference in system vector assignment depending
on whether the kernel is booted with or without the noapic option.
If the kernel is booted with noapic, the io_apic_irqs bitmap is not
set by setup_IO_APIC(), because skip_ioapic_setup is set in this case.
Without the noapic parameter, setup_IO_APIC() does set the io_apic_irqs bitmap.
Now, when booting with noapic, the check "test_bit(isairq, &io_apic_irqs)"
in __setup_vector_irq() is false for the legacy vectors, so
__setup_vector_irq() ends up returning the corresponding irq descriptor.
That irq descriptor then gets installed in the per-CPU vector_irq array
of each secondary CPU.
But the corresponding bits in the irq_matrix per-CPU bitmaps remain
clear on the secondary CPUs, because lapic_assign_system_vectors() is
invoked (via native_init_IRQ()) only for the boot CPU.
As a result, if any of these vectors gets allocated on a secondary
CPU, the kernel hits the BUG condition in apic_update_vector(), since
that CPU's vector_irq slot already holds a descriptor.
Likewise, my current setup, which was crashing with our in-house 5.4 and 5.8
kernels, boots fine if I remove noapic from the kernel
boot parameters.
So even though I have not yet tested the current mainline tag on my test
setup, it looks to me as if the cause of my problem is present in current
mainline as well and manifests when the kernel is booted with noapic.
Please let me know if I am missing something. If you need additional
data (vector traces, debugfs contents for the VECTOR domain, etc.) from my
QEMU-based experiment, I can provide that as well.
Thanks,
Imran
> Thanks,
>
> tglx
>