Message-ID: <8e1b9002-aa2c-5314-54f6-d5156703e25d@oracle.com>
Date:   Mon, 24 May 2021 13:29:22 +1000
From:   imran.f.khan@...cle.com
To:     Thomas Gleixner <tglx@...utronix.de>, mingo@...hat.com,
        bp@...en8.de
Cc:     x86@...nel.org, hpa@...or.com, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] x86/apic: Fix BUG due to multiple allocation of
 legacy vectors.



On 20/5/21 6:17 pm, Thomas Gleixner wrote:
> Imran,

Hi Thomas,
Thanks for the review.
> 
> On Wed, May 19 2021 at 23:39, Imran Khan wrote:
[...]
>>
>> [  154.738226] kernel BUG at arch/x86/kernel/apic/vector.c:172!
> 
> please trim the backtrace. It's not really relevant for understanding
> the problem.
> 

Noted.

>> This patch marks these legacy vectors as assigned in irq_matrix
> 
> git grep 'This patch' Documentation/process/
> 

Noted.
>> so that corresponding bits in percpu bitmaps get set and these
>> legacy vectors don't get reallocated.
> 
> This is just wrong.
> 
> True legacy interrupts (PIC delivery) are marked as system vectors. See
> lapic_assign_legacy_vector(). That prevents them from being allocated.
> 

With the current mainline tag (v5.13-rc2), I can see
lapic_assign_legacy_vector() being invoked only for IRQ 2 (PIC_CASCADE_IR).
I do see that lapic_assign_system_vectors() assigns the legacy
interrupts, but it is invoked only for the boot CPU, from native_init_IRQ().

>> [  154.858092] CPU: 22 PID: 3569 Comm: ifup-eth Not tainted 5.8.0-20200716.x86_64 #1
> 
> I have no idea what this 5.8.0-magic-date kernel is.
> 
> Have you verified that this problem exists with upstream?
> 

I have not yet tested the current mainline tag (v5.13-rc2)
on this setup because some porting work is still pending.

But I tested mainline v5.13-rc2 on a qemu-based x86_64 setup (4 CPUs)
and observed a difference in system vector assignments depending
on whether the kernel is booted with or without the noapic option.

If the kernel is booted with the noapic option, setup_IO_APIC() does not
set the io_apic_irqs bitmap, because skip_ioapic_setup is set in this
case. Without the noapic kernel parameter, setup_IO_APIC() does set the
io_apic_irqs bitmap.

Now, when booting with the noapic option, the check
"test_bit(isairq, &io_apic_irqs)" in __setup_vector_irq() is false for
these system vectors, so __setup_vector_irq() ends up returning the
corresponding irq descriptor. This irq descriptor gets installed in the
per-CPU vector_irq array of the secondary CPUs.
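
To illustrate the check described above, here is a minimal userspace
sketch. It is not the kernel code: the model_* names are hypothetical,
the value 0x30 for the first ISA vector and the count of 16 legacy IRQs
are assumptions, and the function returns a plain IRQ number (or -1 for
"unused") instead of an irq descriptor.

```c
#include <stdbool.h>

/* Assumed values for illustration; not taken from the mail above. */
#define MODEL_FIRST_ISA_VECTOR 0x30  /* assumed vector of legacy IRQ 0 */
#define MODEL_NR_LEGACY_IRQS   16    /* assumed number of legacy IRQs */

/*
 * Stand-in for the io_apic_irqs bitmap: bit N set means legacy IRQ N is
 * handled by the IO-APIC. In this model it stays 0 when booting with
 * noapic.
 */
static unsigned long model_io_apic_irqs;

/*
 * Model of the decision described in the mail: return the legacy IRQ
 * that would be installed in a CPU's vector_irq slot for this vector,
 * or -1 if the slot would be left unused.
 */
static int model_setup_vector_irq(int vector)
{
    int isairq = vector - MODEL_FIRST_ISA_VECTOR;

    /* Not in the legacy range: slot stays unused. */
    if (isairq < 0 || isairq >= MODEL_NR_LEGACY_IRQS)
        return -1;

    /* Handled by the IO-APIC: slot stays unused. */
    if (model_io_apic_irqs & (1UL << isairq))
        return -1;

    /* Still PIC-delivered: the legacy IRQ is installed on this CPU. */
    return isairq;
}
```

With model_io_apic_irqs left at 0 (the noapic case in this model), every
vector in the legacy range maps to its IRQ. If instead every bit except
bit 2 is set (an assumption about the IO-APIC case), only the cascade
IRQ is still installed.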

But the corresponding bits in irq_matrix remain unset, because
lapic_assign_system_vectors() is invoked via native_init_IRQ() only
for the boot CPU.

As a result, if any of these vectors gets allocated on a secondary
CPU, the kernel hits the BUG condition in apic_update_vector().
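
The mismatch described above can be sketched as a toy userspace model.
This only models the scenario as I describe it in this mail, not the
actual irq_matrix data structures: the model_* names are hypothetical,
the first-fit allocator is a simplification, and treating every vector
below 0x30 as unavailable is an assumption made to keep the example
short.

```c
#include <stdbool.h>

#define MODEL_NR_VECTORS       256
#define MODEL_FIRST_ISA_VECTOR 0x30  /* assumed vector of legacy IRQ 0 */
#define MODEL_NR_LEGACY_IRQS   16    /* assumed number of legacy IRQs */

struct model_cpu {
    bool matrix_busy[MODEL_NR_VECTORS]; /* stand-in for the matrix bitmaps */
    int  vector_irq[MODEL_NR_VECTORS];  /* installed IRQ, -1 means unused */
};

/*
 * Bring a modelled CPU online in the noapic case: the legacy IRQs are
 * installed in vector_irq on every CPU, but they are marked busy in the
 * matrix only when legacy_marked is true.
 */
static void model_cpu_online(struct model_cpu *c, bool legacy_marked)
{
    for (int v = 0; v < MODEL_NR_VECTORS; v++) {
        int isairq = v - MODEL_FIRST_ISA_VECTOR;
        bool legacy = isairq >= 0 && isairq < MODEL_NR_LEGACY_IRQS;

        c->vector_irq[v] = legacy ? isairq : -1;
        /* Assumption: vectors below the legacy range are never handed out. */
        c->matrix_busy[v] = v < MODEL_FIRST_ISA_VECTOR ||
                            (legacy && legacy_marked);
    }
}

/* First-fit allocation from the matrix; returns the vector or -1. */
static int model_alloc_vector(struct model_cpu *c)
{
    for (int v = 0; v < MODEL_NR_VECTORS; v++) {
        if (!c->matrix_busy[v]) {
            c->matrix_busy[v] = true;
            return v;
        }
    }
    return -1;
}

/*
 * True when the vector_irq slot for a freshly allocated vector is
 * already occupied, which is the condition that would trip the BUG.
 */
static bool model_update_would_bug(const struct model_cpu *c, int vector)
{
    return c->vector_irq[vector] != -1;
}
```

Bringing one modelled CPU online with legacy_marked set (the boot CPU in
my description) and one without (a secondary CPU), the first allocation
on the former lands above the legacy range, while on the latter it
lands on a legacy vector whose vector_irq slot is already occupied.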

Even my current setup, which was crashing with in-house 5.4 and 5.8
kernels, boots fine if I remove the noapic option from the kernel
boot parameters.

So even though I have not yet tested the current mainline tag on my
test setup, it looks to me that the cause of my problem is present in
the current mainline tag as well and manifests when the kernel is
booted with the noapic option.

Please let me know if I am missing something. If you need additional
data (vector traces, debugfs content for the VECTOR domain, etc.) from
my qemu-based experiment, I can provide that as well.

Thanks,
Imran

> Thanks,
> 
>          tglx
> 
