Message-ID: <13dbe818-a364-4cd4-3ac4-78bd7e8d28e3@amd.com>
Date: Tue, 19 Feb 2019 21:47:08 +0000
From: "Lendacky, Thomas" <Thomas.Lendacky@....com>
To: Thomas Gleixner <tglx@...utronix.de>,
Hans de Goede <hdegoede@...hat.com>
CC: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Borislav Petkov <bp@...en8.de>
Subject: Re: False positive "do_IRQ: #.55 No irq handler for vector" messages
on AMD ryzen based laptops
On 2/19/19 3:01 PM, Thomas Gleixner wrote:
> Hans,
>
> On Tue, 19 Feb 2019, Hans de Goede wrote:
>
> Cc+: ACPI/AMD folks
>
>> Various people are reporting false positive "do_IRQ: #.55 No irq handler for
>> vector"
>> messages on AMD ryzen based laptops, see e.g.:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1551605
>>
>> Which contains this dmesg snippet:
>>
>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs
>> ...
>> Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration:
>> Feb 07 20:14:29 localhost.localdomain kernel: .... node #0, CPUs: #1
>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for
>> vector
>> Feb 07 20:14:29 localhost.localdomain kernel: #2
>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for
>> vector
>> Feb 07 20:14:29 localhost.localdomain kernel: #3
>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for
>> vector
>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs
>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1
>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors
>> activated (15968.49 BogoMIPS)
>>
>> It seems that we get an IRQ for each CPU as we bring it online,
>> which feels to me like it is some sorta false-positive.
>
> Sigh, that looks like BIOS value add again.
>
> It's not a false positive. Something _IS_ sending a vector 55 to these CPUs
> for whatever reason.
>
I remember seeing something like this in the past and it turned out to be
a BIOS issue. BIOS was enabling the APs to interact with the legacy 8259
interrupt controller when only the BSP should. During POST the APs were
exposed to ExtINT/INTR events as a result of the mis-configuration
(probably due to a UEFI timer-tick using the 8259) and this left a pending
ExtINT/INTR interrupt latched on the APs.

When the APs were started by the OS, the latched ExtINT/INTR interrupt was
processed shortly after the OS enabled interrupts. The AP then queried the
8259 to identify the vector number (which is the value of the 8259's ICW2
register + the IRQ level). The master 8259's ICW2 was set to 0x30 and,
since no interrupts were actually pending, the 8259 responded with IRQ7
(spurious interrupt), yielding a vector of 0x37 or 55.

The OS was not expecting vector 55 and printed the message.
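
To make the arithmetic above concrete, here is a minimal Python sketch (not
kernel code; the function names are made up for illustration). It assumes
the 8259 is in 8086 mode, where the upper five bits of ICW2 form the vector
base and the low three bits carry the IRQ level, and it assumes the numbers
in the message are "<cpu>.<vector>", as the dmesg snippet suggests.

```python
import re

SPURIOUS_IRQ = 7  # the 8259 reports IRQ7 when nothing is actually pending


def extint_vector(icw2: int, irq_level: int) -> int:
    """Vector the 8259 hands back: ICW2 base (upper 5 bits) plus IRQ level."""
    if not 0 <= irq_level <= 7:
        raise ValueError("an 8259 only has IRQ levels 0..7")
    return (icw2 & 0xF8) | irq_level


def parse_do_irq(line: str) -> tuple[int, int]:
    """Pull (cpu, vector) out of a 'do_IRQ: C.V No irq handler' message."""
    m = re.search(r"do_IRQ: (\d+)\.(\d+) No irq handler for vector", line)
    if m is None:
        raise ValueError("not a do_IRQ message")
    return int(m.group(1)), int(m.group(2))


if __name__ == "__main__":
    vector = extint_vector(0x30, SPURIOUS_IRQ)
    cpu, seen = parse_do_irq("do_IRQ: 1.55 No irq handler for vector")
    # Does the vector in the log match a spurious IRQ7 with ICW2 = 0x30?
    print(hex(vector), vector, seen == vector)
```

With ICW2 = 0x30 this gives 0x37 (decimal 55), which matches the vector
reported for CPUs 1 through 3 in the log.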

From the Intel Developer's Manual: Vol 3a, Section 10.5.1:
"Only one processor in the system should have an LVT entry configured to
use the ExtINT delivery mode."

Not saying this is the problem, but it very well could be.

Thanks,
Tom

>> I temporarily have access to a loaner laptop for a couple of weeks which shows
>> the same errors and I would like to fix this, but I don't really know how to
>> fix this.
>
> Can you please enable CONFIG_GENERIC_IRQ_DEBUGFS and dig in the files there
> whether vector 55 is used on CPU0 and which device is associated to that.
>
> I bet it's a legacy IRQ and as that space starts at 48 (IRQ0) this should be
> IRQ9 which is usually - DRUMROLL - the ACPI interrupt.
>
> The kernel clearly sets that up to be delivered to CPU 0 only, but I've
> seen that before that the BIOS value add thinks that this setup is not
> relevant.
>
> /me goes off and sings LALALA
>
>> Note if you want I can set up root ssh-access to the laptop.
>
> As a last resort. root ssh - SHUDDER - Ooops now I spilled my preferred
> password for that :)
>
> Thanks,
>
> tglx
>