Message-ID: <87lfeiiy10.fsf@nanos.tec.linutronix.de>
Date: Mon, 30 Nov 2020 17:56:27 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Laurențiu Nicola <lnicola@...d.ro>
Cc: mingo@...nel.org, bp@...en8.de, x86@...nel.org, trivial@...nel.org,
LKML <linux-kernel@...r.kernel.org>,
Tom Lendacky <thomas.lendacky@....com>
Subject: Re: [PATCH] x86/irq: Lower unhandled irq error severity
Laurentiu,
On Fri, Nov 27 2020 at 10:03, Laurențiu Nicola wrote:
> On Fri, Nov 27, 2020, at 02:12, Thomas Gleixner wrote:
>> On Thu, Nov 26 2020 at 09:47, Laurențiu Nicola wrote:
>> > These messages are described as warnings in the MSI code.
>>
>> Where and what has MSI to do with these messages?
>
> There's a comment referring to it as a warning, but an error seemed a more appropriate severity:
>
> * If the vector is unused, then it is marked so it won't
> * trigger the 'No irq handler for vector' warning in
> * common_interrupt().
That's a description for the logic in the MSI code which is required to
_NOT_ trigger the 'No irq handler' message. If that message appears then
something _is_ badly wrong. Either the kernel screwed up or something in
the BIOS/firmware/hardware is bonkers.
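
The behaviour described above can be sketched as a simplified user-space model in Python. It is not kernel code. The names `VECTOR_UNUSED` and `VECTOR_RETRIGGERED` mirror kernel identifiers, but their values here are stand-ins, and the per-CPU table is modelled as a plain dict, which is an assumption made only for illustration. Details such as interrupt acknowledgement and rate limiting are left out.

```python
# Simplified model of the per-CPU vector table logic; NOT kernel code.
VECTOR_UNUSED = None                 # slot has no handler installed
VECTOR_RETRIGGERED = "retriggered"   # stand-in value for the "marked" state


def common_interrupt(vector_irq, cpu, vector, log):
    """Dispatch one interrupt on `vector`, appending any complaint to `log`."""
    desc = vector_irq.get(vector, VECTOR_UNUSED)

    if callable(desc):
        # Normal case: a handler is installed for this vector.
        desc()
        return

    if desc is VECTOR_UNUSED:
        # Nobody owns this vector and nobody marked it: complain loudly.
        log.append(
            f"__common_interrupt: {cpu}.{vector} No irq handler for vector"
        )
    else:
        # The slot was marked on purpose (e.g. by the MSI code), so the
        # interrupt is expected. Consume the marker silently.
        vector_irq[vector] = VECTOR_UNUSED


log = []
table = {}                            # hypothetical table for one CPU
common_interrupt(table, 1, 55, log)   # unmarked, unused: message logged
table[55] = VECTOR_RETRIGGERED
common_interrupt(table, 1, 55, log)   # marked: silent, marker cleared
print(log)
```

In this model only the first call produces a message. The second call is silent because the slot was marked beforehand, which is the point of the MSI comment quoted above.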
>> > Spotted because they break quiet boot on a Ryzen 5000 CPU.
>>
>> They don't break the boot.
>>
>> The machine boots fine, but having interrupts raised on a vector which
>> is unused is really bad.
>
> That's right, sorry. It still boots, but it's no longer "quiet",
> that's what I meant.
Right, but suppressing that is not a solution.
>> Can you please provide the actual message from dmesg?
>
> Sure:
>
> [ 0.316902] __common_interrupt: 1.55 No irq handler for vector
> [ 0.316902] __common_interrupt: 2.55 No irq handler for vector
> [ 0.316902] __common_interrupt: 3.55 No irq handler for vector
> [ 0.316902] __common_interrupt: 4.55 No irq handler for vector
> [ 0.316902] __common_interrupt: 5.55 No irq handler for vector
> [ 0.316902] __common_interrupt: 6.55 No irq handler for vector
> [ 0.316902] __common_interrupt: 7.55 No irq handler for vector
> [ 0.316902] __common_interrupt: 8.55 No irq handler for vector
> [ 0.316902] __common_interrupt: 9.55 No irq handler for vector
> [ 0.316902] __common_interrupt: 10.55 No irq handler for vector
>
> These only show up during boot (and not e.g. when disabling and re-enabling a CPU).
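
A small sketch for summarising such lines from a saved dmesg dump, in Python. It assumes the number before the dot is the CPU and the number after it is the vector, which matches the kernel's format string for this message as far as can be told from the output above. The function names `parse_unhandled` and `cpus_by_vector` are hypothetical helpers, not part of any existing tool.

```python
import re
from collections import defaultdict

# Matches e.g. "[ 0.316902] __common_interrupt: 1.55 No irq handler for vector"
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<func>\w+):\s+"
    r"(?P<cpu>\d+)\.(?P<vector>\d+)\s+No irq handler for vector"
)


def parse_unhandled(lines):
    """Return (cpu, vector) pairs for every matching log line."""
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            hits.append((int(m["cpu"]), int(m["vector"])))
    return hits


def cpus_by_vector(hits):
    """Group the affected CPUs by vector number."""
    grouped = defaultdict(list)
    for cpu, vector in hits:
        grouped[vector].append(cpu)
    return dict(grouped)


sample = [
    "[ 0.316902] __common_interrupt: 1.55 No irq handler for vector",
    "[ 0.316902] __common_interrupt: 2.55 No irq handler for vector",
]
print(cpus_by_vector(parse_unhandled(sample)))
```

Applied to the ten lines quoted above, this would group CPUs 1 through 10 under the single vector 55.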
That's the AMD plague which has been known for quite some time and it's
pretty much confirmed that it is a BIOS/firmware issue.

I don't know whether AMD has figured it out and told their OEMs what to
do about that or whether the OEMs just ignore it because Windows ignores
it or is not affected for whatever reason.
Thanks,
tglx