linux-kernel - Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

Message-ID: <f73c1410-db9c-deb7-6a23-5d54e2b9dcc7@redhat.com>
Date:   Thu, 7 Mar 2019 12:20:57 +0100
From:   Hans de Goede <hdegoede@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Borislav Petkov <bp@...en8.de>,
        "Lendacky, Thomas" <Thomas.Lendacky@....com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>
Subject: Re: False positive "do_IRQ: #.55 No irq handler for vector" messages
 on AMD ryzen based laptops

Hi,

On 06-03-19 11:14, Thomas Gleixner wrote:
> Hans,
> 
> On Wed, 6 Mar 2019, Hans de Goede wrote:
>> On 05-03-19 20:54, Borislav Petkov wrote:
>>> On Tue, Mar 05, 2019 at 08:40:02PM +0100, Hans de Goede wrote:
>>>> Finger pointing at the firmware if there are multiple vendors involved
>>>> is really not going to help here. Esp. since most OEMs will just respond
>>>> with "the machine works fine with Windows"
>>>
>>> Yes, because windoze simply doesn't report that spurious IRQ, most
>>> likely.
>>
>> So maybe we need to lower the priority of the do_IRQ error from pr_emerg
>> to pr_err then ?  That will stop throwing the errors in the users face each
>> boot on distros which have chosen to set the quiet loglevel to such a level
>> that pr_err messages are not shown on the console (*).
> 
> Well, we rather try to understand and fix the issue.
> 
> So if Tom's theory holds, then the patch below should cure it.

Thank you for the patch. Unfortunately, the messages still appear
with a kernel that has the patch applied:

[    0.741479] smp: Bringing up secondary CPUs ...
[    0.741654] x86: Booting SMP configuration:
[    0.741655] .... node  #0, CPUs:        #1
[    0.742231] TSC synchronization [CPU#0 -> CPU#1]:
[    0.742231] Measured 3346474670 cycles TSC warp between CPUs, turning off TSC clock.
[    0.742231] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[    0.321639] do_IRQ: 1.55 No irq handler for vector
[    0.743371]   #2
[    0.321639] do_IRQ: 2.55 No irq handler for vector
[    0.743598]   #3
[    0.321639] do_IRQ: 3.55 No irq handler for vector
[    0.744306]   #4
[    0.321639] do_IRQ: 4.55 No irq handler for vector
[    0.744531]   #5
[    0.321639] do_IRQ: 5.55 No irq handler for vector
[    0.745241]   #6
[    0.321639] do_IRQ: 6.55 No irq handler for vector
[    0.745467]   #7
[    0.321639] do_IRQ: 7.55 No irq handler for vector
[    0.745627] smp: Brought up 1 node, 8 CPUs
[    0.745627] smpboot: Max logical packages: 2
[    0.745627] smpboot: Total of 8 processors activated (35133.37 BogoMIPS)

I also tried suspend/resume. In that case no extra
"No irq handler for vector" messages are printed; this seems to
trigger only once per secondary CPU, and only at boot.
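
For anyone who wants to check their own dmesg output for the same
pattern, here is a minimal sketch of a parser for the message shown
above. It is not kernel code and not part of the patch under test;
the helper name parse_do_irq_line is made up for illustration. It
assumes the first number in the message is the CPU and the second is
the vector, as the "#.55" in the subject suggests.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): parse one dmesg line such as
 *   "[    0.321639] do_IRQ: 1.55 No irq handler for vector"
 * On a match, store the CPU number and vector and return 1.
 * On any other line, leave *cpu and *vector untouched and return 0.
 */
static int parse_do_irq_line(const char *line, int *cpu, int *vector)
{
	const char *p;
	int c, v;

	assert(line != NULL && cpu != NULL && vector != NULL);

	/* Skip the timestamp and anything else before the function name. */
	p = strstr(line, "do_IRQ: ");
	if (!p)
		return 0;

	/* Expect "<cpu>.<vector>" right after the prefix. */
	if (sscanf(p, "do_IRQ: %d.%d", &c, &v) != 2)
		return 0;

	/* Make sure this is really the spurious-vector message. */
	if (!strstr(p, "No irq handler for vector"))
		return 0;

	*cpu = c;
	*vector = v;
	return 1;
}
```

Calling this for each line of saved dmesg output and counting the
matches per CPU is enough to confirm whether the message shows up
exactly once per secondary CPU, as in the boot log above.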

I do get these messages during resume, but I guess these are unrelated:

[  167.034247] ACPI: Low-level resume complete
[  167.034247] ACPI: EC: EC started
[  167.034247] PM: Restoring platform NVS memory
[  167.034247] Enabling non-boot CPUs ...
[  167.034247] x86: Booting SMP configuration:
[  167.034247] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  167.034247]  cache: parent cpu1 should not be sleeping
[  167.034281] microcode: CPU1: patch_level=0x08101007
[  167.034542] CPU1 is up
[  167.034583] smpboot: Booting Node 0 Processor 2 APIC 0x2
[  167.035347]  cache: parent cpu2 should not be sleeping
[  167.035484] microcode: CPU2: patch_level=0x08101007
[  167.035690] CPU2 is up
[  167.035703] smpboot: Booting Node 0 Processor 3 APIC 0x3
[  167.036447]  cache: parent cpu3 should not be sleeping
[  167.036580] microcode: CPU3: patch_level=0x08101007
[  167.036819] CPU3 is up
[  167.036843] smpboot: Booting Node 0 Processor 4 APIC 0x4
[  167.038227]  cache: parent cpu4 should not be sleeping
[  167.038384] microcode: CPU4: patch_level=0x08101007
etc.

Regards,

Hans


> 8<---------------------
> 
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1642,6 +1642,7 @@ static void end_local_APIC_setup(void)
>    */
>   void apic_ap_setup(void)
>   {
> +	clear_local_APIC();
>   	setup_local_APIC();
>   	end_local_APIC_setup();
>   }
>