lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <AM2PR05MB0689A3D0B3623EAFF27CC75B91D50@AM2PR05MB0689.eurprd05.prod.outlook.com>
Date:   Thu, 20 Oct 2016 15:36:32 +0000
From:   Christian Steiner <christian.steiner@...look.de>
To:     Mika Westerberg <mika.westerberg@...ux.intel.com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Linus Walleij" <linus.walleij@...aro.org>
Subject: Re: pinctrl_cherryview: massive error messages (handle_bad_irq)

On 20.10.2016 12:08, Mika Westerberg wrote:
> +LinusW
> 
> On Wed, Oct 19, 2016 at 08:40:25PM +0000, Christian Steiner wrote:
>> On 19.10.2016 21:22, Mika Westerberg wrote:
>>> On Wed, Oct 19, 2016 at 06:32:57PM +0000, Christian Steiner wrote:
>>>> Hello,
>>>>
>>>> since Kernel 4.8 I see a lot of the following error messages after
>>>> resuming from suspend to RAM:
>>>>
>>>>> irq 117, desc: ffff88017a61e600, depth: 1, count: 0, unhandled: 0
>>>>> ->handle_irq():  ffffffff8109b613, handle_bad_irq+0x0/0x1e0
>>>>> ->irq_data.chip(): ffffffffa0020180, chv_pinctrl_exit+0x2d84/0x12 [pinctrl_cherryview]
>>>>> ->action():           (null)
>>>>>    IRQ_NOPROBE set
>>>>
>>>> The messages appear a few thousand times and mess up the log. Despite
>>>> the messages everything seems to work fine.
>>>> I see the messages with Kernel 4.8.2 and 4.9-rc1. The do not appear with
>>>> Kernel 4.7.8 or earlier.
>>>
>>> We changed the driver not to mask all IRQs when it starts. See commit
>>> bcb48cca23ec ("pinctrl: cherryview: Do not mask all interrupts in
>>> probe").
>>>
>>>> I have an Acer TravelMate B116-M laptop with an Intel Pentium N3700
>>>> processor.
>>>>
>>>> Here are the full log messages of a suspend to RAM and resume cycle with
>>>> Kernel 4.7.8:
>>>>
>>>>> [   22.605418] PM: Syncing filesystems ... done.
>>>>> [   22.613640] PM: Preparing system for sleep (mem)
>>>>> [   22.615108] Freezing user space processes ... (elapsed 0.001 seconds) done.
>>>>> [   22.616930] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
>>>>> [   22.616956] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>>>> [   22.618611] PM: Suspending system (mem)
>>>>> [   22.618640] Suspending console(s) (use no_console_suspend to debug)
>>>>> [   22.619773] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>>>>> [   22.626173] sd 0:0:0:0: [sda] Stopping disk
>>>>> [   23.670076] PM: suspend of devices complete after 1051.008 msecs
>>>>> [   23.691058] PM: late suspend of devices complete after 20.971 msecs
>>>>> [   23.692104] r8169 0000:03:00.0: System wakeup enabled by ACPI
>>>>> [   23.693087] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
>>>>> [   23.711687] PM: noirq suspend of devices complete after 20.624 msecs
>>>>> [   23.711726] ACPI: Preparing to enter system sleep state S3
>>>>> [   23.712295] ACPI : EC: EC stopped
>>>>> [   23.712297] PM: Saving platform NVS memory
>>>>> [   23.712306] Disabling non-boot CPUs ...
>>>>> [   23.712590] Cannot set affinity for irq 191
>>>>> [   23.713647] smpboot: CPU 1 is now offline
>>>>> [   23.752643] smpboot: CPU 2 is now offline
>>>>> [   23.802930] smpboot: CPU 3 is now offline
>>>>> [   23.831827] ACPI: Low-level resume complete
>>>>> [   23.831938] ACPI : EC: EC started
>>>>> [   23.831939] PM: Restoring platform NVS memory
>>>>> [   23.832600] Enabling non-boot CPUs ...
>>>>> [   23.882819] x86: Booting SMP configuration:
>>>>> [   23.882821] smpboot: Booting Node 0 Processor 1 APIC 0x2
>>>>> [   23.885308]  cache: parent cpu1 should not be sleeping
>>>>> [   23.885757] CPU1 is up
>>>>> [   23.932612] smpboot: Booting Node 0 Processor 2 APIC 0x4
>>>>> [   23.935171]  cache: parent cpu2 should not be sleeping
>>>>> [   23.935674] CPU2 is up
>>>>> [   23.982749] smpboot: Booting Node 0 Processor 3 APIC 0x6
>>>>> [   23.985324]  cache: parent cpu3 should not be sleeping
>>>>> [   23.985840] CPU3 is up
>>>>> [   23.986370] ACPI: Waking up from system sleep state S3
>>>>> [   24.012307] xhci_hcd 0000:00:14.0: System wakeup disabled by ACPI
>>>>> [   24.023417] PM: noirq resume of devices complete after 20.911 msecs
>>>>> [   25.052523] PM: early resume of devices complete after 1029.024 msecs
>>>>> [   25.053082] r8169 0000:03:00.0: System wakeup disabled by ACPI
>>>>> [   25.053124] iwlwifi 0000:02:00.0: RF_KILL bit toggled to disable radio.
>>>>> [   25.054222] rtc_cmos 00:04: System wakeup disabled by ACPI
>>>>> [   25.055645] sd 0:0:0:0: [sda] Starting disk
>>>>> [   25.075031] r8169 0000:03:00.0 eth0: link down
>>>>> [   25.374179] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>>>> [   25.377616] ata1.00: configured for UDMA/133
>>>>> [   26.025736] PM: resume of devices complete after 973.238 msecs
>>>>> [   26.026109] PM: Finishing wakeup.
>>>>> [   26.026111] Restarting tasks ... done.
>>>>> [   27.767611] r8169 0000:03:00.0 eth0: link up
>>>>
>>>> And this is how it looks with Kernel 4.8.2:
>>>>
>>>>> [   46.928231] PM: Syncing filesystems ... done.
>>>>> [   46.930391] PM: Preparing system for sleep (mem)
>>>>> [   46.932002] Freezing user space processes ... (elapsed 0.001 seconds) done.
>>>>> [   46.933721] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
>>>>> [   46.933749] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>>>>> [   46.935457] PM: Suspending system (mem)
>>>>> [   46.935486] Suspending console(s) (use no_console_suspend to debug)
>>>>> [   46.936492] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>>>>> [   46.936598] sd 0:0:0:0: [sda] Stopping disk
>>>>> [   48.026353] PM: suspend of devices complete after 1090.448 msecs
>>>>> [   48.057844] PM: late suspend of devices complete after 31.480 msecs
>>>>> [   48.059086] r8169 0000:03:00.0: System wakeup enabled by ACPI
>>>>> [   48.059240] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
>>>>> [   48.117234] PM: noirq suspend of devices complete after 59.386 msecs
>>>>> [   48.117269] ACPI: Preparing to enter system sleep state S3
>>>>> [   48.117836] ACPI : EC: EC stopped
>>>>> [   48.117837] PM: Saving platform NVS memory
>>>>> [   48.117846] Disabling non-boot CPUs ...
>>>>> [   48.118130] Cannot set affinity for irq 191
>>>>> [   48.119186] smpboot: CPU 1 is now offline
>>>>> [   48.179126] smpboot: CPU 2 is now offline
>>>>> [   48.229450] smpboot: CPU 3 is now offline
>>>>> [   48.278338] ACPI: Low-level resume complete
>>>>> [   48.278415] ACPI : EC: EC started
>>>>> [   48.278416] PM: Restoring platform NVS memory
>>>>> [   48.279036] irq 117, desc: ffff88017a61e600, depth: 1, count: 0, unhandled: 0
>>>>> [   48.279044] ->handle_irq():  ffffffff8109b613, handle_bad_irq+0x0/0x1e0
>>>>> [   48.279052] ->irq_data.chip(): ffffffffa0020180, chv_pinctrl_exit+0x2d84/0x12 [pinctrl_cherryview]
>>>>> [   48.279054] ->action():           (null)
>>>>> [   48.279055]    IRQ_NOPROBE set
>>>>> [   48.279057] unexpected IRQ trap at vector 75
>>>>>
>>>> [...]
>>>>>
>>>>> [   48.659033] irq 117, desc: ffff88017a61e600, depth: 1, count: 0, unhandled: 0
>>>>> [   48.659037] ->handle_irq():  ffffffff8109b613, handle_bad_irq+0x0/0x1e0
>>>>> [   48.659041] ->irq_data.chip(): ffffffffa0020180, chv_pinctrl_exit+0x2d84/0x12 [pinctrl_cherryview]
>>>>> [   48.659042] ->action():           (null)
>>>>> [   48.659043]    IRQ_NOPROBE set
>>>>> [   49.698698] PM: early resume of devices complete after 1089.068 msecs
>>>>> [   49.699601] r8169 0000:03:00.0: System wakeup disabled by ACPI
>>>>> [   49.699633] iwlwifi 0000:02:00.0: RF_KILL bit toggled to disable radio.
>>>>> [   49.708346] sd 0:0:0:0: [sda] Starting disk
>>>>> [   49.708581] rtc_cmos 00:04: System wakeup disabled by ACPI
>>>>> [   49.731492] r8169 0000:03:00.0 eth0: link down
>>>>> [   50.039718] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>>>> [   50.041544] ata1.00: configured for UDMA/133
>>>>> [   50.381833] PM: resume of devices complete after 683.151 msecs
>>>>> [   50.385539] PM: Finishing wakeup.
>>>>> [   50.385541] Restarting tasks ... done.
>>>>
>>>> Are the messages just informational or is this something serious that
>>>> should be fixed?
>>>
>>> Basically it means the machine has an interrupt line active an nothing
>>> is handling it. It is not serious but rather annoying.
>>
>> Thank you. Good to hear it is not critical. I would ignore the messages,
>> but it is annoying that almost 4MB of log messages are generated on
>> every resume after standby. Maybe it is possible to define a limit for
>> the maximum amount of the same messages?
> 
> The kernel will disable that line after 100000 spuriuos interrupts but I
> guess it is still rather annoying for the user.
> 
>>> I've seen this happening on proto hardware but it was fixed with
>>> firmware update.
>>
>> Currently I am using BIOS v1.22. Recently Acer released v1.23. I could
>> test if the BIOS update solves the problem.
> 
> Please try that first.

Updating the BIOS to v1.23 solved the problem. Thank you very much for
your advice!

> 
> If that does not work, then I we may be able to mask all interrupts up
> to nirqs - 1 but I first would like to check if the BIOS upgrade helps.
> 
>>>> If you need more information or if I should test a patch, please let me
>>>> know.
>>>
>>> Can you send me output of /proc/interrupts and
>>> /sys/kernel/debug/pinctrl/INT33FF:*/pins.
>>
>> Of course:
>>
>>> $ cat /proc/interrupts
>>>             CPU0       CPU1       CPU2       CPU3
>>>    0:       1388          0          0          0   IO-APIC    2-edge      timer
>>>    1:         17          3          4          3   IO-APIC    1-edge      i8042
>>>    8:          1          0          0          0   IO-APIC    8-edge      rtc0
>>>    9:         28         23         22         23   IO-APIC    9-fasteoi   acpi
>>>   12:         76         36         36         36   IO-APIC   12-edge      i8042
>>>   18:          0          0          0          0   IO-APIC   18-fasteoi   i801_smbus
>>>   45:         13         17          9         13   IO-APIC   45-fasteoi   mmc0
>>>   47:          0          0          0          0   IO-APIC   47-fasteoi   mmc1
>>>  115:       7816       7632       7694       7549   PCI-MSI 311296-edge      ahci[0000:00:13.0]
>>>  116:      43793      44661      43670      44402   PCI-MSI 1572864-edge      eth0
>>>  117:      10759          0          0          0  chv-gpio    0
>>>  191:          0          0          0          0  chv-gpio   50  80860F14:01 cd
>>>  196:         35         37         40         41   PCI-MSI 1048576-edge      iwlwifi
>>>  197:      24359      22517      23054      22852   PCI-MSI 327680-edge      xhci_hcd
>>>  198:     174515     168035     165519     167706   PCI-MSI 32768-edge      i915
>>>  199:         20         18         18         16   PCI-MSI 442368-edge      snd_hda_intel:card0
>>>  NMI:          0          0          0          0   Non-maskable interrupts
>>>  LOC:     409748     405619     425239     440549   Local timer interrupts
>>>  SPU:          0          0          0          0   Spurious interrupts
>>>  PMI:          0          0          0          0   Performance monitoring interrupts
>>>  IWI:          0          0          0          0   IRQ work interrupts
>>>  RTR:          0          0          0          0   APIC ICR read retries
>>>  RES:      74598      69435      68734      60007   Rescheduling interrupts
>>>  CAL:      23842      28885      29873      26859   Function call interrupts
>>>  TLB:      17407      23763      25060      21008   TLB shootdowns
>>>  TRM:          0          0          0          0   Thermal event interrupts
>>>  THR:          0          0          0          0   Threshold APIC interrupts
>>>  MCE:          0          0          0          0   Machine check exceptions
>>>  MCP:         45         44         44         44   Machine check polls
>>>  ERR:          0
>>>  MIS:          0
>>>  PIN:          0          0          0          0   Posted-interrupt notification event
>>>  PIW:          0          0          0          0   Posted-interrupt wakeup event

FYI: IRQ 117 is no longer listed in /proc/interrupts with BIOS v1.23.

>>
>>> $ cat /sys/kernel/debug/pinctrl/INT33FF:*/pin
>>> registered pins: 24
>>> pin 0 (PMU_SLP_S3_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
>>> pin 1 (PMU_BATLOW_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
>>> pin 2 (SUS_STAT_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
>>> pin 3 (PMU_SLP_S0IX_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
>>> pin 4 (PMU_AC_PRESENT) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
>>> pin 5 (PMU_PLTRST_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
>>> pin 6 (PMU_SUSCLK) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
>>> pin 7 (PMU_SLP_LAN_B) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
>>> pin 8 (PMU_PWRBTN_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
>>> pin 9 (PMU_SLP_S4_B) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
>>> pin 10 (PMU_WAKE_B) mode 1 ctrl0 0x00c10001 ctrl1 0x04c00000
>>> pin 11 (PMU_WAKE_LAN_B) GPIO ctrl0 0x00018300 ctrl1 0x04c00000
>>> pin 15 (MF_ISH_GPIO_3) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
>>> pin 16 (MF_ISH_GPIO_7) GPIO ctrl0 0x00018102 ctrl1 0x04c00000
>>> pin 17 (MF_ISH_I2C1_SCL) GPIO ctrl0 0x00018102 ctrl1 0x04c00000
>>> pin 18 (MF_ISH_GPIO_1) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
>>> pin 19 (MF_ISH_GPIO_5) GPIO ctrl0 0x10008201 ctrl1 0x04c00004
> 
> It most probably is this one. Active high and the line is high so this
> is generating interrupts all the time if not masked.

Looks you were right. Here is a diff of the changes (-: BIOS v1.22, +:
BIOS v1.23):

-pin 19 (MF_ISH_GPIO_5) GPIO ctrl0 0x10008201 ctrl1 0x04c00004
+pin 19 (MF_ISH_GPIO_5) GPIO ctrl0 0x10008200 ctrl1 0x04c00004
-pin 49 (MF_LPC_CLKOUT1) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000
+pin 49 (MF_LPC_CLKOUT1) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
-pin 51 (MF_LPC_CLKOUT0) mode 1 ctrl0 0x00010000 ctrl1 0x04c00000
+pin 51 (MF_LPC_CLKOUT0) mode 1 ctrl0 0x00010001 ctrl1 0x04c00000

By the way, is the message
> Cannot set affinity for irq 191
in the above kernel logs something to care about?

Thanks,
Christian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ