Message-ID: <95ef6fd9-6d86-40e2-9814-d1f671b2262d@kernel.org>
Date: Mon, 19 Aug 2024 06:50:09 +0200
From: Jiri Slaby <jirislaby@...nel.org>
To: Petr Valenta <petr@...klidu.cz>, "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Len Brown <lenb@...nel.org>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
Linux kernel mailing list <linux-kernel@...r.kernel.org>,
Linux regressions mailing list <regressions@...ts.linux.dev>,
Tony Nguyen <anthony.l.nguyen@...el.com>, przemyslaw.kitszel@...el.com,
intel-wired-lan@...ts.osuosl.org, jesse.brandeburg@...el.com
Subject: Re: ACPI IRQ storm with 6.10
CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to b2c289415b2b)
On 17. 08. 24, 19:57, Petr Valenta wrote:
>
>
> On 16. 08. 24 at 20:29, Rafael J. Wysocki wrote:
>> On Wed, Aug 14, 2024 at 8:48 AM Jiri Slaby <jirislaby@...nel.org> wrote:
>>>
>>> On 14. 08. 24, 7:22, Jiri Slaby wrote:
>>>> Hi,
>>>>
>>>> an openSUSE user reported that with 6.10, he sees one CPU under an
>>>> IRQ storm from ACPI (sci_interrupt):
>>>> 9: 20220768 ... IR-IO-APIC 9-fasteoi acpi
>>>>
>>>> At:
>>>> https://bugzilla.suse.com/show_bug.cgi?id=1229085
>>>>
>>>> 6.9 was OK.
>>>>
>>>> With acpi.debug_level=0x08000000 acpi.debug_layer=0xffffffff, there is a
>>>> repeated load of:
>>>>> evgpe-0673 ev_detect_gpe : Read registers for GPE 6D:
>>>>> Status=20, Enable=00, RunEnable=4A, WakeEnable=00
>>>
>>> 0x6d seems to count excessively (10 snapshots every 1 second):
>>>> /sys/firmware/acpi/interrupts/gpe6D: 82066 EN STS enabled unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 86536 EN STS enabled unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 90990 STS enabled unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 95468 EN STS enabled unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 100282 EN STS enabled unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 105187 STS enabled unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 110014 STS enabled unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 114852 STS enabled unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 119682 STS enabled unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 124194 STS enabled unmasked
>>>> /sys/firmware/acpi/interrupts/gpe6D: 128641 EN STS enabled unmasked
>>>
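To put a number on "excessively", the increments between the gpe6D snapshots quoted above can be computed directly. This is only a rough sketch: the function names are made up for illustration, the input is the quoted lines as they appear in this mail, and the spacing between snapshots is taken from the quoted text rather than measured.

```python
# Snapshot lines copied from the quoted gpe6D output above.
SNAPSHOTS = [
    "/sys/firmware/acpi/interrupts/gpe6D: 82066 EN STS enabled unmasked",
    "/sys/firmware/acpi/interrupts/gpe6D: 86536 EN STS enabled unmasked",
    "/sys/firmware/acpi/interrupts/gpe6D: 90990 STS enabled unmasked",
    "/sys/firmware/acpi/interrupts/gpe6D: 95468 EN STS enabled unmasked",
    "/sys/firmware/acpi/interrupts/gpe6D: 100282 EN STS enabled unmasked",
    "/sys/firmware/acpi/interrupts/gpe6D: 105187 STS enabled unmasked",
    "/sys/firmware/acpi/interrupts/gpe6D: 110014 STS enabled unmasked",
    "/sys/firmware/acpi/interrupts/gpe6D: 114852 STS enabled unmasked",
    "/sys/firmware/acpi/interrupts/gpe6D: 119682 STS enabled unmasked",
    "/sys/firmware/acpi/interrupts/gpe6D: 124194 STS enabled unmasked",
    "/sys/firmware/acpi/interrupts/gpe6D: 128641 EN STS enabled unmasked",
]


def parse_count(line):
    """Return the event counter from a 'path: COUNT FLAGS...' line."""
    # The sysfs path itself contains no colon, so the first ':' ends it.
    _, _, rest = line.partition(":")
    return int(rest.split()[0])


def deltas(lines):
    """Return the counter increase between each pair of consecutive lines."""
    counts = [parse_count(line) for line in lines]
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


if __name__ == "__main__":
    increments = deltas(SNAPSHOTS)
    print("increments per snapshot interval:", increments)
    print("mean increment:", sum(increments) / len(increments))
```

Every interval shows several thousand new GPE 6D events, which fits the storm seen on IRQ 9.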
>>> acpidump:
>>> https://bugzilla.suse.com/attachment.cgi?id=876677
>>>
>>> DSDT:
>>> https://bugzilla.suse.com/attachment.cgi?id=876678
>>>
>>>> Any ideas?
>>
>> GPE 6D is listed in _PRW for some devices, so maybe one of them
>> continues to trigger wakeup events?
>>
>
> Disabling the powertop service (which calls /usr/sbin/powertop --auto-tune)
> solves the problem completely. After some searching, I found that this is the
> cause:
>
> # causes IRQ storm on 6.10.x
> # kernel 6.9.9 is immune
> echo 'auto' > /sys/bus/pci/devices/0000:00:1f.6/power/control
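For anyone who wants to flip this setting back and forth while testing, here is a minimal sketch of a helper around the same sysfs attribute. The function names and the `root` parameter are hypothetical (the latter only exists so the code can be pointed at a scratch directory); the only things assumed about the kernel are the `power/control` file used above and its two values `auto` and `on`. Writing to the real file needs root.

```python
from pathlib import Path

# Real sysfs location of PCI devices; override `root` to test elsewhere.
SYSFS_PCI = Path("/sys/bus/pci/devices")


def power_control_path(bdf, root=SYSFS_PCI):
    """Build the path of the runtime PM control attribute for a PCI device."""
    return root / bdf / "power" / "control"


def get_power_control(bdf, root=SYSFS_PCI):
    """Read the current runtime PM setting ('auto' or 'on')."""
    return power_control_path(bdf, root).read_text().strip()


def set_power_control(bdf, mode, root=SYSFS_PCI):
    """Write 'auto' (allow runtime suspend) or 'on' (keep the device active)."""
    if mode not in ("auto", "on"):
        raise ValueError("mode must be 'auto' or 'on', got %r" % (mode,))
    power_control_path(bdf, root).write_text(mode + "\n")
```

Calling `set_power_control("0000:00:1f.6", "on")` should undo what powertop did to the NIC, which makes it easy to check whether the storm stops without disabling the whole service.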
$ git log --no-merges --oneline v6.9..v6.10 drivers/net/ethernet/intel/e1000e/
76a0a3f9cc2f e1000e: fix force smbus during suspend flow
c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems
bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp function
6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD duplicates
1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu()
b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops
75a3f93b5383 net: intel: implement modern PM ops declarations
The last two touch PM ^^. I cannot immediately see whether either of them
could cause the issue, though.
If there are no other ideas, could you try reverting both?
> lspci | grep 1f.6
> 00:1f.6 Ethernet controller: Intel Corporation Device 550b (rev 20)
>
> journalctl -b | grep 1f.6
> srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: [8086:550b] type 00 class 0x020000 conventional PCI endpoint
> srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: BAR 0 [mem 0x9c300000-0x9c31ffff]
> srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: PME# supported from D0 D3hot D3cold
> srp 17 19:44:17 e14 kernel: pci 0000:00:1f.6: Adding to iommu group 12
> srp 17 19:44:19 e14 kernel: e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
> srp 17 19:44:19 e14 kernel: e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
> srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) fc:5c:ee:b0:13:74
> srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
> srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 eth0: MAC: 16, PHY: 12, PBA No: FFFFFF-0FF
> srp 17 19:44:20 e14 kernel: e1000e 0000:00:1f.6 enp0s31f6: renamed from eth0
> srp 17 19:44:24 e14 ModemManager[1434]: <info> [base-manager] couldn't check support for device '/sys/devices/pci0000:00/0000:00:1f.6': not supported by any plugin
>
>
>
>> You can ask the reporter to mask that GPE via "echo mask >
>> /sys/firmware/acpi/interrupts/gpe6D" and see if the storm goes away
>> then.
>>
>> The only ACPI core issue introduced between 6.9 and 6.10 I'm aware of
>> is the one addressed by this series
>>
>> https://lore.kernel.org/linux-acpi/22385894.EfDdHjke4D@rjwysocki.net/
>>
>> but this is about the EC and the problem here doesn't appear to be
>> EC-related. It may be worth trying anyway, though.
>>
--
js
suse labs