[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d22d9ed1-25a6-4b65-bda6-8266665a3f70@amd.com>
Date: Thu, 19 Oct 2023 22:43:25 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Thomas Gleixner <tglx@...utronix.de>,
Hans de Goede <hdegoede@...hat.com>, kys@...rosoft.com,
hpa@...ux.intel.com, dlazar@...il.com
Cc: x86@...nel.org, LKML <linux-kernel@...r.kernel.org>,
Borislav Petkov <bp@...en8.de>
Subject: Re: PIC probing code from e179f6914152 failing
On 10/19/2023 16:20, Mario Limonciello wrote:
> On 10/18/2023 17:50, Thomas Gleixner wrote:
>> On Wed, Oct 18 2023 at 13:50, Mario Limonciello wrote:
>>> Recently an issue was reported to Bugzilla [1] that the Keyboard (IRQ 1)
>>> and GPIO controller (IRQ 7) weren't working properly on two separate
>>> Lenovo machines. The issues are unique to Linux.
>>>
>>> In digging through them, they happen because Lenovo didn't set up the
>>> PIC in the BIOS.
>>> Specifically the PIC probing code introduced by e179f6914152 ("x86, irq,
>>> pic: Probe for legacy PIC and set legacy_pic appropriately") expects
>>> that the BIOS sets up the PIC and uses that assertion to let Linux set
>>> it up.
>>>
>>> One of the reporters confirmed that reverting e179f6914152 fixes the
>>> issue. Keyboard and GPIO controller both work properly.
>>>
>>> I wanted to ask if we can please revert that and come up with a
>>> different solution for kexec with HyperV.
>>> Can guests instead perhaps detect in early boot code they're running in
>>> an EFI based hypervisor and explicitly set 'legacy_pic =
>>> &null_legacy_pic;'?
>>
>> No. This detection mechanism prevents PIC usage also in other
>> scenarios.
>>
>> It's perfectly valid code and the assumption that you can read back what
>> you wrote to the master IMR is entirely correct. If that's not the case
>> then there is no PIC, the BIOS has disabled some parts of the legacy
>> block or did not initialize it.
>>
>> Letting the kernel blindly assume that there is always a PIC is just
>> wrong. Worse, it puts the burden on everyone else to sprinkle
>> "legacy_pic = null_pic;" all over the place with dubious
>> conditions. That's exactly what the commit in question avoided.
>>
>> So no, we are not going back there.
>>
>> We could obviously change the probe() function to issue a full PIC
>> initialization sequence before reading a known written value
>> back. That's basically what the revert causes to happen via
>> legacy_pic->init().
>>
>> But I fundamentally hate to do that because forcing the init sequence
>> just to make presumably broken code which has some dubious dependencies
>> on the PIC or on nr_legacy_irqs > 0 happy is not really a solution when
>> the PIC is actually not needed at all. For anything modern where all
>> legacy interrupts are routed to the IO/APIC the PIC does not make any
>> sense whatsoever.
>>
>> We rather go and fix the real underlying problem.
>
> Looking at the logs from David and also trying to mock up the failure on
> my side I have a few findings I'll share, please agree or disagree with
> them.
>
>>
>> The kernel can handle the legacy interrupts perfectly fine through the
>> IOAPIC. There is no hard requirement for the PIC at all except for
>> really old systems which had the timer interrupt wired to the PIC and
>> therefore required to route the timer interrupt through the PIC instead
>> of the IO/APIC or did not provide routing entries for the IO/APIC. See
>> the horrible hackery in check_timer() and the grossly misnomed
>> init_IO_APIC_traps().
>>
>> I just took a random machine, forced the NULL PIC and added
>> 'apic=verbose' to the kernel command line and magically all the legacy
>> interrupts are set up via IO/APIC despite the NULL PIC and therefore 0
>> preallocated legacy interrupts.
>>
>> apic 0 pin 0 not connected
>> IOAPIC[0]: Preconfigured routing entry (0-1 -> IRQ 1 Level:0
>> ActiveLow:0)
>> IOAPIC[0]: Preconfigured routing entry (0-2 -> IRQ 2 Level:0
>> ActiveLow:0)
>> IOAPIC[0]: Preconfigured routing entry (0-3 -> IRQ 3 Level:0
>> ActiveLow:0)
>> ...
>>
>> which is identical to the output with PIC enabled. That debug message is
>> emitted from mp_irqdomain_alloc() which is invoked via the PNP resource
>> management code.
>>
>> Now /proc/interrupts:
>>
>> CPU0 CPU1 CPU2 CPU3
>> 1: 0 56 0 0 IO-APIC
>> 1-edge i8042
>> 4: 442 0 0 0 IO-APIC
>> 4-edge ttyS0
>> 8: 0 0 0 0 IO-APIC
>> 8-edge rtc0
>> 9: 0 0 0 0 IO-APIC
>> 9-fasteoi acpi
>> 12: 0 0 122 0 IO-APIC
>> 12-edge i8042
>>
>> Keyboard and serial are working, see?
>>
>> The only interrupt which does not work is interrupt 0 because nothing
>> allocates interrupt 0 due to nr_legacy_irqs == 0, but that's a trivially
>> solvable problem.
>
> From David's logs I can see that the timer interrupt gets wired up to
> IRQ2 instead of IRQ0.
>
> IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 2 Level:0 ActiveLow:0)
>
> In my hacked up forcing NULL pic case this fixes that:
>
> diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
> index 43c1c24e934b..885687e64e4e 100644
> --- a/arch/x86/kernel/i8259.c
> +++ b/arch/x86/kernel/i8259.c
> @@ -424,7 +424,7 @@ static int legacy_pic_probe(void)
> }
>
> struct legacy_pic null_legacy_pic = {
> - .nr_legacy_irqs = 0,
> + .nr_legacy_irqs = 1,
> .chip = &dummy_irq_chip,
> .mask = legacy_pic_uint_noop,
> .unmask = legacy_pic_uint_noop,
>
> I think it's cleaner than changing all the places that use
> nr_legacy_irqs(). On my side this makes:
>
> IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 0 Level:0 ActiveLow:0)
>
>>
>> That machine does not even need the timer interrupt because it has a
>> usable APIC and TSC deadline timer, so no APIC timer calibration
>> required. The same is true for CPUs which do not have a TSC deadline
>> timer, but enumerate the APIC frequency via CPUID or MSRs.
>
> Don't you need it for things like rtcwake to be able to work?
>
>>
>> But that brings up an interesting question. How are those affected
>> machines even reaching a state where the user notices that just the
>> keyboard and the GPIO are not working? Why?
>
> So the GPIO controller driver (pinctrl-amd) uses platform_get_irq() to
> try to discover the IRQ to use.
>
> This calls acpi_irq_get() which isn't implemented on x86 (hardcodes
> -EINVAL).
>
> I can "work around it" by:
>
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 76bfcba25003..2b4b436c65d8 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -187,7 +187,8 @@ int platform_get_irq_optional(struct platform_device
> *dev, unsigned int num)
> }
>
> r = platform_get_resource(dev, IORESOURCE_IRQ, num);
> - if (has_acpi_companion(&dev->dev)) {
> + if (IS_ENABLED(CONFIG_ACPI_GENERIC_GSI) &&
> + has_acpi_companion(&dev->dev)) {
> if (r && r->flags & IORESOURCE_DISABLED) {
> ret = acpi_irq_get(ACPI_HANDLE(&dev->dev), num,
> r);
> if (ret)
>
> but the resource that is returned from the next hunk has the resource
> flags set wrong in the NULL pic case:
>
> NULL case:
> r: AMDI0030:00 flags: 0x30000418
> PIC case:
> r: AMDI0030:00 flags: 0x418
>
> IOW NULL pic case has IORESOURCE_DISABLED / IORESOURCE_UNSET
>
> This then later the GPIO controller interrupts are not actually working.
> For example the attn pin for my I2C touchpad doesn't work.
>
> Checking the debugfs with my "work around" in place I can see a few
> things set up differently:
>
> NULL case:
> handler: handle_edge_irq
> dstate: 0x3740c208
> IRQ_TYPE_LEVEL_LOW
> IRQD_ACTIVATED
> IRQD_IRQ_STARTED
> IRQD_SINGLE_TARGET
> IRQD_MOVE_PCNTXT
> IRQD_AFFINITY_ON_ACTIVATE
> IRQD_CAN_RESERVE
> IRQD_WAKEUP_STATE
> IRQD_DEFAULT_TRIGGER_SET
> IRQD_HANDLE_ENFORCE_IRQCTX
>
> PIC case:
> handler: handle_fasteoi_irq
> dstate: 0x3740e208
> IRQ_TYPE_LEVEL_LOW
> IRQD_LEVEL
> IRQD_ACTIVATED
> IRQD_IRQ_STARTED
> IRQD_SINGLE_TARGET
> IRQD_MOVE_PCNTXT
> IRQD_AFFINITY_ON_ACTIVATE
> IRQD_CAN_RESERVE
> IRQD_WAKEUP_STATE
> IRQD_DEFAULT_TRIGGER_SET
> IRQD_HANDLE_ENFORCE_IRQCTX
>
> I guess something related to the callpath for mp_register_handler().
>
> Maybe this is the same reason for the keyboard not working right.
>
>>
>> The CPUID/MSR APIC frequency enumeration is Intel specific and
>> everything else depends on a working timer interrupt to calibrate the
>> APIC timer frequency. So AMD CPUs require the timer interrupt to
>> work. The only explanation why this "works" in that null PIC case is
>> that the PIT/HPET interrupt is actually wired to pin 0, but that's
>> something to be determined...
>>
>> Can I please get the following information from an affected machine:
>>
>> 1) dmesg with 'apic=verbose' on the command line
>> 2) /proc/interrupts
>> 3) /sys/kernel/debug/irq/irqs/{0..15}
>>
>> #3 requires CONFIG_GENERIC_IRQ_DEBUGFS to be set.
>>
>> Two versions of that please:
>>
>> 1) Unpatched kernel
>> 2) Patched kernel
>>
>> Thanks,
>>
>> tglx
>
At least on my system with forcing NULL PIC I've come up with this
series to make everything work.
David, can you please have a try with it on your system?
https://lore.kernel.org/linux-kernel/20231020033806.88063-1-mario.limonciello@amd.com/#t
Thanks,
Powered by blists - more mailing lists