[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e79dea49-0c07-4ca2-b359-97dd1bc579c8@amd.com>
Date: Thu, 19 Oct 2023 16:20:17 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Thomas Gleixner <tglx@...utronix.de>,
Hans de Goede <hdegoede@...hat.com>, kys@...rosoft.com,
hpa@...ux.intel.com
Cc: x86@...nel.org, LKML <linux-kernel@...r.kernel.org>,
Borislav Petkov <bp@...en8.de>
Subject: Re: PIC probing code from e179f6914152 failing
On 10/18/2023 17:50, Thomas Gleixner wrote:
> On Wed, Oct 18 2023 at 13:50, Mario Limonciello wrote:
>> Recently an issue was reported to Bugzilla [1] that the Keyboard (IRQ 1)
>> and GPIO controller (IRQ 7) weren't working properly on two separate
>> Lenovo machines. The issues are unique to Linux.
>>
>> In digging through them, they happen because Lenovo didn't set up the
>> PIC in the BIOS.
>> Specifically the PIC probing code introduced by e179f6914152 ("x86, irq,
>> pic: Probe for legacy PIC and set legacy_pic appropriately") expects
>> that the BIOS sets up the PIC and uses that assertion to let Linux set
>> it up.
>>
>> One of the reporters confirmed that reverting e179f6914152 fixes the
>> issue. Keyboard and GPIO controller both work properly.
>>
>> I wanted to ask if we can please revert that and come up with a
>> different solution for kexec with HyperV.
>> Can guests instead perhaps detect in early boot code they're running in
>> an EFI based hypervisor and explicitly set 'legacy_pic = &null_legacy_pic;'?
>
> No. This detection mechanism prevents PIC usage also in other
> scenarios.
>
> It's perfectly valid code and the assumption that you can read back what
> you wrote to the master IMR is entirely correct. If that's not the case
> then there is no PIC, the BIOS has disabled some parts of the legacy
> block or did not initialize it.
>
> Letting the kernel blindly assume that there is always a PIC is just
> wrong. Worse, it puts the burden on everyone else to sprinkle
> "legacy_pic = null_pic;" all over the place with dubious
> conditions. That's exactly what the commit in question avoided.
>
> So no, we are not going back there.
>
> We could obviously change the probe() function to issue a full PIC
> initialization sequence before reading a known written value
> back. That's basically what the revert causes to happen via
> legacy_pic->init().
>
> But I fundamentally hate to do that because forcing the init sequence
> just to make presumably broken code which has some dubious dependencies
> on the PIC or on nr_legacy_irqs > 0 happy is not really a solution when
> the PIC is actually not needed at all. For anything modern where all
> legacy interrupts are routed to the IO/APIC the PIC does not make any
> sense whatsoever.
>
> We rather go and fix the real underlying problem.
Looking at the logs from David and also trying to mock up the failure on
my side I have a few findings I'll share, please agree or disagree with
them.
>
> The kernel can handle the legacy interrupts perfectly fine through the
> IOAPIC. There is no hard requirement for the PIC at all except for
> really old systems which had the timer interrupt wired to the PIC and
> therefore required to route the timer interrupt through the PIC instead
> of the IO/APIC or did not provide routing entries for the IO/APIC. See
> the horrible hackery in check_timer() and the grossly misnomed
> init_IO_APIC_traps().
>
> I just took a random machine, forced the NULL PIC and added
> 'apic=verbose' to the kernel command line and magically all the legacy
> interrupts are set up via IO/APIC despite the NULL PIC and therefore 0
> preallocated legacy interrupts.
>
> apic 0 pin 0 not connected
> IOAPIC[0]: Preconfigured routing entry (0-1 -> IRQ 1 Level:0 ActiveLow:0)
> IOAPIC[0]: Preconfigured routing entry (0-2 -> IRQ 2 Level:0 ActiveLow:0)
> IOAPIC[0]: Preconfigured routing entry (0-3 -> IRQ 3 Level:0 ActiveLow:0)
> ...
>
> which is identical to the output with PIC enabled. That debug message is
> emitted from mp_irqdomain_alloc() which is invoked via the PNP resource
> management code.
>
> Now /proc/interrupts:
>
> CPU0 CPU1 CPU2 CPU3
> 1: 0 56 0 0 IO-APIC 1-edge i8042
> 4: 442 0 0 0 IO-APIC 4-edge ttyS0
> 8: 0 0 0 0 IO-APIC 8-edge rtc0
> 9: 0 0 0 0 IO-APIC 9-fasteoi acpi
> 12: 0 0 122 0 IO-APIC 12-edge i8042
>
> Keyboard and serial are working, see?
>
> The only interrupt which does not work is interrupt 0 because nothing
> allocates interrupt 0 due to nr_legacy_irqs == 0, but that's a trivially
> solvable problem.
From David's logs I can see that the timer interrupt gets wired up to
IRQ2 instead of IRQ0.
IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 2 Level:0 ActiveLow:0)
In my hacked up forcing NULL pic case this fixes that:
diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
index 43c1c24e934b..885687e64e4e 100644
--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -424,7 +424,7 @@ static int legacy_pic_probe(void)
}
struct legacy_pic null_legacy_pic = {
- .nr_legacy_irqs = 0,
+ .nr_legacy_irqs = 1,
.chip = &dummy_irq_chip,
.mask = legacy_pic_uint_noop,
.unmask = legacy_pic_uint_noop,
I think it's cleaner than changing all the places that use
nr_legacy_irqs(). On my side this makes:
IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 0 Level:0 ActiveLow:0)
>
> That machine does not even need the timer interrupt because it has a
> usable APIC and TSC deadline timer, so no APIC timer calibration
> required. The same is true for CPUs which do not have a TSC deadline
> timer, but enumerate the APIC frequency via CPUID or MSRs.
Don't you need it for things like rtcwake to be able to work?
>
> But that brings up an interesting question. How are those affected
> machines even reaching a state where the user notices that just the
> keyboard and the GPIO are not working? Why?
So the GPIO controller driver (pinctrl-amd) uses platform_get_irq() to
try to discover the IRQ to use.
This calls acpi_irq_get() which isn't implemented on x86 (hardcodes
-EINVAL).
I can "work around it" by:
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 76bfcba25003..2b4b436c65d8 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -187,7 +187,8 @@ int platform_get_irq_optional(struct platform_device
*dev, unsigned int num)
}
r = platform_get_resource(dev, IORESOURCE_IRQ, num);
- if (has_acpi_companion(&dev->dev)) {
+ if (IS_ENABLED(CONFIG_ACPI_GENERIC_GSI) &&
+ has_acpi_companion(&dev->dev)) {
if (r && r->flags & IORESOURCE_DISABLED) {
ret = acpi_irq_get(ACPI_HANDLE(&dev->dev), num, r);
if (ret)
but the resource that is returned from the next hunk has the resource
flags set wrong in the NULL pic case:
NULL case:
r: AMDI0030:00 flags: 0x30000418
PIC case:
r: AMDI0030:00 flags: 0x418
IOW NULL pic case has IORESOURCE_DISABLED / IORESOURCE_UNSET
This then later the GPIO controller interrupts are not actually working.
For example the attn pin for my I2C touchpad doesn't work.
Checking the debugfs with my "work around" in place I can see a few
things set up differently:
NULL case:
handler: handle_edge_irq
dstate: 0x3740c208
IRQ_TYPE_LEVEL_LOW
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_MOVE_PCNTXT
IRQD_AFFINITY_ON_ACTIVATE
IRQD_CAN_RESERVE
IRQD_WAKEUP_STATE
IRQD_DEFAULT_TRIGGER_SET
IRQD_HANDLE_ENFORCE_IRQCTX
PIC case:
handler: handle_fasteoi_irq
dstate: 0x3740e208
IRQ_TYPE_LEVEL_LOW
IRQD_LEVEL
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_SINGLE_TARGET
IRQD_MOVE_PCNTXT
IRQD_AFFINITY_ON_ACTIVATE
IRQD_CAN_RESERVE
IRQD_WAKEUP_STATE
IRQD_DEFAULT_TRIGGER_SET
IRQD_HANDLE_ENFORCE_IRQCTX
I guess something related to the callpath for mp_register_handler().
Maybe this is the same reason for the keyboard not working right.
>
> The CPUID/MSR APIC frequency enumeration is Intel specific and
> everything else depends on a working timer interrupt to calibrate the
> APIC timer frequency. So AMD CPUs require the timer interrupt to
> work. The only explanation why this "works" in that null PIC case is
> that the PIT/HPET interrupt is actually wired to pin 0, but that's
> something to be determined...
>
> Can I please get the following information from an affected machine:
>
> 1) dmesg with 'apic=verbose' on the command line
> 2) /proc/interrupts
> 3) /sys/kernel/debug/irq/irqs/{0..15}
>
> #3 requires CONFIG_GENERIC_IRQ_DEBUGFS to be set.
>
> Two versions of that please:
>
> 1) Unpatched kernel
> 2) Patched kernel
>
> Thanks,
>
> tglx
Powered by blists - more mailing lists