lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e79dea49-0c07-4ca2-b359-97dd1bc579c8@amd.com>
Date:   Thu, 19 Oct 2023 16:20:17 -0500
From:   Mario Limonciello <mario.limonciello@....com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Hans de Goede <hdegoede@...hat.com>, kys@...rosoft.com,
        hpa@...ux.intel.com
Cc:     x86@...nel.org, LKML <linux-kernel@...r.kernel.org>,
        Borislav Petkov <bp@...en8.de>
Subject: Re: PIC probing code from e179f6914152 failing

On 10/18/2023 17:50, Thomas Gleixner wrote:
> On Wed, Oct 18 2023 at 13:50, Mario Limonciello wrote:
>> Recently an issue was reported to Bugzilla [1] that the Keyboard (IRQ 1)
>> and GPIO controller (IRQ 7) weren't working properly on two separate
>> Lenovo machines.  The issues are unique to Linux.
>>
>> In digging through them, they happen because Lenovo didn't set up the
>> PIC in the BIOS.
>> Specifically the PIC probing code introduced by e179f6914152 ("x86, irq,
>> pic: Probe for legacy PIC and set legacy_pic appropriately") expects
>> that the BIOS sets up the PIC and uses that assertion to let Linux set
>> it up.
>>
>> One of the reporters confirmed that reverting e179f6914152 fixes the
>> issue.  Keyboard and GPIO controller both work properly.
>>
>> I wanted to ask if we can please revert that and come up with a
>> different solution for kexec with HyperV.
>> Can guests instead perhaps detect in early boot code they're running in
>> an EFI based hypervisor and explicitly set 'legacy_pic = &null_legacy_pic;'?
> 
> No. This detection mechanism prevents PIC usage also in other
> scenarios.
> 
> It's perfectly valid code and the assumption that you can read back what
> you wrote to the master IMR is entirely correct. If that's not the case
> then there is no PIC, the BIOS has disabled some parts of the legacy
> block or did not initialize it.
> 
> Letting the kernel blindly assume that there is always a PIC is just
> wrong. Worse, it puts the burden on everyone else to sprinkle
> "legacy_pic = null_pic;" all over the place with dubious
> conditions. That's exactly what the commit in question avoided.
> 
> So no, we are not going back there.
> 
> We could obviously change the probe() function to issue a full PIC
> initialization sequence before reading a known written value
> back. That's basically what the revert causes to happen via
> legacy_pic->init().
> 
> But I fundamentally hate to do that because forcing the init sequence
> just to make presumably broken code which has some dubious dependencies
> on the PIC or on nr_legacy_irqs > 0 happy is not really a solution when
> the PIC is actually not needed at all. For anything modern where all
> legacy interrupts are routed to the IO/APIC the PIC does not make any
> sense whatsoever.
> 
> We rather go and fix the real underlying problem.

Looking at the logs from David and also trying to mock up the failure on 
my side I have a few findings I'll share, please agree or disagree with 
them.

> 
> The kernel can handle the legacy interrupts perfectly fine through the
> IOAPIC. There is no hard requirement for the PIC at all except for
> really old systems which had the timer interrupt wired to the PIC and
> therefore required to route the timer interrupt through the PIC instead
> of the IO/APIC or did not provide routing entries for the IO/APIC. See
> the horrible hackery in check_timer() and the grossly misnomed
> init_IO_APIC_traps().
> 
> I just took a random machine, forced the NULL PIC and added
> 'apic=verbose' to the kernel command line and magically all the legacy
> interrupts are set up via IO/APIC despite the NULL PIC and therefore 0
> preallocated legacy interrupts.
> 
>    apic 0 pin 0 not connected
>   IOAPIC[0]: Preconfigured routing entry (0-1 -> IRQ 1 Level:0 ActiveLow:0)
>   IOAPIC[0]: Preconfigured routing entry (0-2 -> IRQ 2 Level:0 ActiveLow:0)
>   IOAPIC[0]: Preconfigured routing entry (0-3 -> IRQ 3 Level:0 ActiveLow:0)
>   ...
> 
> which is identical to the output with PIC enabled. That debug message is
> emitted from mp_irqdomain_alloc() which is invoked via the PNP resource
> management code.
> 
> Now /proc/interrupts:
> 
>             CPU0       CPU1       CPU2       CPU3
>    1:          0         56          0          0    IO-APIC   1-edge      i8042
>    4:        442          0          0          0    IO-APIC   4-edge      ttyS0
>    8:          0          0          0          0    IO-APIC   8-edge      rtc0
>    9:          0          0          0          0    IO-APIC   9-fasteoi   acpi
>   12:          0          0        122          0    IO-APIC  12-edge      i8042
> 
> Keyboard and serial are working, see?
> 
> The only interrupt which does not work is interrupt 0 because nothing
> allocates interrupt 0 due to nr_legacy_irqs == 0, but that's a trivially
> solvable problem.

 From David's logs I can see that the timer interrupt gets wired up to 
IRQ2 instead of IRQ0.

IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 2 Level:0 ActiveLow:0)

In my hacked up forcing NULL pic case this fixes that:

diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
index 43c1c24e934b..885687e64e4e 100644
--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -424,7 +424,7 @@ static int legacy_pic_probe(void)
  }

  struct legacy_pic null_legacy_pic = {
-       .nr_legacy_irqs = 0,
+       .nr_legacy_irqs = 1,
         .chip = &dummy_irq_chip,
         .mask = legacy_pic_uint_noop,
         .unmask = legacy_pic_uint_noop,

I think it's cleaner than changing all the places that use 
nr_legacy_irqs().  On my side this makes:

IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 0 Level:0 ActiveLow:0)

> 
> That machine does not even need the timer interrupt because it has a
> usable APIC and TSC deadline timer, so no APIC timer calibration
> required. The same is true for CPUs which do not have a TSC deadline
> timer, but enumerate the APIC frequency via CPUID or MSRs.

Don't you need it for things like rtcwake to be able to work?

> 
> But that brings up an interesting question. How are those affected
> machines even reaching a state where the user notices that just the
> keyboard and the GPIO are not working? Why?

So the GPIO controller driver (pinctrl-amd) uses platform_get_irq() to 
try to discover the IRQ to use.

This calls acpi_irq_get() which isn't implemented on x86 (hardcodes 
-EINVAL).

I can "work around it" by:

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 76bfcba25003..2b4b436c65d8 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -187,7 +187,8 @@ int platform_get_irq_optional(struct platform_device 
*dev, unsigned int num)
         }

         r = platform_get_resource(dev, IORESOURCE_IRQ, num);
-       if (has_acpi_companion(&dev->dev)) {
+       if (IS_ENABLED(CONFIG_ACPI_GENERIC_GSI) &&
+            has_acpi_companion(&dev->dev)) {
                 if (r && r->flags & IORESOURCE_DISABLED) {
                         ret = acpi_irq_get(ACPI_HANDLE(&dev->dev), num, r);
                         if (ret)

but the resource that is returned from the next hunk has the resource 
flags set wrong in the NULL pic case:

NULL case:
r: AMDI0030:00 flags: 0x30000418
PIC case:
r: AMDI0030:00 flags: 0x418

IOW NULL pic case has IORESOURCE_DISABLED / IORESOURCE_UNSET

This then later the GPIO controller interrupts are not actually working.
For example the attn pin for my I2C touchpad doesn't work.

Checking the debugfs with my "work around" in place I can see a few 
things set up differently:

NULL case:
handler:  handle_edge_irq
dstate:   0x3740c208
             IRQ_TYPE_LEVEL_LOW
             IRQD_ACTIVATED
             IRQD_IRQ_STARTED
             IRQD_SINGLE_TARGET
             IRQD_MOVE_PCNTXT
             IRQD_AFFINITY_ON_ACTIVATE
             IRQD_CAN_RESERVE
             IRQD_WAKEUP_STATE
             IRQD_DEFAULT_TRIGGER_SET
             IRQD_HANDLE_ENFORCE_IRQCTX

PIC case:
handler:  handle_fasteoi_irq
dstate:   0x3740e208
             IRQ_TYPE_LEVEL_LOW
             IRQD_LEVEL
             IRQD_ACTIVATED
             IRQD_IRQ_STARTED
             IRQD_SINGLE_TARGET
             IRQD_MOVE_PCNTXT
             IRQD_AFFINITY_ON_ACTIVATE
             IRQD_CAN_RESERVE
             IRQD_WAKEUP_STATE
             IRQD_DEFAULT_TRIGGER_SET
             IRQD_HANDLE_ENFORCE_IRQCTX

I guess something related to the callpath for mp_register_handler().

Maybe this is the same reason for the keyboard not working right.

> 
> The CPUID/MSR APIC frequency enumeration is Intel specific and
> everything else depends on a working timer interrupt to calibrate the
> APIC timer frequency. So AMD CPUs require the timer interrupt to
> work. The only explanation why this "works" in that null PIC case is
> that the PIT/HPET interrupt is actually wired to pin 0, but that's
> something to be determined...
> 
> Can I please get the following information from an affected machine:
> 
>    1) dmesg with 'apic=verbose' on the command line
>    2) /proc/interrupts
>    3) /sys/kernel/debug/irq/irqs/{0..15}
> 
>    #3 requires CONFIG_GENERIC_IRQ_DEBUGFS to be set.
> 
> Two versions of that please:
> 
>    1) Unpatched kernel
>    2) Patched kernel
> 
> Thanks,
> 
>          tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ