lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d22d9ed1-25a6-4b65-bda6-8266665a3f70@amd.com>
Date:   Thu, 19 Oct 2023 22:43:25 -0500
From:   Mario Limonciello <mario.limonciello@....com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Hans de Goede <hdegoede@...hat.com>, kys@...rosoft.com,
        hpa@...ux.intel.com, dlazar@...il.com
Cc:     x86@...nel.org, LKML <linux-kernel@...r.kernel.org>,
        Borislav Petkov <bp@...en8.de>
Subject: Re: PIC probing code from e179f6914152 failing

On 10/19/2023 16:20, Mario Limonciello wrote:
> On 10/18/2023 17:50, Thomas Gleixner wrote:
>> On Wed, Oct 18 2023 at 13:50, Mario Limonciello wrote:
>>> Recently an issue was reported to Bugzilla [1] that the Keyboard (IRQ 1)
>>> and GPIO controller (IRQ 7) weren't working properly on two separate
>>> Lenovo machines.  The issues are unique to Linux.
>>>
>>> In digging through them, they happen because Lenovo didn't set up the
>>> PIC in the BIOS.
>>> Specifically the PIC probing code introduced by e179f6914152 ("x86, irq,
>>> pic: Probe for legacy PIC and set legacy_pic appropriately") expects
>>> that the BIOS sets up the PIC and uses that assertion to let Linux set
>>> it up.
>>>
>>> One of the reporters confirmed that reverting e179f6914152 fixes the
>>> issue.  Keyboard and GPIO controller both work properly.
>>>
>>> I wanted to ask if we can please revert that and come up with a
>>> different solution for kexec with HyperV.
>>> Can guests instead perhaps detect in early boot code they're running in
>>> an EFI based hypervisor and explicitly set 'legacy_pic = 
>>> &null_legacy_pic;'?
>>
>> No. This detection mechanism prevents PIC usage also in other
>> scenarios.
>>
>> It's perfectly valid code and the assumption that you can read back what
>> you wrote to the master IMR is entirely correct. If that's not the case
>> then there is no PIC, the BIOS has disabled some parts of the legacy
>> block or did not initialize it.
>>
>> Letting the kernel blindly assume that there is always a PIC is just
>> wrong. Worse, it puts the burden on everyone else to sprinkle
>> "legacy_pic = null_pic;" all over the place with dubious
>> conditions. That's exactly what the commit in question avoided.
>>
>> So no, we are not going back there.
>>
>> We could obviously change the probe() function to issue a full PIC
>> initialization sequence before reading a known written value
>> back. That's basically what the revert causes to happen via
>> legacy_pic->init().
>>
>> But I fundamentally hate to do that because forcing the init sequence
>> just to make presumably broken code which has some dubious dependencies
>> on the PIC or on nr_legacy_irqs > 0 happy is not really a solution when
>> the PIC is actually not needed at all. For anything modern where all
>> legacy interrupts are routed to the IO/APIC the PIC does not make any
>> sense whatsoever.
>>
>> We rather go and fix the real underlying problem.
> 
> Looking at the logs from David and also trying to mock up the failure on 
> my side I have a few findings I'll share, please agree or disagree with 
> them.
> 
>>
>> The kernel can handle the legacy interrupts perfectly fine through the
>> IOAPIC. There is no hard requirement for the PIC at all except for
>> really old systems which had the timer interrupt wired to the PIC and
>> therefore required to route the timer interrupt through the PIC instead
>> of the IO/APIC or did not provide routing entries for the IO/APIC. See
>> the horrible hackery in check_timer() and the grossly misnomed
>> init_IO_APIC_traps().
>>
>> I just took a random machine, forced the NULL PIC and added
>> 'apic=verbose' to the kernel command line and magically all the legacy
>> interrupts are set up via IO/APIC despite the NULL PIC and therefore 0
>> preallocated legacy interrupts.
>>
>>    apic 0 pin 0 not connected
>>   IOAPIC[0]: Preconfigured routing entry (0-1 -> IRQ 1 Level:0 
>> ActiveLow:0)
>>   IOAPIC[0]: Preconfigured routing entry (0-2 -> IRQ 2 Level:0 
>> ActiveLow:0)
>>   IOAPIC[0]: Preconfigured routing entry (0-3 -> IRQ 3 Level:0 
>> ActiveLow:0)
>>   ...
>>
>> which is identical to the output with PIC enabled. That debug message is
>> emitted from mp_irqdomain_alloc() which is invoked via the PNP resource
>> management code.
>>
>> Now /proc/interrupts:
>>
>>             CPU0       CPU1       CPU2       CPU3
>>    1:          0         56          0          0    IO-APIC   
>> 1-edge      i8042
>>    4:        442          0          0          0    IO-APIC   
>> 4-edge      ttyS0
>>    8:          0          0          0          0    IO-APIC   
>> 8-edge      rtc0
>>    9:          0          0          0          0    IO-APIC   
>> 9-fasteoi   acpi
>>   12:          0          0        122          0    IO-APIC  
>> 12-edge      i8042
>>
>> Keyboard and serial are working, see?
>>
>> The only interrupt which does not work is interrupt 0 because nothing
>> allocates interrupt 0 due to nr_legacy_irqs == 0, but that's a trivially
>> solvable problem.
> 
>  From David's logs I can see that the timer interrupt gets wired up to 
> IRQ2 instead of IRQ0.
> 
> IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 2 Level:0 ActiveLow:0)
> 
> In my hacked up forcing NULL pic case this fixes that:
> 
> diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
> index 43c1c24e934b..885687e64e4e 100644
> --- a/arch/x86/kernel/i8259.c
> +++ b/arch/x86/kernel/i8259.c
> @@ -424,7 +424,7 @@ static int legacy_pic_probe(void)
>   }
> 
>   struct legacy_pic null_legacy_pic = {
> -       .nr_legacy_irqs = 0,
> +       .nr_legacy_irqs = 1,
>          .chip = &dummy_irq_chip,
>          .mask = legacy_pic_uint_noop,
>          .unmask = legacy_pic_uint_noop,
> 
> I think it's cleaner than changing all the places that use 
> nr_legacy_irqs().  On my side this makes:
> 
> IOAPIC[0]: Preconfigured routing entry (33-2 -> IRQ 0 Level:0 ActiveLow:0)
> 
>>
>> That machine does not even need the timer interrupt because it has a
>> usable APIC and TSC deadline timer, so no APIC timer calibration
>> required. The same is true for CPUs which do not have a TSC deadline
>> timer, but enumerate the APIC frequency via CPUID or MSRs.
> 
> Don't you need it for things like rtcwake to be able to work?
> 
>>
>> But that brings up an interesting question. How are those affected
>> machines even reaching a state where the user notices that just the
>> keyboard and the GPIO are not working? Why?
> 
> So the GPIO controller driver (pinctrl-amd) uses platform_get_irq() to 
> try to discover the IRQ to use.
> 
> This calls acpi_irq_get() which isn't implemented on x86 (hardcodes 
> -EINVAL).
> 
> I can "work around it" by:
> 
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index 76bfcba25003..2b4b436c65d8 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -187,7 +187,8 @@ int platform_get_irq_optional(struct platform_device 
> *dev, unsigned int num)
>          }
> 
>          r = platform_get_resource(dev, IORESOURCE_IRQ, num);
> -       if (has_acpi_companion(&dev->dev)) {
> +       if (IS_ENABLED(CONFIG_ACPI_GENERIC_GSI) &&
> +            has_acpi_companion(&dev->dev)) {
>                  if (r && r->flags & IORESOURCE_DISABLED) {
>                          ret = acpi_irq_get(ACPI_HANDLE(&dev->dev), num, 
> r);
>                          if (ret)
> 
> but the resource that is returned from the next hunk has the resource 
> flags set wrong in the NULL pic case:
> 
> NULL case:
> r: AMDI0030:00 flags: 0x30000418
> PIC case:
> r: AMDI0030:00 flags: 0x418
> 
> IOW NULL pic case has IORESOURCE_DISABLED / IORESOURCE_UNSET
> 
> This then later the GPIO controller interrupts are not actually working.
> For example the attn pin for my I2C touchpad doesn't work.
> 
> Checking the debugfs with my "work around" in place I can see a few 
> things set up differently:
> 
> NULL case:
> handler:  handle_edge_irq
> dstate:   0x3740c208
>              IRQ_TYPE_LEVEL_LOW
>              IRQD_ACTIVATED
>              IRQD_IRQ_STARTED
>              IRQD_SINGLE_TARGET
>              IRQD_MOVE_PCNTXT
>              IRQD_AFFINITY_ON_ACTIVATE
>              IRQD_CAN_RESERVE
>              IRQD_WAKEUP_STATE
>              IRQD_DEFAULT_TRIGGER_SET
>              IRQD_HANDLE_ENFORCE_IRQCTX
> 
> PIC case:
> handler:  handle_fasteoi_irq
> dstate:   0x3740e208
>              IRQ_TYPE_LEVEL_LOW
>              IRQD_LEVEL
>              IRQD_ACTIVATED
>              IRQD_IRQ_STARTED
>              IRQD_SINGLE_TARGET
>              IRQD_MOVE_PCNTXT
>              IRQD_AFFINITY_ON_ACTIVATE
>              IRQD_CAN_RESERVE
>              IRQD_WAKEUP_STATE
>              IRQD_DEFAULT_TRIGGER_SET
>              IRQD_HANDLE_ENFORCE_IRQCTX
> 
> I guess something related to the callpath for mp_register_handler().
> 
> Maybe this is the same reason for the keyboard not working right.
> 
>>
>> The CPUID/MSR APIC frequency enumeration is Intel specific and
>> everything else depends on a working timer interrupt to calibrate the
>> APIC timer frequency. So AMD CPUs require the timer interrupt to
>> work. The only explanation why this "works" in that null PIC case is
>> that the PIT/HPET interrupt is actually wired to pin 0, but that's
>> something to be determined...
>>
>> Can I please get the following information from an affected machine:
>>
>>    1) dmesg with 'apic=verbose' on the command line
>>    2) /proc/interrupts
>>    3) /sys/kernel/debug/irq/irqs/{0..15}
>>
>>    #3 requires CONFIG_GENERIC_IRQ_DEBUGFS to be set.
>>
>> Two versions of that please:
>>
>>    1) Unpatched kernel
>>    2) Patched kernel
>>
>> Thanks,
>>
>>          tglx
> 

At least on my system with forcing NULL PIC I've come up with this 
series to make everything work.

David, can you please have a try with it on your system?

https://lore.kernel.org/linux-kernel/20231020033806.88063-1-mario.limonciello@amd.com/#t

Thanks,

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ