Message-ID: <87fs9u6twk.ffs@tglx>
Date:   Fri, 24 Mar 2023 16:39:07 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Saurabh Singh Sengar <ssengar@...rosoft.com>,
        Borislav Petkov <bp@...en8.de>
Cc:     Saurabh Sengar <ssengar@...ux.microsoft.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        "johan+linaro@...nel.org" <johan+linaro@...nel.org>,
        "isaku.yamahata@...el.com" <isaku.yamahata@...el.com>,
        "Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "rahul.tanwar@...ux.intel.com" <rahul.tanwar@...ux.intel.com>,
        "andriy.shevchenko@...el.com" <andriy.shevchenko@...el.com>
Subject: RE: [EXTERNAL] Re: [PATCH] x86/ioapic: Don't return 0 as valid virq

On Tue, Mar 14 2023 at 10:23, Saurabh Singh Sengar wrote:
>> This should be added to your commit message: what guest VM is that and
>> why should the kernel support it.
>
> Guest VM is a linux VM running as child partition on Hyper-V. Hyper-v Linux
> documentation is in Documentation/virt/hyperv/.
>
> In commit I wanted to mention that any system which is not registering
> IO-APIC will have this issue. But I am fine to mention specifically
> about the issue I am facing.  As part of your next comment, I have
> explained the issue in detail if that is good, I can put that as
> commit message.
>> 
>> Why doesn't it need an IO-APIC and why does the current code need to be
>> changed just for your guest VM?
>
> For Hyper-V Virtual Machines, few platforms don't have any devices to be
> hooked to IO-APIC. Although it has Hyper-V based MSI over VMBus which
> assigns interrupts to PCIe devices. In such platforms IO-APIC is not
> registered which causes gsi_top value to remain at 0 and not get properly
> assigned. Moreover, due to the inability to disable CONFIG_X86_IO_APIC
> flag, the io-apic code still gets compiled. Thus, arch_dynirq_lower_bound
> function in io_apic.c decides the lower bound of irq numbers based on gsi_top.
>
> Later when PCIe-MSI attempts to allocate interrupts, it gets 0 as the first
> virq number because gsi_top is still 0. 0 being invalid virq is ignored by
> MSI irq domain and results allocation of the same PCIe MSI twice.
>
> 		CPU0		CPU1
> 0:		2			0		Hyper-V PCIe MSI 1073741824-edge
> 1:		69			0		Hyper-V PCIe MSI 1073741824-edge      nvme0q0
>
> To avoid this issue, if IO-APIC and gsi_top are not initialized, return the
> hint value passed as 'from' value to arch_dynirq_lower_bound instead of 0.
> This will also be identical to the behaviour of weak arch_dynirq_lower_bound
> function defined in kernel/softirq.c.

I find this mightily confusing. Something like this perhaps:

  Subject: x86/ioapic: Don't return 0 from arch_dynirq_lower_bound()

  arch_dynirq_lower_bound() is invoked by the core interrupt code to
  retrieve the lowest possible Linux interrupt number for dynamically
  allocated interrupts like MSI.

  The x86 implementation uses this to exclude the IO/APIC GSI space.
  This works correctly as long as there is an IO/APIC registered, but
  returns 0 if not. This has been observed in VMs where the BIOS does
  not advertise an IO/APIC.  

  0 is an invalid interrupt number except for the legacy timer interrupt
  on x86. The return value is unchecked in the core code, so it ends up
  allocating interrupt number 0, which is subsequently considered to be
  invalid by the caller, e.g. the MSI allocation code.

  The function already has a check for 0 in the case that an IO/APIC is
  registered, as ioapic_dynirq_base is 0 in case of device tree setups.

  Consolidate this and zero-check both ioapic_dynirq_base and gsi_top,
  which is used in the case that no IO/APIC is registered.

And then make the code look like the below, which makes it very
clear what this is about.

Thanks,

        tglx
---
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2477,17 +2477,22 @@ static int io_apic_get_redir_entries(int
 
 unsigned int arch_dynirq_lower_bound(unsigned int from)
 {
+	unsigned int ret;
+
 	/*
 	 * dmar_alloc_hwirq() may be called before setup_IO_APIC(), so use
 	 * gsi_top if ioapic_dynirq_base hasn't been initialized yet.
 	 */
-	if (!ioapic_initialized)
-		return gsi_top;
+	ret = ioapic_dynirq_base ? : gsi_top;
+
 	/*
-	 * For DT enabled machines ioapic_dynirq_base is irrelevant and not
-	 * updated. So simply return @from if ioapic_dynirq_base == 0.
+	 * For DT enabled machines ioapic_dynirq_base is irrelevant and
+	 * always 0. gsi_top can be 0 if there is no IO/APIC registered.
+	 *
+	 * 0 is an invalid interrupt number for dynamic allocations. Return
+	 * @from instead.
 	 */
-	return ioapic_dynirq_base ? : from;
+	return ret ? : from;
 }
 
 #ifdef CONFIG_X86_32
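
For illustration only, the fallback order in the hunk above can be
modelled in user space. This is a sketch, not kernel code: the function
name dynirq_lower_bound_model() is hypothetical, and ioapic_dynirq_base
and gsi_top are passed as parameters here (an assumption made so the
cases can be exercised without the kernel globals). It also spells out
the conditional in standard C instead of the GNU "?:" shorthand used in
the patch.

```c
/*
 * User-space model of the fallback chain in arch_dynirq_lower_bound().
 * Hypothetical helper for illustration; the real function reads the
 * globals ioapic_dynirq_base and gsi_top instead of taking parameters.
 */
static unsigned int dynirq_lower_bound_model(unsigned int ioapic_dynirq_base,
					     unsigned int gsi_top,
					     unsigned int from)
{
	/* Prefer ioapic_dynirq_base; use gsi_top while it is still 0. */
	unsigned int ret = ioapic_dynirq_base ? ioapic_dynirq_base : gsi_top;

	/*
	 * Both can be 0 (DT setups, or no IO/APIC registered). 0 is not
	 * valid for dynamic allocations, so hand back the caller's hint.
	 */
	return ret ? ret : from;
}
```

With both values at 0 and a hint of 1, the model returns 1, which is the
case the VM without an IO/APIC runs into. With gsi_top at 24 and
ioapic_dynirq_base still 0 it returns 24, and once ioapic_dynirq_base
is set that value wins.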

