Message-ID: <20230324162402.GA14597@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
Date:   Fri, 24 Mar 2023 09:24:02 -0700
From:   Saurabh Singh Sengar <ssengar@...ux.microsoft.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Saurabh Singh Sengar <ssengar@...rosoft.com>,
        Borislav Petkov <bp@...en8.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
        "johan+linaro@...nel.org" <johan+linaro@...nel.org>,
        "isaku.yamahata@...el.com" <isaku.yamahata@...el.com>,
        "Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "rahul.tanwar@...ux.intel.com" <rahul.tanwar@...ux.intel.com>,
        "andriy.shevchenko@...el.com" <andriy.shevchenko@...el.com>
Subject: Re: [EXTERNAL] Re: [PATCH] x86/ioapic: Don't return 0 as valid virq

On Fri, Mar 24, 2023 at 04:39:07PM +0100, Thomas Gleixner wrote:
> On Tue, Mar 14 2023 at 10:23, Saurabh Singh Sengar wrote:
> >> This should be added to your commit message: what guest VM is that and
> >> why should the kernel support it.
> >
> > Guest VM is a Linux VM running as a child partition on Hyper-V. Hyper-V Linux
> > documentation is in Documentation/virt/hyperv/.
> >
> > In the commit message I wanted to mention that any system which does not
> > register an IO-APIC will have this issue. But I am fine with mentioning
> > specifically the issue I am facing. As part of your next comment, I have
> > explained the issue in detail; if that is good, I can use that as the
> > commit message.
> >> 
> >> Why doesn't it need an IO-APIC and why does the current code need to be
> >> changed just for your guest VM?
> >
> > On a few Hyper-V virtual machine platforms there are no devices to be
> > hooked to the IO-APIC. These platforms do have Hyper-V based MSI over
> > VMBus, which assigns interrupts to PCIe devices. On such platforms the
> > IO-APIC is not registered, so gsi_top is never assigned and remains at 0.
> > Moreover, because the CONFIG_X86_IO_APIC flag cannot be disabled, the
> > io-apic code still gets compiled. Thus, the arch_dynirq_lower_bound
> > function in io_apic.c decides the lower bound of irq numbers based on gsi_top.
> >
> > Later, when PCIe-MSI attempts to allocate interrupts, it gets 0 as the first
> > virq number because gsi_top is still 0. Since 0 is an invalid virq, it is
> > ignored by the MSI irq domain, and the same PCIe MSI ends up allocated twice.
> >
> > 		CPU0		CPU1
> > 0:		2			0		Hyper-V PCIe MSI 1073741824-edge
> > 1:		69			0		Hyper-V PCIe MSI 1073741824-edge      nvme0q0
> >
> > To avoid this issue, if IO-APIC and gsi_top are not initialized, return the
> > hint value passed as 'from' value to arch_dynirq_lower_bound instead of 0.
> > This will also be identical to the behaviour of weak arch_dynirq_lower_bound
> > function defined in kernel/softirq.c.
> 
> I find this mighty confusing. Something like this perhaps:
> 
>   Subject: x86/ioapic: Don't return 0 from arch_dynirq_lower_bound()
> 
>   arch_dynirq_lower_bound() is invoked by the core interrupt code to
>   retrieve the lowest possible Linux interrupt number for dynamically
>   allocated interrupts like MSI.
> 
>   The x86 implementation uses this to exclude the IO/APIC GSI space.
>   This works correctly as long as there is an IO/APIC registered, but
>   returns 0 if not. This has been observed in VMs where the BIOS does
>   not advertise an IO/APIC.  
> 
>   0 is an invalid interrupt number except for the legacy timer interrupt
>   on x86. The return value is unchecked in the core code, so it ends up
>   allocating interrupt number 0, which is subsequently considered to be
>   invalid by the caller, e.g. the MSI allocation code.
> 
>   The function already has a check for 0 in the case that an IO/APIC is
>   registered, but ioapic_dynirq_base is 0 in case of device tree setups.
> 
>   Consolidate this and zero check for both ioapic_dynirq_base and gsi_top,
>   which is used in the case that no IO/APIC is registered.
> 
> And then make the code look like the below, which makes it very
> clear what this is about.
> 
> Thanks,
> 
>         tglx
> ---
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2477,17 +2477,22 @@ static int io_apic_get_redir_entries(int
>  
>  unsigned int arch_dynirq_lower_bound(unsigned int from)
>  {
> +	unsigned int ret;
> +
>  	/*
>  	 * dmar_alloc_hwirq() may be called before setup_IO_APIC(), so use
>  	 * gsi_top if ioapic_dynirq_base hasn't been initialized yet.
>  	 */
> -	if (!ioapic_initialized)
> -		return gsi_top;
> +	ret = ioapic_dynirq_base ? : gsi_top;
> +
>  	/*
> -	 * For DT enabled machines ioapic_dynirq_base is irrelevant and not
> -	 * updated. So simply return @from if ioapic_dynirq_base == 0.
> +	 * For DT enabled machines ioapic_dynirq_base is irrelevant and
> +	 * always 0. gsi_top can be 0 if there is no IO/APIC registered.
> +	 *
> +	 * 0 is an invalid interrupt number for dynamic allocations. Return
> +	 * @from instead.
>  	 */
> -	return ioapic_dynirq_base ? : from;
> +	return ret ? : from;
>  }
>  
>  #ifdef CONFIG_X86_32
>

Thank you for your valuable suggestions. The commit message and code look
much better now. I will send V2 with your "Co-developed-by" tag; I
hope that is fine with you.
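
For reference, here is a minimal userspace sketch of the selection logic in
the patch above. It is not the kernel code: dynirq_lower_bound() and its
first two parameters are hypothetical stand-ins for the function and for the
kernel globals ioapic_dynirq_base and gsi_top, and the GNU "?:" shorthand is
spelled out so the sketch builds with any C compiler.

```c
/*
 * Userspace sketch only. dynirq_base and gsi_top are passed in as
 * parameters here; in the kernel they are globals in io_apic.c.
 */
static unsigned int dynirq_lower_bound(unsigned int dynirq_base,
				       unsigned int gsi_top,
				       unsigned int from)
{
	/* Prefer dynirq_base; fall back to gsi_top while it is still 0. */
	unsigned int ret = dynirq_base ? dynirq_base : gsi_top;

	/*
	 * Both values can be 0 (DT setups, or no IO/APIC registered).
	 * 0 is not valid for dynamic allocations, so return the caller's
	 * hint in that case.
	 */
	return ret ? ret : from;
}
```

With both values at 0, as on the Hyper-V guest described above, a call such
as dynirq_lower_bound(0, 0, 1) returns the hint 1 instead of 0.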

Regards,
Saurabh
 
