linux-kernel - Re: [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable affinities


Message-ID: <32633c5f-abbb-0946-3519-e0f9759971f7@huawei.com>
Date:   Thu, 10 Mar 2022 14:11:59 +0800
From:   Xiongfeng Wang <wangxiongfeng2@...wei.com>
To:     John Garry <john.garry@...wei.com>, Marc Zyngier <maz@...nel.org>,
        <linux-kernel@...r.kernel.org>
CC:     Thomas Gleixner <tglx@...utronix.de>,
        David Decotigny <ddecotig@...gle.com>
Subject: Re: [PATCH] genirq/msi: Shutdown managed interrupts with unsatifiable
 affinities



On 2022/3/10 11:24, Xiongfeng Wang wrote:
> 
> 
> On 2022/3/9 18:20, John Garry wrote:
>> +
>>
>> On 07/03/2022 19:06, Marc Zyngier wrote:
>>> When booting with maxcpus=<small number>, interrupt controllers
>>> such as the GICv3 ITS may not be able to satisfy the affinity of
>>> some managed interrupts, as some of the HW resources are simply
>>> not available.
>>>
>>> In order to deal with this, do not try to activate such interrupt
>>> if there is no online CPU capable of handling it. Instead, place
>>> it in shutdown state. Once a capable CPU shows up, it will be
>>> activated.
>>>
>>> Reported-by: John Garry <john.garry@...wei.com>
>>> Reported-by: David Decotigny <ddecotig@...gle.com>
>>> Signed-off-by: Marc Zyngier <maz@...nel.org>
>>
>> Tested-by: John Garry <john.garry@...wei.com>
>>
>>> ---
>>
>> JFYI, I could not recreate the crash reported in the original thread with
>> "nohz_full=5-127 isolcpus=nohz,domain,managed_irq,5-127 maxcpus=1". Here is
>> what I set via the command line:
> 
> I think it is userspace onlining all the CPUs that causes the crash. Could you
> please try onlining all the CPUs after boot?

Sorry, please ignore what I said above. It's wrong.

This patch has no issues. When I tested managed interrupts, I applied this
patch together with the modification below. It is that modification, combined
with the kernel parameters, that causes the crash, not this patch. Sorry for
the unclear description earlier.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index eb0882d15366..0cea46bdaf99 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1620,7 +1620,7 @@ static int its_select_cpu(struct irq_data *d,

 		cpu = cpumask_pick_least_loaded(d, tmpmask);
 	} else {
-		cpumask_and(tmpmask, irq_data_get_affinity_mask(d), cpu_online_mask);
+		cpumask_copy(tmpmask, aff_mask);

 		/* If we cannot cross sockets, limit the search to that node */
 		if ((its_dev->its->flags & ITS_FLAGS_WORKAROUND_CAVIUM_23144) &&
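
For illustration only, here is a small userspace sketch of how the two
variants of that line differ when booting with maxcpus=1. It is not kernel
code: toy_mask_t and all the toy_* helpers are hypothetical stand-ins for
struct cpumask and the cpumask_*() helpers, and the affinity of CPUs 4-7 is
an assumed example value. It only shows that the copied mask can contain no
online CPU at all, which is the situation the check in Marc's patch looks for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t toy_mask_t;

/* Models cpumask_intersects(): true if the two masks share at least one CPU. */
static bool toy_intersects(toy_mask_t a, toy_mask_t b)
{
	return (a & b) != 0;
}

/* Models the unmodified line: cpumask_and(tmpmask, affinity, cpu_online_mask). */
static toy_mask_t toy_candidates_and(toy_mask_t affinity, toy_mask_t online)
{
	return affinity & online;
}

/* Models the modified line: cpumask_copy(tmpmask, aff_mask). */
static toy_mask_t toy_candidates_copy(toy_mask_t aff_mask)
{
	return aff_mask;
}

/*
 * Models the condition added to msi_init_virq() by the patch: a managed
 * interrupt that should be activated, but whose affinity contains no online
 * CPU, is put in managed-shutdown state instead.
 */
static bool toy_should_shutdown(bool activate, bool managed,
				toy_mask_t affinity, toy_mask_t online)
{
	return activate && managed && !toy_intersects(affinity, online);
}

/* Walks through the maxcpus=1 case with an assumed affinity of CPUs 4-7. */
static void toy_maxcpus1_example(void)
{
	const toy_mask_t online = 1ULL << 0;     /* only CPU0 is online */
	const toy_mask_t affinity = 0xFULL << 4; /* CPUs 4-7, all offline */

	/* The patch's check fires: no online CPU can service the interrupt. */
	assert(toy_should_shutdown(true, true, affinity, online));

	/* Unmodified line: the candidate set is empty. */
	assert(toy_candidates_and(affinity, online) == 0);

	/* Modified line: the candidate set holds only offline CPUs. */
	assert(toy_candidates_copy(affinity) == affinity);
	assert(!toy_intersects(toy_candidates_copy(affinity), online));
}
```

Calling toy_maxcpus1_example() runs through that one scenario. What the ITS
driver then does with such a candidate set is outside this sketch.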

Thanks,
Xiongfeng

> 
> Thanks,
> Xiongfeng
> 
>>
>> estuary:/$ dmesg | grep -i hz
>> [    0.000000] Kernel command line: BOOT_IMAGE=/john/Image rdinit=/init
>> console=ttyS0,115200 no_console_suspend nvme.use_threaded_interrupts=0
>> iommu.strict=0 acpi=force earlycon=pl011,mmio32,0x602b0000 nohz_full=5-127
>> isolcpus=nohz,domain,managed_irq,5-127 maxcpus=1
>> [    0.000000] NO_HZ: Full dynticks CPUs: 5-127.
>> [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (phys).
>> [    0.000000] sched_clock: 57 bits at 100MHz, resolution 10ns, wraps every
>> 4398046511100ns
>> [   15.314258] sbsa-gwdt sbsa-gwdt.0: Initialized with 10s timeout @ 100000000
>> Hz, action=0
>>
>> And for the kernel build:
>> $ more .config | grep NO_HZ
>> CONFIG_NO_HZ_COMMON=y
>> # CONFIG_NO_HZ_IDLE is not set
>> CONFIG_NO_HZ_FULL=y
>> # CONFIG_NO_HZ is not set
>> $
>>
>> Thanks,
>> John
>>>   kernel/irq/msi.c | 12 ++++++++++++
>>>   1 file changed, 12 insertions(+)
>>>
>>> diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
>>> index 2bdfce5edafd..aa84ce84c2ec 100644
>>> --- a/kernel/irq/msi.c
>>> +++ b/kernel/irq/msi.c
>>> @@ -818,6 +818,18 @@ static int msi_init_virq(struct irq_domain *domain, int virq, unsigned int vflag
>>>           irqd_clr_can_reserve(irqd);
>>>           if (vflags & VIRQ_NOMASK_QUIRK)
>>>               irqd_set_msi_nomask_quirk(irqd);
>>> +
>>> +        /*
>>> +         * If the interrupt is managed but no CPU is available
>>> +         * to service it, shut it down until better times.
>>> +         */
>>> +        if ((vflags & VIRQ_ACTIVATE) &&
>>> +            irqd_affinity_is_managed(irqd) &&
>>> +            !cpumask_intersects(irq_data_get_affinity_mask(irqd),
>>> +                    cpu_online_mask)) {
>>> +                irqd_set_managed_shutdown(irqd);
>>> +                return 0;
>>> +            }
>>>       }
>>>         if (!(vflags & VIRQ_ACTIVATE))
>>
>> .
> .
>